Tweak TCAV NLP tutorial #1568

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open · wants to merge 2 commits into master
46 changes: 23 additions & 23 deletions tutorials/TCAV_NLP.ipynb
@@ -11,17 +11,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial shows how to apply TCAV, a concept-based model interpretability algorithm, on sentiment classification task using a ConvNet model (https://captum.ai/tutorials/IMDB_TorchText_Interpret) that was trained using IMDB sensitivity dataset.\n",
"This tutorial shows how to apply TCAV, a concept-based model interpretability algorithm, on a sentiment classification task using a ConvNet model (https://captum.ai/tutorials/IMDB_TorchText_Interpret) that was trained using IMDB sensitivity dataset.\n",
"\n",
"More details about the approach can be found here: https://arxiv.org/pdf/1711.11279.pdf\n",
"More details about the approach can be found here: https://arxiv.org/pdf/1711.11279.pdf.\n",
"\n",
"In order to use TCAV, we need to predefine a list of concepts that we want our predictions to be test against.\n",
"\n",
"Concepts are human-understandable, high-level abstractions such as visually represented \"stripes\" in case of images or \"positive adjective concept\" such as \"amazing, great, etc\" in case of text. Concepts are formatted and represented as input tensors and do not need to be part of the training or test datasets.\n",
"Concepts are human-understandable, high-level abstractions such as visually represented \"stripes\" in case of images or \"positive adjective concept\" such as \"amazing\", \"great\", \"awesome\" in case of text. Concepts are formatted and represented as input tensors and do not need to be part of the training or test datasets.\n",
"\n",
"Concepts are incorporated into the importance score computations using Concept Activation Vectors (CAVs). Traditionally, CAVs train linear classifiers and learn decision boundaries between different concepts using the activations of predefined concepts in a NN layer as inputs to the classifier that we train. The vector that is orthogonal to learnt decision boundary and is pointing towards the direction of a concept is the CAV of that concept.\n",
"Concepts are incorporated into the importance score computations using Concept Activation Vectors (CAVs). Traditionally, CAVs train linear classifiers and learn decision boundaries between different concepts using the activations of predefined concepts in a NN layer as inputs to the classifier that we train. The vector that is orthogonal to the learnt decision boundary and is pointing towards the direction of a concept is the CAV of that concept.\n",
"\n",
"TCAV measures the importance of a concept for a prediction based on the directional sensitivity (derivatives) of a concept in Neural Network (NN) layers. For a given concept and layer it is obtained by aggregating the dot product between CAV for given concept in given layer and the gradients of model predictions w.r.t. given layer output. The aggregation can be performed based on either signs or magnitudes of the directional sensitivities of concepts across multiple examples belonging to a certain class. More details about the technique can be found in above mentioned papers.\n",
"TCAV measures the importance of a concept for a prediction based on the directional sensitivity (derivatives) of a concept in Neural Network (NN) layers. It is obtained by aggregating the dot product between the CAV of a given concept in a given layer, with the gradients of model predictions with respect to the given layer output. The aggregation can be performed based on either signs or magnitudes of the directional sensitivities of concepts across multiple examples belonging to a certain class. More details about the technique can be found in the above referenced paper.\n",
"\n",
"Note: Before running this tutorial, please install the spacy, numpy, scipy, sklearn, PIL, and matplotlib packages.\n",
"\n"
@@ -50,7 +50,7 @@
"\n",
"from torch.utils.data import DataLoader, Dataset, IterableDataset\n",
"\n",
"#.... Captum imports..................\n",
"# Captum imports\n",
"from captum.concept import TCAV\n",
"from captum.concept import Concept\n",
"from captum.concept._utils.common import concepts_to_str\n",
@@ -85,7 +85,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Defining torchtext data `Field` so that we can load the vocabulary for IMDB dataset the way that was done to train IMDB model. "
"Defining torchtext data `Field` so that we can load the vocabulary for the IMDB dataset the way that was done to train the IMDB model. "
]
},
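As a rough illustration of the cell above, the snippet below sketches how such a `Field` could be defined with the legacy torchtext API; on torchtext >= 0.9 the same classes live under `torchtext.legacy.data`, so the import may need adjusting.

```python
# Sketch of the torchtext Field definition (legacy torchtext API assumed).
import torch
from torchtext import data

TEXT = data.Field(lower=True, tokenize="spacy")  # lowercased, spaCy-tokenized text
LABEL = data.LabelField(dtype=torch.float)       # binary sentiment label
```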
{
@@ -108,7 +108,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Reading IMDB dataset the same way we did for training sensitivity analysis model. This will help us to load correct token to embedding mapping using Glove."
"Reading the IMDB dataset the same way we did for training the sensitivity analysis model. This will help us to load the correct token for embedding mapping using Glove."
]
},
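A sketch of that loading step, again assuming the legacy torchtext API; the 50-dimensional GloVe vectors are an illustrative choice.

```python
# Load the IMDB splits and build the vocabulary with GloVe vectors.
from torchtext import datasets, vocab

train, test = datasets.IMDB.splits(TEXT, LABEL)
TEXT.build_vocab(train, vectors=vocab.GloVe(name="6B", dim=50))
LABEL.build_vocab(train)
```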
{
@@ -228,7 +228,7 @@
"\n",
"The concept definition and the examples describing that concepts are left up to the user.\n",
"\n",
"Below we visualize examples from both `Positive Adjectives` and `Neutral` concepts. This concepts are curated for demonstration purposes. It's up to a user what to include into a concept and what not."
"Below we visualize examples from both `Positive Adjectives` and `Neutral` concepts. These concepts are curated for demonstration purposes. It is up to a user what to include into a concept."
]
},
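For example, the concept examples could be inspected with a small helper like the one below; the file paths are hypothetical and stand in for wherever the concept text files are stored, one example per line.

```python
# Peek at the first few examples of a concept stored as a plain-text file.
def show_concept_examples(path, n=5):
    with open(path) as f:
        for line in list(f)[:n]:
            print(line.strip())

show_concept_examples("data/tcav/text-sensitivity/positive-adjectives.txt")
show_concept_examples("data/tcav/text-sensitivity/neutral.txt")
```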
{
@@ -255,7 +255,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Both `Positive Adjective` and `Neutral` concepts have the same number of examples representing corresponding concept. The examples for the `Positive Adjective` concept are semi-hand curated and the context is neutralized whereas those for neutral concept are chosen randomly from Gutenberg Poem Dataset (https://github.com/google-research-datasets/poem-sentiment/blob/master/data/train.tsv)\n",
"Both `Positive Adjective` and `Neutral` concepts have the same number of examples representing their corresponding concepts. The examples for the `Positive Adjective` concept are semi-hand curated and the context is neutralized whereas those for neutral concept are chosen randomly from the Gutenberg Poem Dataset (https://github.com/google-research-datasets/poem-sentiment/blob/master/data/train.tsv).\n",
"\n",
"You can consider also using Stanford Sentiment Tree Bank (SST, https://nlp.stanford.edu/sentiment/index.html) dataset with `neutral` labels. "
]
@@ -309,7 +309,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"`Positive Adjectives` concept is surrounded by neutral / uninformative context. It is important to note that we positioned positive adjectives in different locations in the text. This makes concept definitions more abstract and independent of the location. Apart from that as we can see the length of the text in the concepts in fixed to 7. This is because our sensitivity classifier was trained for a fixed sequence length of 7. Besides that this ensures that the activations for concept and prediction examples have the same shape."
"The `Positive Adjective` concept is surrounded by a neutral / uninformative context. It is important to note that we positioned positive adjectives in different locations in the text. This makes concept definitions more abstract and independent of the location. Apart from that as we can see the length of the text in the concepts in fixed to 7. This is because our sensitivity classifier was trained for a fixed sequence length of 7. This ensures that the activations for concept and prediction examples have the same shape."
]
},
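A sketch of that fixed-length encoding step is below. The whitespace tokenizer, the `<pad>` token, and the direct `TEXT.vocab.stoi` lookup are simplifying assumptions built on the vocabulary sketched earlier.

```python
# Encode a sentence as a fixed-length tensor of vocabulary indices (length 7).
import torch

def text_to_tensor(sentence, length=7, pad_token="<pad>"):
    tokens = sentence.lower().split()[:length]       # simplistic whitespace tokenizer
    tokens += [pad_token] * (length - len(tokens))   # pad up to the fixed length
    return torch.tensor([TEXT.vocab.stoi[t] for t in tokens])

print(text_to_tensor("it was an amazing day"))
```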
{
@@ -342,14 +342,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Defining and loading pre-trained ConvNet model"
"# Define and load a pre-trained ConvNet model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Defining the model, so that we can load associated weights into the memory."
"Define the model, so that we can load associated weights into memory."
]
},
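An illustrative definition of such a ConvNet and of restoring saved weights is sketched below. The architecture, vocabulary size, and checkpoint path are assumptions for illustration, not the exact model from the linked IMDB tutorial.

```python
# Sketch of a small 1D-ConvNet sentiment classifier and of restoring its weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=50, n_filters=100, filter_sizes=(3, 4, 5)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, fs) for fs in filter_sizes]
        )
        self.fc = nn.Linear(n_filters * len(filter_sizes), 1)

    def forward(self, text):
        emb = self.embedding(text).permute(0, 2, 1)      # (batch, emb_dim, seq_len)
        pooled = [F.relu(c(emb)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))         # one sentiment logit

model = SmallCNN(vocab_size=len(TEXT.vocab))                   # vocabulary built earlier
model.load_state_dict(torch.load("models/imdb-model-cnn.pt"))  # placeholder checkpoint path
model.eval()
```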
{
@@ -448,16 +448,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Computing TCAV Scores"
"# Compute TCAV Scores"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before computing TCAV scores let's created instances of `positive-adjectives` and `neutral` concepts.\n",
"Before computing TCAV scores let's create instances of `positive-adjectives` and `neutral` concepts.\n",
"\n",
"In order to estimate significant importance of a concept using two-sided hypothesis testing, we define a number of `neutral` concepts. All `neutral` concepts are defined using random samples from Gutenberg Poem Training Dataset (https://github.com/google-research-datasets/poem-sentiment/blob/master/data/train.tsv)."
"In order to estimate significant importance of a concept using two-sided hypothesis testing, we define a number of `neutral` concepts. All `neutral` concepts are defined using random samples from the Gutenberg Poem Training Dataset (https://github.com/google-research-datasets/poem-sentiment/blob/master/data/train.tsv)."
]
},
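A sketch of how those concept objects could be assembled with Captum's `Concept` API is below; the dataset class, file paths, and the `text_to_tensor` helper (sketched earlier) are illustrative assumptions.

```python
# Wrap concept example files into Captum Concept objects backed by DataLoaders.
from torch.utils.data import DataLoader, Dataset
from captum.concept import Concept

class ConceptDataset(Dataset):
    def __init__(self, path):
        with open(path) as f:
            self.examples = [line.strip() for line in f if line.strip()]

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return text_to_tensor(self.examples[idx])   # fixed-length index tensor

def assemble_concept(name, concept_id, path):
    return Concept(id=concept_id, name=name, data_iter=DataLoader(ConceptDataset(path)))

positive_concept = assemble_concept(
    "positive-adjectives", 0, "data/tcav/text-sensitivity/positive-adjectives.txt"
)
neutral_concepts = [
    assemble_concept(f"neutral{i}", i, f"data/tcav/text-sensitivity/neutral{i}.txt")
    for i in range(1, 6)
]
```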
{
@@ -481,7 +481,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Below we define five experimental sets consisting of `Positive Adjective` vs `Neutral` concept pairs. TCAV trains a model for each pair, and estimates tcav scores for each experimental set in given input layers. In this case we chose `convs.2` and `convs.1` layers. TCAV score indicates the importance of a concept in a given layer. The higher the TCAV score, the more important is that concept for given layer in making a prediction for a given set of samples."
"Below we define five experimental sets consisting of `Positive Adjective` vs `Neutral` concept pairs. TCAV trains a model for each pair, and estimates TCAV scores for each experimental set in given input layers. In this case we chose `convs.2` and `convs.1` layers. TCAV score indicates the importance of a concept in a given layer. The higher the TCAV score, the more important is that concept for a given layer in making a prediction for a given set of samples."
]
},
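A sketch of wiring this up with Captum, assuming the model and concepts from the earlier sketches:

```python
# Pair the positive-adjective concept with each neutral concept to form five
# experimental sets, and set TCAV up on the two convolutional layers.
from captum.concept import TCAV

experimental_sets = [[positive_concept, neutral] for neutral in neutral_concepts]
tcav = TCAV(model, layers=["convs.2", "convs.1"])
```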
{
@@ -541,7 +541,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we define a number of examples that contain positive sentiment and test the sensitivity of model predictions to `Positive Adjectives` concept."
"Here we define a number of examples that contain positive sentiment and test the sensitivity of model predictions to the `Positive Adjective` concept."
]
},
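For instance, a few positive-sentiment inputs could be encoded and scored as sketched below, reusing the hypothetical `text_to_tensor` helper; `target=0` selects the model's single output logit.

```python
# Encode positive-sentiment sentences and compute TCAV scores for them.
import torch

pos_sentences = [
    "it was a fantastic play",
    "what an amazing movie",
    "the best film ever made",
]
input_tensors = torch.stack([text_to_tensor(s) for s in pos_sentences])
scores = tcav.interpret(input_tensors, experimental_sets=experimental_sets, target=0)
```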
{
@@ -606,7 +606,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In the cell below we visualize TCAV scores for `Positive Adjective` and `Neutral` concepts in `convs.2` and `convs.1` layers. For this experiment we tested `Positive Adjective` concept vs 5 different `Neutral` concepts. As we can see, `Positive Adjective` concept has consistent high score across all layers and experimental sets."
"In the cell below we visualize TCAV scores for `Positive Adjective` and `Neutral` concepts in `convs.2` and `convs.1` layers. For this experiment we tested `Positive Adjective` concept vs five different `Neutral` concepts. As we can see, `Positive Adjective` concept has consistent high score across all layers and experimental sets."
]
},
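One way to chart those scores is sketched below, assuming the nested dictionary layout returned by `tcav.interpret` (experimental-set key, then layer, then score type); adjust the keys if your Captum version differs.

```python
# Bar chart of sign-count TCAV scores for one experimental set in one layer.
import matplotlib.pyplot as plt
from captum.concept._utils.common import concepts_to_str

layer = "convs.2"
set_key = concepts_to_str(experimental_sets[0])
per_concept_scores = scores[set_key][layer]["sign_count"]

plt.bar(["Positive Adjective", "Neutral"], [float(s) for s in per_concept_scores])
plt.ylabel("TCAV sign-count score")
plt.title(f"TCAV scores in layer {layer}")
plt.show()
```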
{
@@ -692,7 +692,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to convince ourselves that our concepts truly explain our predictions, we conduct statistical significance tests on TCAV scores by constructing a number of experimental sets. In this case we look into the `Positive Adjective` concept and a number of `Neutral` concepts. If `Positive Adjective` concept is truly important in predicting positive sentiment in the sentence, then we will see consistent high TCAV scores for `Positive Adjective` concept across all experimental sets as apposed to any other concept.\n",
"In order to convince ourselves that our concepts truly explain our predictions, we conduct statistical significance tests on TCAV scores by constructing a number of experimental sets. In this case we look into the `Positive Adjective` concept and a number of `Neutral` concepts. If the `Positive Adjective` concept is truly important in predicting positive sentiment in the sentence, then we will see consistent high TCAV scores for `Positive Adjective` concept across all experimental sets as apposed to any other concept.\n",
"\n",
"Each experimental set contains a random concept consisting of a number of random subsamples. In our case this allows us to estimate the robustness of TCAV scores by the means of numerous random concepts.\n",
"\n"
@@ -702,7 +702,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition, it is interesting to look into the p-values of statistical significance tests for each concept. We say, that we reject null hypothesis, if the p-value for concept's TCAV scores is smaller than 0.05. This indicates that the concept is important for model prediction.\n",
"In addition, it is interesting to look into the p-values of statistical significance tests for each concept. We say, that we reject the null hypothesis, if the p-value for concept's TCAV scores is smaller than 0.05. This indicates that the concept is important for model prediction.\n",
"\n",
"We label concept populations as overlapping if p-value > 0.05 otherwise disjoint.\n",
"\n"
@@ -734,7 +734,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can present the distribution of tcav scores using boxplots and the p-values indicating whether TCAV scores of those concepts are overlapping or disjoint."
"We can present the distribution of TCAV scores using box plots and the p-values indicating whether TCAV scores of those concepts are overlapping or disjoint."
]
},
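A minimal box-plot sketch, reusing the placeholder score lists from the t-test sketch above:

```python
# Box plots of the two score distributions for one layer.
import matplotlib.pyplot as plt

plt.boxplot([positive_scores, neutral_scores], labels=["Positive Adjective", "Neutral"])
plt.ylabel("TCAV sign-count score")
plt.show()
```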
{
@@ -769,7 +769,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The box plots below visualize the distribution of TCAV scores for a pair of concepts in two different layers, `convs.2` and `convs.1`. Each layer is visualized in a separate jupyter cell. Below diagrams show that `Positive Adjectives` concept has TCAV scores that are consistently high across all layers and experimental sets as apposed to `Neutral` concept. It also shows that `Positive Adjectives` and `Neutral` are disjoint populations.\n"
"The box plots below visualize the distribution of TCAV scores for a pair of concepts in two different layers, `convs.2` and `convs.1`. Each layer is visualized in a separate Jupyter cell. The diagrams below show that the `Positive Adjective` concept has TCAV scores that are consistently high across all layers and experimental sets compared with the `Neutral` concept. It also shows that `Positive Adjective` and `Neutral` are disjoint populations.\n"
]
},
{