######################################################################
# Threat Model
# ------------
#
# For context, there are many categories of adversarial attacks, each with
# a different goal and assumption of the attacker’s knowledge. However, in
# general the overarching goal is to add the least amount of perturbation
# misclassification* means the adversary wants to alter an image that is
# originally of a specific source class so that it is classified as a
# specific target class.
#
# In this case, the FGSM attack is a *white-box* attack with the goal of
# *misclassification*. With this background information, we can now
# discuss the attack in detail.
#
# Fast Gradient Sign Attack
# -------------------------
#
# One of the first and most popular adversarial attacks to date is
# referred to as the *Fast Gradient Sign Attack (FGSM)* and is described
# by Goodfellow et al. in `Explaining and Harnessing Adversarial
# the loss* based on the same backpropagated gradients. In other words,
# the attack uses the gradient of the loss w.r.t. the input data, then
# adjusts the input data to maximize the loss.
#
# Before we jump into the code, let’s look at the famous
# `FGSM <https://arxiv.org/abs/1412.6572>`__ panda example and extract
# some notation.
# maximize the loss. The resulting perturbed image, :math:`x'`, is then
# *misclassified* by the target network as a “gibbon” when it is still
# clearly a “panda”.
#
# Hopefully now the motivation for this tutorial is clear, so let’s jump
# into the implementation.
#

import torch
import torch.nn as nn
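# The code further down also calls ``datasets.MNIST``, ``transforms.Compose``, and
# ``plt.plot``; those names correspond to imports along these lines (inferred from the
# calls below rather than shown explicitly):
from torchvision import datasets, transforms
import matplotlib.pyplot as plt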
######################################################################
# Implementation
# --------------
#
# In this section, we will discuss the input parameters for the tutorial,
# define the model under attack, then code the attack and run some tests.
#
# Inputs
# ~~~~~~
#
# There are only three inputs for this tutorial, and they are defined as
# follows:
#
# - ``epsilons`` - List of epsilon values to use for the run. It is
#   important to keep 0 in the list because it represents the model
#   performance on the original test set. Also, intuitively we would
#   expect the larger the epsilon, the more noticeable the perturbations
#   but the more effective the attack in terms of degrading model
#   accuracy. Since the data range here is :math:`[0,1]`, no epsilon
#   value should exceed 1.
#
# - ``pretrained_model`` - path to the pretrained MNIST model which was
#   trained with
#   `pytorch/examples/mnist <https://github.com/pytorch/examples/tree/master/mnist>`__.
#   For simplicity, download the pretrained model `here <https://drive.google.com/file/d/1HJV2nUHJqclXQ8flKvcWmjZ-OU5DGatl/view?usp=drive_link>`__.
#
# - ``use_cuda`` - boolean flag to use CUDA if desired and available.
#   Note, a GPU with CUDA is not critical for this tutorial as a CPU will
#   not take much time; a sketch of the device selection driven by this
#   flag follows the definitions below.
#

epsilons = [0, .05, .1, .15, .2, .25, .3]
pretrained_model = "data/lenet_mnist_model.pth"
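# A minimal sketch of how the ``use_cuda`` flag described above is typically combined
# with ``torch.cuda.is_available()`` to select the ``device`` used throughout the script
# (the flag value here is an assumption)::
#
#    use_cuda = True
#    device = torch.device("cuda" if use_cuda and torch.cuda.is_available() else "cpu")
#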
######################################################################
# Model Under Attack
# ~~~~~~~~~~~~~~~~~~
#
# As mentioned, the model under attack is the same MNIST model from
# `pytorch/examples/mnist <https://github.com/pytorch/examples/tree/master/mnist>`__.
# You may train and save your own MNIST model or you can download and use
# the provided model. The *Net* definition and test dataloader here have
# been copied from the MNIST example. The purpose of this section is to
# define the model and dataloader, then initialize the model and load the
# pretrained weights.
#

# LeNet Model definition
class Net(nn.Module):
@@ -181,7 +181,7 @@ def forward(self, x):
    datasets.MNIST('../data', train=False, download=True, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ])),
    batch_size=1, shuffle=True)

# Define what device we are using
@@ -201,20 +201,20 @@ def forward(self, x):
######################################################################
# FGSM Attack
# ~~~~~~~~~~~
#
# Now, we can define the function that creates the adversarial examples by
# perturbing the original inputs. The ``fgsm_attack`` function takes three
# inputs: *image* is the original clean image (:math:`x`), *epsilon* is
# the pixel-wise perturbation amount (:math:`\epsilon`), and *data_grad*
# is the gradient of the loss w.r.t. the input image
# (:math:`\nabla_{x} J(\mathbf{\theta}, \mathbf{x}, y)`). The function
# then creates the perturbed image as
#
# .. math:: perturbed\_image = image + epsilon*sign(data\_grad) = x + \epsilon * sign(\nabla_{x} J(\mathbf{\theta}, \mathbf{x}, y))
#
# Finally, in order to maintain the original range of the data, the
# perturbed image is clipped to the range :math:`[0,1]`.
#

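# In code, the update described above reduces to a single clamped step; as a sketch
# (assuming ``image`` already lies in :math:`[0,1]`)::
#
#    perturbed_image = torch.clamp(image + epsilon * data_grad.sign(), 0, 1)
#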
# FGSM attack code
def fgsm_attack(image, epsilon, data_grad):
@@ -244,14 +244,14 @@ def denorm(batch, mean=[0.1307], std=[0.3081]):
    mean = torch.tensor(mean).to(device)
    if isinstance(std, list):
        std = torch.tensor(std).to(device)

    return batch * std.view(1, -1, 1, 1) + mean.view(1, -1, 1, 1)

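# ``denorm`` inverts the ``transforms.Normalize((0.1307,), (0.3081,))`` step from the
# dataloader, recovering raw :math:`[0,1]` pixel values. A hypothetical usage sketch,
# with ``normalized_batch`` standing in for a batch drawn from ``test_loader``::
#
#    raw_batch = denorm(normalized_batch)   # undo normalization before perturbing
#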
######################################################################
# Testing Function
# ~~~~~~~~~~~~~~~~
#
# Finally, the central result of this tutorial comes from the ``test``
# function. Each call to this test function performs a full test step on
# the MNIST test set and reports a final accuracy. However, notice that
@@ -264,7 +264,7 @@ def denorm(batch, mean=[0.1307], std=[0.3081]):
# if the perturbed example is adversarial. In addition to testing the
# accuracy of the model, the function also saves and returns some
# successful adversarial examples to be visualized later.
#

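# As a rough sketch of the flow inside ``test`` (not the exact code that follows; the
# ``criterion`` name and the returned ``(accuracy, adv_examples)`` pair are assumptions)::
#
#    for data, target in test_loader:
#        data, target = data.to(device), target.to(device)
#        data.requires_grad = True              # needed to get gradients w.r.t. the input
#        loss = criterion(model(data), target)
#        model.zero_grad()
#        loss.backward()                        # populates data.grad
#        perturbed = fgsm_attack(denorm(data), epsilon, data.grad)
#        # re-normalize, re-classify the perturbed image, record whether the attack
#        # flipped the prediction, and collect a few successes for visualization
#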
def test(model, device, test_loader, epsilon):

@@ -338,15 +338,15 @@ def test( model, device, test_loader, epsilon ):
######################################################################
# Run Attack
# ~~~~~~~~~~
#
# The last part of the implementation is to actually run the attack. Here,
# we run a full test step for each epsilon value in the *epsilons* input.
# For each epsilon we also save the final accuracy and some successful
# adversarial examples to be plotted in the coming sections. Notice how
# the printed accuracies decrease as the epsilon value increases. Also,
# note the :math:`\epsilon=0` case represents the original test accuracy,
# with no attack.
#

accuracies = []
examples = []
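# The sweep that fills these lists is a loop over ``epsilons``; a sketch, assuming
# ``test`` returns an ``(accuracy, adversarial_examples)`` pair as described above::
#
#    for eps in epsilons:
#        acc, ex = test(model, device, test_loader, eps)
#        accuracies.append(acc)
#        examples.append(ex)
#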
@@ -361,10 +361,10 @@ def test( model, device, test_loader, epsilon ):
######################################################################
# Results
# -------
#
# Accuracy vs Epsilon
# ~~~~~~~~~~~~~~~~~~~
#
# The first result is the accuracy versus epsilon plot. As alluded to
# earlier, as epsilon increases we expect the test accuracy to decrease.
# This is because larger epsilons mean we take a larger step in the
@@ -375,7 +375,7 @@ def test( model, device, test_loader, epsilon ):
# lower than :math:`\epsilon=0.15`. Also, notice that the accuracy of the model
# drops to random-guess accuracy for a 10-class classifier (about 10%) between
# :math:`\epsilon=0.25` and :math:`\epsilon=0.3`.
#

plt.figure(figsize=(5, 5))
plt.plot(epsilons, accuracies, "*-")
@@ -390,7 +390,7 @@ def test( model, device, test_loader, epsilon ):
######################################################################
# Sample Adversarial Examples
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Remember the idea of no free lunch? In this case, as epsilon increases
# the test accuracy decreases **BUT** the perturbations become more easily
# perceptible. In reality, there is a tradeoff between accuracy
@@ -403,7 +403,7 @@ def test( model, device, test_loader, epsilon ):
# perturbations start to become evident at :math:`\epsilon=0.15` and are
# quite evident at :math:`\epsilon=0.3`. However, in all cases humans are
# still capable of identifying the correct class despite the added noise.
#

# Plot several examples of adversarial samples at each epsilon
cnt = 0
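# A sketch of how such a grid of examples can be laid out. The per-entry layout of
# ``examples`` is an assumption here: one ``(original_pred, adversarial_pred, image)``
# tuple per saved example::
#
#    plt.figure(figsize=(8, 10))
#    for i in range(len(epsilons)):
#        for j in range(len(examples[i])):
#            cnt += 1
#            plt.subplot(len(epsilons), len(examples[0]), cnt)
#            orig, adv, ex = examples[i][j]
#            plt.title(f"{orig} -> {adv}")
#            plt.imshow(ex, cmap="gray")
#    plt.tight_layout()
#    plt.show()
#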
@@ -426,7 +426,7 @@ def test( model, device, test_loader, epsilon ):
######################################################################
# Where to go next?
# -----------------
#
# Hopefully this tutorial gives some insight into the topic of adversarial
# machine learning. There are many potential directions to go from here.
# This attack represents the very beginning of adversarial attack research
@@ -438,7 +438,7 @@ def test( model, device, test_loader, epsilon ):
# on defense also leads into the idea of making machine learning models
# more *robust* in general, to both naturally perturbed and adversarially
# crafted inputs.
#
# Another direction to go is adversarial attacks and defense in different
# domains. Adversarial research is not limited to the image domain; check
# out `this <https://arxiv.org/pdf/1801.01944.pdf>`__ attack on
@@ -447,4 +447,8 @@ def test( model, device, test_loader, epsilon ):
# implement a different attack from the NIPS 2017 competition, and see how
# it differs from FGSM. Then, try to defend the model from your own
# attacks.
#
# A further direction to go, depending on available resources, is to modify
# the code to process work in batches, in parallel, or in a distributed
# fashion, rather than running one attack at a time in the per-epsilon
# ``test()`` loop above.
#
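# The FGSM step itself is already elementwise, so batching is mostly a matter of using a
# larger ``batch_size`` in the dataloader and keeping per-example bookkeeping; the
# perturbation then applies to a whole batch at once (sketch, assuming ``data`` is a
# batch in :math:`[0,1]` whose ``data.grad`` has been populated)::
#
#    perturbed_batch = torch.clamp(data + epsilon * data.grad.sign(), 0, 1)
#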