DL models  https://arxiv.org/abs/1412.6572 : accumulate a small perturbation along the gradient direction of the objective function
Solving for the noise term: the noise η is computed with the following formula (the fast gradient sign method), where ε is a small constant, θ the model parameters, x the input image, and y the label (a code sketch follows after the excerpts below):

    η = ε · sign(∇_x J(θ, x, y))

https://arxiv.org/abs/1511.04599 : move the input step by step in the direction that crosses the decision boundary. Defines the robustness of a classifier and gives an algorithm for binary classifiers (extensible to the multiclass case).

[Excerpt from Szegedy et al., "Intriguing properties of neural networks": the minimum distortion D(x, l), informally the closest image to x classified as l, is approximated with a box-constrained optimizer by minimizing c|r| + loss_f(x + r, l) subject to x + r ∈ [0, 1]^m, with a line search over c. For all networks studied (MNIST, QuocNet, AlexNet) visually hard-to-distinguish adversarial examples were found; a relatively large fraction of them also fool networks trained from scratch with different hyper-parameters or initial weights, and networks trained on a disjoint training set.]

[Excerpt from Goodfellow et al., Sec. "The linear explanation of adversarial examples": input features have limited precision (e.g. 8 bits per pixel), so a classifier should assign the same class to x and x̃ = x + η whenever ||η||_∞ < ε. For a linear model, however, w^T x̃ = w^T x + w^T η, and choosing η = sign(w) makes the activation grow by ε·m·n for n input dimensions and average weight magnitude m. The size of the perturbation does not grow with the dimensionality, but its effect on the activation grows linearly with n, so in high-dimensional problems many infinitesimal input changes can add up to one large output change, a kind of "accidental steganography". Even a simple linear model (including softmax regression) can therefore have adversarial examples if its input has sufficient dimensionality, without invoking any hypothesized highly non-linear behaviour of neural networks.]
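Concretely, the fast gradient sign method amounts to one gradient computation with respect to the input followed by a sign. A minimal sketch in JAX; the logistic-regression model and loss below are illustrative assumptions, not taken from the cited papers:

# FGSM sketch: eta = eps * sign(grad_x J(theta, x, y)).
import jax
import jax.numpy as jnp

def fgsm_noise(loss_fn, params, x, y, eps):
    # Gradient of the training cost with respect to the *input* x.
    grad_x = jax.grad(loss_fn, argnums=1)(params, x, y)
    return eps * jnp.sign(grad_x)

# Illustrative binary logistic-regression cost J(theta, x, y).
def loss_fn(params, x, y):
    w, b = params
    logit = jnp.dot(w, x) + b
    # Numerically stable binary cross-entropy with logits.
    return jnp.maximum(logit, 0.0) - logit * y + jnp.log1p(jnp.exp(-jnp.abs(logit)))

key_w, key_x = jax.random.split(jax.random.PRNGKey(0))
params = (jax.random.normal(key_w, (784,)), 0.0)
x = jax.random.uniform(key_x, (784,))   # e.g. a flattened 28x28 image in [0, 1]
y = 1.0

eta = fgsm_noise(loss_fn, params, x, y, eps=0.25)
x_adv = jnp.clip(x + eta, 0.0, 1.0)     # keep the adversarial image in the valid pixel range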
https://arxiv.org/abs/1412.6572, Sec. "Linear perturbation of non-linear models":

[Figure 1 of Goodfellow et al.: fast adversarial example generation applied to GoogLeNet on ImageNet. Adding an imperceptibly small vector whose elements are the sign of the elements of the gradient of the cost function with respect to the input changes GoogLeNet's classification of the image (original prediction at 57.7% confidence, adversarial prediction at 99.3%); the ε of .007 corresponds to the magnitude of the smallest bit of an 8-bit image encoding after GoogLeNet's conversion to real numbers.]

Let θ be the parameters of a model, x the input to the model, y the targets associated with x (for tasks that have targets), and J(θ, x, y) the cost used to train the neural network. Linearizing the cost function around the current value of θ yields the optimal max-norm constrained perturbation η = ε · sign(∇_x J(θ, x, y)), referred to as the "fast gradient sign method"; the required gradient can be computed efficiently using backpropagation. This method reliably causes a wide variety of models to misclassify their input (Fig. 1): with ε = .25, a shallow softmax classifier reaches an error rate of 99.9% with an average confidence of 79.3% on the MNIST test set, and a maxout network misclassifies 89.4% of the adversarial examples with an average confidence of 97.6%; with ε = .1, a convolutional maxout network on a preprocessed version of CIFAR-10 assigns an average probability of 96.6% to the incorrect labels. Other simple methods of generating adversarial examples are possible, e.g. rotating x by a small angle in the direction of the gradient.

https://arxiv.org/abs/1511.04599 (DeepFool): deep networks, despite strong classification performance, are unstable to adversarial perturbations: very small, often imperceptible perturbations suffice to fool state-of-the-art classifiers. For a given classifier, the adversarial perturbation is defined as the minimal perturbation r that changes the estimated label k̂(x):

    Δ(x; k̂) := min_r ||r||_2  subject to  k̂(x + r) ≠ k̂(x),    (1)

where x is an image and k̂(x) the estimated label; Δ(x; k̂) is called the robustness of k̂ at point x, and the robustness of the classifier is obtained by averaging this quantity over data points. DeepFool computes such perturbations efficiently, and the paper reports that it outperforms recent methods at computing adversarial perturbations and at making classifiers more robust. The code is available at http://github.com/lts4/deepfool.

[Figure 1 of Moosavi-Dezfooli et al.: an original image classified as "whale" is misclassified as "turtle" after adding the perturbation computed by DeepFool, and also after adding the noticeably larger perturbation computed by the fast gradient sign method.]

[Figure 2: adversarial examples for a linear binary classifier; the minimal perturbation r*(x_0) moves x_0 onto the decision boundary f(x) = 0.]
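For the linear binary classifier of Figure 2 the minimal perturbation has a closed form. A brief sketch of the standard point-to-hyperplane projection, restated here with the affine form f(x) = w^T x + b (an assumption used only for illustration):

    Δ(x_0; f) = min_r ||r||_2  subject to  sign(f(x_0 + r)) ≠ sign(f(x_0))

The optimum lies on the hyperplane f(x) = 0, i.e. w^T r = -f(x_0); by Cauchy-Schwarz the shortest such r is parallel to w, giving

    r*(x_0) = -( f(x_0) / ||w||_2^2 ) · w,    Δ(x_0; f) = |f(x_0)| / ||w||_2

Algorithm 1 below applies exactly this step at every iteration, linearizing a general (non-linear) f around the current iterate x_i.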
Algorithm 1: DeepFool for binary classifiers
  input: image x, classifier f
  output: perturbation r̂
  initialize x_0 ← x, i ← 0
  while sign(f(x_i)) = sign(f(x_0)) do
      r_i ← -( f(x_i) / ||∇f(x_i)||_2^2 ) ∇f(x_i)
      x_{i+1} ← x_i + r_i
      i ← i + 1
  end while
  return r̂ = Σ_i r_i
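A minimal sketch of Algorithm 1 in JAX, assuming a differentiable scalar classifier f; the affine example classifier, the iteration cap, and the small overshoot factor are illustrative additions, not part of the pseudocode above:

import jax
import jax.numpy as jnp

def deepfool_binary(f, x, max_iter=50, overshoot=0.02):
    # Repeat the linearized projection step
    #   r_i = -f(x_i) / ||grad f(x_i)||_2^2 * grad f(x_i)
    # until the sign of f flips, then return the accumulated perturbation.
    grad_f = jax.grad(f)
    sign_0 = jnp.sign(f(x))
    x_i = x
    r_hat = jnp.zeros_like(x)
    for _ in range(max_iter):              # iteration cap is an added safety measure
        if jnp.sign(f(x_i)) != sign_0:
            break
        g = grad_f(x_i)
        r_i = -f(x_i) / (jnp.dot(g, g) + 1e-12) * g
        x_i = x_i + r_i
        r_hat = r_hat + r_i
    # a small overshoot pushes the point strictly across the boundary
    return (1.0 + overshoot) * r_hat

# Illustrative affine classifier f(x) = w^T x + b.
w = jnp.array([1.0, -2.0, 0.5])
b = 0.3
f = lambda x: jnp.dot(w, x) + b

x = jnp.array([0.2, 0.1, -0.4])
r = deepfool_binary(f, x)
print(f(x), f(x + r))                      # the perturbed point ends up on the other side of f(x) = 0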