

DeepXplore: Automated Whitebox Testing of Deep Learning Systems


Liang Gong

March 14, 2018

Transcript

  1. Presented by Liang Gong. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. Liang Gong, Electrical Engineering & Computer Sciences, University of California, Berkeley.
  2. Motivation. Background: deep learning systems are increasingly used in safety-critical settings (self-driving cars) and security-critical settings (malware detection). Problem: how do we test DL systems to expose erroneous behaviors on corner cases?
  4. Research Goal. How do we test traditional software to expose erroneous behaviors on corner cases? (Software system, test input, test output.) Established techniques: concolic execution, random testing, coverage-guided fuzz testing, …
  5. Research Goal. How do we test DL systems to expose erroneous behaviors on corner cases? (Test input, test output.) Concolic execution? Random testing? Coverage-guided fuzz testing? …
  6. Research Goal. How do we test DL systems to expose erroneous behaviors on corner cases? Their solution: coverage-guided and differential-guided testing.
  7. Their Solution: coverage-guided and differential-guided fuzz testing. Research questions: How do we define coverage for a DL system? What does differential-guided mean? How do we fuzz the test input based on those metrics? How do we get a test oracle?
  8. Research Questions. How do we define coverage for a DL system? LOC coverage? ~100%. Coverage based on the number of neurons processed? 100%. Coverage based on the number of neurons activated? Interesting…
  9. Research Questions. How do we define coverage for a DL system? Coverage based on the number of activated neurons: neurons often correspond to self-extracted features at different levels. My comment: activating neurons is analogous to triggering conditionals in programs.
  10. Research Questions. How do we define coverage for a DL system? Coverage based on the number of activated neurons (activating neurons ≈ triggering conditionals in programs). Definitions: the set of all neurons N; the set of all test inputs T; the output of neuron n given input x, out(n, x); the activation threshold t. Neuron coverage is the fraction of neurons activated by at least one test input: NC(T) = |{n ∈ N : ∃x ∈ T, out(n, x) > t}| / |N|.
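To make the definition above concrete, here is a minimal sketch of how neuron coverage could be tracked during fuzzing. It assumes the per-input neuron outputs are already available as a dict keyed by neuron id; the function names, the `activations` argument, and the threshold default are illustrative, not the authors' implementation.

```python
def update_covered(activations, covered, t=0.0):
    """activations: {neuron_id: out(n, x)} for the current test input x.
    covered: set of neuron ids already activated by earlier inputs in T."""
    for neuron_id, out in activations.items():
        if out > t:                 # "activated" means output above threshold t
            covered.add(neuron_id)
    return covered

def neuron_coverage(covered, all_neurons):
    # NC(T) = |{n in N : exists x in T with out(n, x) > t}| / |N|
    return len(covered) / len(all_neurons)
```

Each new test input only needs to add its newly activated neurons to `covered`, so the coverage metric stays cheap to maintain while the fuzzer runs.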
  12. Research Questions. Think of it as feedback-directed fuzz testing. How do we detect erroneous output? How do we generate the next input based on feedback?
  17. Research Questions. Think of it as feedback-directed fuzz testing. How do we detect erroneous output? (The oracle problem: neural networks rarely crash…) Key idea: differential testing → maximize the diff! Feed the same test input to multiple NNs; if their outputs differ, one NN might be wrong.
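As a rough illustration of the differential-testing oracle described above, the sketch below flags inputs on which independently trained models disagree. The Keras-style `predict()` call is an assumption about the model interface, not something taken from the paper.

```python
import numpy as np

def disagreement(models, x):
    """Feed the same input to several DNNs trained for the same task.
    If their predicted labels differ, at least one of them is wrong on x."""
    preds = [int(np.argmax(m.predict(x))) for m in models]  # assumed predict() -> class scores
    return len(set(preds)) > 1, preds

# Usage sketch: difference-inducing inputs play the role of "crashes" for the fuzzer.
# found_diff, preds = disagreement([nn1, nn2, nn3], test_input)
```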
  18. Research Questions. Think of it as feedback-directed fuzz testing. Given several different DNNs, we want to generate the next input that is coverage-guided (maximize neuron activation) and differential-guided (maximize the difference between the NNs' outputs). Research question: how do we guide input generation based on those metrics? It becomes an optimization problem.
  21. Guided Testing as an Optimization Problem. NN test generation: fix the NN parameters, adjust the input, maximize coverage + diff. NN training: fix the input, adjust the NN parameters, minimize the difference between output and label. Very similar problems! So we can reuse gradient descent and back-propagation with a few modifications.
  23. Guided Testing as an Optimization Problem: modify the objective (loss function). NN test generation: fix the NN parameters, adjust the input, maximize coverage + diff; the loss is based on |y1 − y2| plus the outputs of inactivated neurons; maximize the loss (gradient ascent). NN training: fix the input, adjust the NN parameters; the loss is based on |y − ŷ| (output vs. label); minimize the loss.
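The objective on the slide above (output difference plus the outputs of not-yet-activated neurons) could look roughly like this sketch. PyTorch and the weighting factor `lam` are my choices for illustration and not the paper's exact formulation.

```python
import torch

def test_generation_objective(y1, y2, inactive_neuron_outputs, lam=1.0):
    """y1, y2: outputs of two DNNs on the same input x.
    inactive_neuron_outputs: outputs of neurons still below the activation threshold.
    Training minimizes |y - label|; test generation *maximizes* this value instead."""
    diff_term = torch.sum(torch.abs(y1 - y2))           # differential-guided part
    coverage_term = torch.sum(inactive_neuron_outputs)  # coverage-guided part
    return diff_term + lam * coverage_term
```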
  25. Guided Testing as an Optimization Problem: modify the gradient (what is differentiated). NN test generation: fix the NN parameters, adjust the input, maximize coverage + diff; differentiate w.r.t. the input and add a delta to the input. NN training: fix the input, adjust the NN parameters, minimize the difference between output and label; differentiate w.r.t. the weights and add a delta to the weights.
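To show the "differentiate w.r.t. the input, add a delta to the input" step, here is a small PyTorch sketch of gradient ascent on x with the weights frozen; the step count and step size are arbitrary illustration values.

```python
import torch

def ascend_on_input(x0, objective, steps=10, step_size=0.1):
    """Unlike training, the weights stay fixed and only the input x is adjusted,
    and the sign is flipped: the objective is *maximized* (gradient ascent)."""
    x = x0.clone().detach().requires_grad_(True)
    for _ in range(steps):
        obj = objective(x)           # e.g. test_generation_objective(...) evaluated at x
        obj.backward()               # gradient of the objective w.r.t. the input
        with torch.no_grad():
            x += step_size * x.grad  # '+' instead of '-': ascent, not descent
            x.grad.zero_()
    return x.detach()
```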
  27. Walk through the Algorithm. Input x is fed to two NNs. NN1 outputs: C1 → 0.1, C2 → 0.1, C3 → 0.8. NN2 outputs: C1 → 0.15, C2 → 0.1, C3 → 0.75. Objective function: maximize the diff + maximize the coverage (sum of outputs of inactive neurons).
  30. Walk through the Algorithm. Let's diff this NN from the others. NN1: C1 → 0.1, C2 → 0.1, C3 → 0.8. NN2: C1 → 0.15, C2 → 0.1, C3 → 0.75. Compute the gradient of the objective function with respect to the input: grad = ∂obj/∂x.
  31. Walk through the Algorithm. After stepping the input along the gradient, the outputs change: NN1: C1 → 0.3, C2 → 0.6, C3 → 0.1. NN2: C1 → 0.05, C2 → 0.05, C3 → 0.9. The NNs now predict different classes (C2 vs. C3), so a difference-inducing input has been found.
  32. Dataset: Contagio/VirusTotal. A dataset of benign and malicious PDF documents: 5,000 benign PDF files and 12,205 malicious PDF files. 135 static features are extracted as the DNN input. DNN model variations: 1 input layer, 2-4 fully connected layers, 1 softmax output layer (benign or malicious).
  33. Dataset: Drebin. A dataset of benign and malicious Android apps: 123,453 benign apps and 5,560 malicious apps. 545,333 binary features are extracted as the DNN input. DNN model variations: …
  34. Domain-specific Constraints. Goal: generate more realistic inputs.
  35. Domain-specific Constraints. Goal: generate more realistic inputs. Before the gradient is applied to input x, it is passed through a domain-specific constraint: grad = constraint(grad).
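One way to read the grad = constraint(grad) step: the raw gradient is projected onto realistic perturbations before it is applied. A minimal sketch, where the `constraint` callable stands in for the domain-specific rules on the following slides:

```python
def constrained_ascent_step(x, grad, constraint, step_size=0.1):
    """Project the raw gradient onto the set of realistic input changes,
    then take one ascent step on the input."""
    return x + step_size * constraint(grad)
```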
  36. #1 Simulate Different Lighting Conditions. grad = constraint(grad).
  37. #1 Simulate Different Lighting Conditions. grad = mean(grad).
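My reading of grad = mean(grad): for the lighting constraint, every pixel must change by the same amount, so the image is only brightened or darkened as a whole and only the mean of the gradient survives. A numpy sketch under that assumption:

```python
import numpy as np

def lighting_constraint(grad):
    """Replace every entry of the gradient by its mean, so the ascent step
    shifts all pixels uniformly, simulating a change in lighting."""
    return np.full_like(grad, grad.mean())
```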
  39. #2 Simulate Attacks by Masking. Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, Dawn Song: Robust Physical-World Attacks on Machine Learning Models. CoRR abs/1707.08945 (2017).
  40. #2 Simulate Attacks by Masking Cameras. Pick an m × n rectangle at a random position and patch it.
  41. #2 Simulate Attacks by Masking Cameras. grad = constraint(grad).
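Combining the two slides above, a sketch of the masking constraint: the perturbation is restricted to a single m × n rectangle at a random position. The rectangle parameters and RNG handling are illustrative assumptions.

```python
import numpy as np

def occlusion_constraint(grad, m, n, rng=None):
    """Zero out the gradient everywhere except one randomly placed m x n
    rectangle, so only that patch of the image gets modified."""
    rng = rng or np.random.default_rng()
    h, w = grad.shape[:2]
    top = rng.integers(0, h - m + 1)
    left = rng.integers(0, w - n + 1)
    masked = np.zeros_like(grad)
    masked[top:top + m, left:left + n] = grad[top:top + m, left:left + n]
    return masked
```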
  42. #3 Simulate Dirt on Cameras. Pick an m × m rectangle at a random position and patch it if the mean of the gradient inside it is > 0.
  43. #3 Simulate Dirt on Cameras. grad = constraint(grad).
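The dirt rule above (random m × m rectangles, patched only when the mean gradient inside is positive) could be sketched as below; this is one interpretation of the slide, and the number of spots is an assumed parameter.

```python
import numpy as np

def dirt_constraint(grad, m, num_spots=5, rng=None):
    """Keep the gradient only inside a few small random m x m patches, and only
    when a patch's mean gradient is > 0, simulating specks of dirt on the lens."""
    rng = rng or np.random.default_rng()
    h, w = grad.shape[:2]
    out = np.zeros_like(grad)
    for _ in range(num_spots):
        top = rng.integers(0, h - m + 1)
        left = rng.integers(0, w - m + 1)
        patch = grad[top:top + m, left:left + m]
        if patch.mean() > 0:            # patch only if it pushes the objective up
            out[top:top + m, left:left + m] = patch
    return out
```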
  44. #4 Simulate Relaxing Permissions. grad = constraint(grad). For the Android/PDF malware datasets: only turn binary features from 0 to 1, i.e., add features (add permissions in the manifest files). Deleting features (1 → 0) is not allowed, which ensures no functionality is lost due to insufficient permissions.
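A minimal sketch of that projection for the malware feature vectors, assuming x is a 0/1 numpy array of binary features (the function name and return convention are mine):

```python
import numpy as np

def permission_constraint(grad, x):
    """Only features that are currently 0 and whose gradient pushes them up
    may be flipped to 1; deleting features (1 -> 0) is never allowed."""
    step = np.zeros_like(x, dtype=float)
    step[(x == 0) & (grad > 0)] = 1.0   # flip these features from 0 to 1
    return step
```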
  45. #4 Simulate Relaxing Permissions. Malware is incorrectly classified as benign after permissions are added to the input.
  46. Effects of Neuron Coverage (NC). [Charts comparing optimizing with NC vs. without NC: neuron coverage, number of difference-inducing inputs, and L1 distance between generated inputs; higher numbers are better.]
  47. Time to Get the 1st Counter-Example. [Chart over the number of seed inputs; lower numbers are better.]
  48. Counter-Examples for Improving DNN. [Chart; higher is better.]
  49. More Diff in the Models → Harder to Find Faults. MNIST training set (60,000 samples) and LeNet-1 trained for 10 epochs as the control group.