DeepXplore: Automated Whitebox Testing of Deep Learning Systems

Liang Gong

March 14, 2018

  1. DeepXplore: Automated Whitebox Testing of Deep Learning Systems

    Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana
    Presented by Liang Gong, Electrical Engineering & Computer Science, University of California, Berkeley.
  2. Motivation

    Background:
    • Deep learning systems are increasingly used:
      • Safety-critical: self-driving cars
      • Security-critical: malware detection
    Problem:
    • How to test DL systems to expose erroneous behaviors for corner cases?
  4. Research Goal

    • How to test traditional software to expose erroneous behaviors for corner cases?
    [Diagram: Test Input → Software System → Test Output]
    • concolic execution
    • random testing
    • coverage-guided fuzz testing
    • …
  7. Their Solution

    Coverage-guided & differential-guided fuzz testing
    [Diagram: Test Input → Test Output]
    Research Questions:
    • How to define coverage for a DL system?
    • What is differential-guided?
    • How to fuzz test input based on those metrics?
    • How to get a test oracle?
  9. Research Questions

    How to define coverage for a DL system?
    • Coverage based on the # of activated neurons
    Neurons often correspond to self-extracted features at different levels.
    My Comment: activating neurons ⇔ triggering conditionals in programs.
  11. Research Questions

    How to define coverage for a DL system?
    • Coverage based on the # of activated neurons
    • Activating neurons ⇔ triggering conditionals in programs
    Definitions:
    • All neurons: N
    • All inputs
    • Output of neuron n given input x
    • Threshold for activation: t
    [Equation: neuron coverage]
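The coverage metric defined above can be sketched in a few lines, assuming the neuron outputs have already been collected into a matrix with one row per input and one column per neuron. The function name `neuron_coverage`, the `activations` layout, and the toy numbers are illustrative and do not come from the slides. A neuron is counted as covered here if at least one input drives its output above t, so coverage is the number of such neurons divided by |N|.

```python
import numpy as np

def neuron_coverage(activations: np.ndarray, t: float) -> float:
    """Fraction of neurons whose output exceeds t for at least one input.

    activations[i, j] is the (hypothetical) output of neuron j on input i.
    """
    covered = (activations > t).any(axis=0)  # per neuron: activated by any input?
    return float(covered.mean())

# Toy example: 3 inputs, 4 neurons, threshold t = 0.5.
acts = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.2, 0.7, 0.0, 0.3],
    [0.1, 0.2, 0.0, 0.6],
])
coverage = neuron_coverage(acts, t=0.5)
print(coverage)  # 0.75
```

Neuron 2 never exceeds the threshold, so three of the four neurons are covered.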
  13. Research Questions

    Think of it as feedback-directed fuzz testing.
    • How to detect erroneous output? (oracle problem)
      Neural networks rarely crash…
    Key Idea: differential testing
    [Diagram: Test Input → Test Output, Test Output]
    If the outputs differ, one NN might be wrong.
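As a rough illustration of the differential-testing oracle, the sketch below flags every input on which the models' predicted labels disagree. It assumes each model returns one class-probability vector per input; the function name `find_disagreements` and the probability values are made up for the example.

```python
import numpy as np

def find_disagreements(prob_outputs: list) -> np.ndarray:
    """Return indices of inputs on which the models' predicted labels differ.

    prob_outputs[k][i] is model k's class-probability vector for input i.
    """
    # Predicted label per model and input, shape (models, inputs).
    labels = np.stack([np.argmax(p, axis=1) for p in prob_outputs])
    differs = (labels != labels[0]).any(axis=0)
    return np.flatnonzero(differs)

# Hypothetical outputs of two models on the same three inputs.
model_a = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
model_b = np.array([[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]])
suspicious = find_disagreements([model_a, model_b])
print(suspicious)  # [1]
```

A disagreement only says that at least one model might be wrong on that input; deciding which one still needs a human or a reference label.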
  18. Research Questions

    Think of it as feedback-directed fuzz testing.
    Now, given different DNNs, we want to generate the next input that is:
    • Coverage-guided: maximize the neuron activation
    • Differential-guided: maximize the diff of NN outputs
    Research Question: How to guide the input generation based on those metrics?
    • An optimization problem
  21. Guided Testing as Optimization Problem

    NN Test Generation
    • Fix NN parameters
    • Adjust input
    • Maximize coverage + diff
    NN Training
    • Fix input
    • Adjust NN parameters
    • Minimize diff between output and label
    Very similar problem! So we can reuse gradient descent and back propagation with a few modifications.
  23. Guided Testing as Optimization Problem

    Modification: modify the objective (loss function)
    NN Test Generation
    • Fix NN parameters
    • Adjust input
    • Maximize coverage + diff
    • Loss based on: | y1 – y2 | + output of inactivated neurons
    • Maximize the loss (gradient ascent)
    NN Training
    • Fix input
    • Adjust NN parameters
    • Minimize diff between output and label
    • Loss based on: | y – ŷ | (output vs. label)
    • Minimize the loss
  25. Guided Testing as Optimization Problem

    Modification: modify the gradient (differentiation equation)
    NN Test Generation
    • Fix NN parameters
    • Adjust input
    • Maximize coverage + diff
    • Differentiate w.r.t. input
    • Add delta to input
    NN Training
    • Fix input
    • Adjust NN parameters
    • Minimize diff between output and label
    • Differentiate w.r.t. weights
    • Add delta to weights
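The test-generation side of this comparison can be sketched as gradient ascent on the input while the weights stay fixed. The example below is a toy under stated assumptions, not the tool's implementation: each "model" and the "inactive neuron" are single sigmoid units with hypothetical weights `w_a`, `w_b`, `w_n`, the signed difference between the two model outputs stands in for | y1 – y2 |, and `lam` is an assumed weighting between the diff and coverage terms.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(x, w_a, w_b, w_n, lam):
    """Joint objective: push model A up, model B down, and an inactive neuron up."""
    return sigmoid(w_a @ x) - sigmoid(w_b @ x) + lam * sigmoid(w_n @ x)

def objective_grad(x, w_a, w_b, w_n, lam):
    """Analytic gradient of the objective with respect to the input x."""
    def d_sigmoid(w):
        y = sigmoid(w @ x)
        return y * (1.0 - y) * w  # chain rule for sigmoid(w . x)
    return d_sigmoid(w_a) - d_sigmoid(w_b) + lam * d_sigmoid(w_n)

def generate_test_input(x0, w_a, w_b, w_n, lam=0.5, step=0.1, iters=50):
    """Gradient ascent on the input; the weights are never modified."""
    x = x0.copy()
    for _ in range(iters):
        x = x + step * objective_grad(x, w_a, w_b, w_n, lam)  # add delta to input
    return x

w_a = np.array([1.0, -1.0])  # hypothetical model A
w_b = np.array([-1.0, 1.0])  # hypothetical model B
w_n = np.array([0.5, 0.5])   # hypothetical inactive neuron
x0 = np.zeros(2)
x1 = generate_test_input(x0, w_a, w_b, w_n)
before = objective(x0, w_a, w_b, w_n, 0.5)
after = objective(x1, w_a, w_b, w_n, 0.5)
print(before, after)
```

Training would use the same machinery with the roles swapped: differentiate with respect to the weights, keep the input fixed, and step downhill on the loss.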
  27. Walk through the Algorithm

    Input x
    NN outputs: C1 → 0.1, C2 → 0.1, C → 0.8
    NN outputs: C1 → 0.15, C2 → 0.1, C → 0.75
    Objective function:
    • Maximize the diff
    • Maximize the cov: sum of outputs of inactive neurons
  28. Walk through the Algorithm

    Input x
    NN outputs: C1 → 0.1, C2 → 0.1, C → 0.8
    NN outputs: C1 → 0.15, C2 → 0.1, C → 0.75
    Let's diff this NN from the others.
    Objective function:
    • Maximize the diff
    • Maximize the cov: sum of outputs of inactive neurons
  30. Walk through the Algorithm

    Input x
    NN outputs: C1 → 0.1, C2 → 0.1, C → 0.8
    NN outputs: C1 → 0.15, C2 → 0.1, C → 0.75
    Let's diff this NN from the others.
    Objective function: [equation image]
    grad = [equation image]
  31. Walk through the Algorithm

    Input x
    NN outputs: C1 → 0.3, C2 → 0.6, C → 0.1
    NN outputs: C1 → 0.05, C2 → 0.05, C → 0.9
    Let's diff this NN from the others.
    Objective function: [equation image]
    grad = [equation image]
  32. Dataset: Contagio/VirusTotal

    Dataset of benign and malicious PDF documents
    • 5,000 benign PDF files
    • 12,205 malicious PDF files
    Extract 135 static features as DNN input
    DNN Model Variations:
    • 1 input layer
    • 2 - 4 fully connected layers
    • 1 softmax output layer (benign or malicious)
  33. Dataset: Drebin

    Dataset of benign and malicious Android apps
    • 123,453 benign apps
    • 5,560 malicious apps
    Extract 545,333 binary features as DNN input
    DNN Model Variations:
    • …
  Domain-specific Constraints

    Goal: generate more realistic input
  Domain-specific Constraints

    Input x
    grad = constraint(grad)
    Goal: generate more realistic input
  #1 Simulate Different Lighting Conditions

    grad = constraint(grad)
  #1 Simulate Different Lighting Conditions

    grad = mean(grad)
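One way to read the lighting constraint is that every entry of the gradient is replaced by the gradient's mean, so the update shifts all pixels by the same amount. The sketch below assumes that reading; the function name `constraint_light`, the step size, and the toy arrays are illustrative.

```python
import numpy as np

def constraint_light(grad: np.ndarray) -> np.ndarray:
    """Replace every entry of the gradient with the gradient's mean.

    Adding the result to an image shifts all pixels by the same amount,
    i.e. it brightens or darkens the whole image uniformly.
    """
    return np.full_like(grad, grad.mean())

grad = np.array([[0.2, -0.1], [0.4, 0.3]])   # toy gradient w.r.t. a 2 x 2 image
image = np.array([[0.5, 0.5], [0.1, 0.9]])   # toy image
step = 1.0                                   # hypothetical step size
brighter = image + step * constraint_light(grad)
```

Because the update is uniform, the relative contrast between pixels is preserved, which is what makes the result look like the same scene under different lighting.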
  39. #2 Simulate Attacks by Masking

    Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, Dawn Song: Robust Physical-World Attacks on Machine Learning Models. CoRR abs/1707.08945 (2017)
  #2 Simulate Attacks by Masking Cameras

    Pick an m x n rectangle at a random position and patch it.
  #2 Simulate Attacks by Masking Cameras

    grad = constraint(grad)
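A minimal sketch of the masking constraint, under the assumption that "patch" means only pixels inside the chosen m x n rectangle may change, so the gradient is zeroed everywhere else. The function name `constraint_occlusion`, the 8 x 8 gradient, and the seeded random generator are hypothetical choices for the example.

```python
import numpy as np

def constraint_occlusion(grad, top, left, m, n):
    """Keep the gradient only inside an m x n rectangle; zero it elsewhere."""
    masked = np.zeros_like(grad)
    masked[top:top + m, left:left + n] = grad[top:top + m, left:left + n]
    return masked

rng = np.random.default_rng(0)
grad = rng.normal(size=(8, 8))  # toy gradient w.r.t. an 8 x 8 image
m, n = 3, 2
# Random position, chosen so the rectangle fits inside the image.
top = int(rng.integers(0, grad.shape[0] - m + 1))
left = int(rng.integers(0, grad.shape[1] - n + 1))
patched = constraint_occlusion(grad, top, left, m, n)
```

Applying `patched` instead of `grad` in the update confines the perturbation to a small region, similar to a sticker or an object covering part of the camera's view.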
  #3 Simulate Dirt on Cameras

    Pick an m x m rectangle at a random position and patch it if the mean of the gradient > 0.
  #3 Simulate Dirt on Cameras

    grad = constraint(grad)
  44. #4 Simulate Relaxing Permissions

    grad = constraint(grad)
    For the Android/PDF malware datasets:
    • Turn binary features from 0 to 1
    • Add features (add permissions in the manifest files)
    • Deleting features (1 to 0) is not allowed
    • Ensure no functionality changes due to insufficient permissions
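The add-only rule for binary features can be sketched as follows, assuming the features with the largest positive gradient are flipped first. The function name `constraint_add_only`, the parameter `k`, and the toy vectors are illustrative and not taken from the slides.

```python
import numpy as np

def constraint_add_only(x: np.ndarray, grad: np.ndarray, k: int = 1) -> np.ndarray:
    """Flip up to k features from 0 to 1, largest positive gradient first.

    Features that are already 1 are never changed, so nothing is deleted.
    """
    candidates = np.flatnonzero((x == 0) & (grad > 0))
    chosen = candidates[np.argsort(grad[candidates])[::-1][:k]]
    x_new = x.copy()
    x_new[chosen] = 1
    return x_new

x = np.array([1, 0, 0, 1, 0])                  # toy binary feature vector
grad = np.array([0.9, 0.2, -0.5, -0.3, 0.7])   # toy gradient of the objective
x_new = constraint_add_only(x, grad, k=1)
print(x_new)  # [1 0 0 1 1]
```

Feature 0 has the largest gradient but is already set, and feature 2 has a negative gradient, so the only change is turning feature 4 on.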
  #4 Simulate Relaxing Permissions

    Malware incorrectly classified as benign after adding permissions in the input.
  46. Effects of Neuron Coverage (NC)

    [Table: neuron coverage, L1 distance between generated inputs, and # of difference-inducing inputs, for "Optimize without NC" vs. "Optimize with NC"]
    Higher numbers are better.
  Time to Get the 1st Counter Example

    # of seed inputs
    Lower numbers are better.
  Counter Examples for Improving DNN

    Higher is better.
  49. More Diff in Models → Harder to find fault

    MNIST training set (60,000 samples) and LeNet-1 trained with 10 epochs as the control group