

DeepXplore: Automated Whitebox Testing of Deep Learning Systems


Liang Gong

March 14, 2018

Transcript

  1. Presented by Liang Gong. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. Liang Gong, Electrical Engineering & Computer Sciences, University of California, Berkeley.
  2. Motivation. Background: deep learning systems are increasingly used in safety-critical settings (self-driving cars) and security-critical settings (malware detection). Problem: how do we test DL systems to expose erroneous behaviors on corner cases?
  4. Research Goal. How do we test traditional software to expose erroneous behaviors on corner cases? (Software system, test input, test output.) Established techniques: concolic execution, random testing, coverage-guided fuzz testing, …
  5. Research Goal. How do we test DL systems to expose erroneous behaviors on corner cases? (Test input, test output.) Concolic execution? Random testing? Coverage-guided fuzz testing? …
  6. Research Goal. How do we test DL systems to expose erroneous behaviors on corner cases? Their solution: coverage-guided and differential-guided testing.
  7. Their Solution: coverage-guided and differential-guided fuzz testing. Research questions: How do we define coverage for a DL system? What does differential-guided mean? How do we fuzz the test input based on those metrics? How do we get a test oracle?
  8. Research Questions. How do we define coverage for a DL system? LOC coverage? ~100%. Coverage based on the number of neurons processed? 100%. Coverage based on the number of neurons activated? Interesting…
  9. Research Questions. How do we define coverage for a DL system? Coverage based on the number of activated neurons: neurons often correspond to self-extracted features at different levels. My comment: activating neurons is analogous to triggering conditionals in programs.
  10. Research Questions. How do we define coverage for a DL system? Coverage based on the number of activated neurons (activating neurons ≈ triggering conditionals in programs). Definitions: the set of all neurons N; the set of all test inputs T; the output of neuron n given input x, out(n, x); the activation threshold t. Neuron coverage is the fraction of neurons activated by at least one test input: NC(T) = |{n ∈ N : ∃x ∈ T, out(n, x) > t}| / |N|.
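To make the definition above concrete, here is a minimal sketch of how neuron coverage could be tracked during fuzzing. It assumes the per-input neuron outputs are already available as a dict keyed by neuron id; the function names, the `activations` argument, and the threshold default are illustrative, not the authors' implementation.

```python
def update_covered(activations, covered, t=0.0):
    """activations: {neuron_id: out(n, x)} for the current test input x.
    covered: set of neuron ids already activated by earlier inputs in T."""
    for neuron_id, out in activations.items():
        if out > t:                 # "activated" means output above threshold t
            covered.add(neuron_id)
    return covered

def neuron_coverage(covered, all_neurons):
    # NC(T) = |{n in N : exists x in T with out(n, x) > t}| / |N|
    return len(covered) / len(all_neurons)
```

Each new test input only needs to add its newly activated neurons to `covered`, so the coverage metric stays cheap to maintain while the fuzzer runs.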
  12. Research Questions. Think of it as feedback-directed fuzz testing. How do we detect erroneous output? How do we generate the next input based on feedback?
  17. Research Questions. Think of it as feedback-directed fuzz testing. How do we detect erroneous output? (The oracle problem: neural networks rarely crash…) Key idea: differential testing → maximize the diff! Feed the same test input to multiple NNs; if their outputs differ, one NN might be wrong.
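As a rough illustration of the differential-testing oracle described above, the sketch below flags inputs on which independently trained models disagree. The Keras-style `predict()` call is an assumption about the model interface, not something taken from the paper.

```python
import numpy as np

def disagreement(models, x):
    """Feed the same input to several DNNs trained for the same task.
    If their predicted labels differ, at least one of them is wrong on x."""
    preds = [int(np.argmax(m.predict(x))) for m in models]  # assumed predict() -> class scores
    return len(set(preds)) > 1, preds

# Usage sketch: difference-inducing inputs play the role of "crashes" for the fuzzer.
# found_diff, preds = disagreement([nn1, nn2, nn3], test_input)
```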
  18. Research Questions. Think of it as feedback-directed fuzz testing. Given several different DNNs, we want to generate the next input that is coverage-guided (maximize neuron activation) and differential-guided (maximize the difference between the NNs' outputs). Research question: how do we guide input generation based on those metrics? It becomes an optimization problem.
  21. Guided Testing as an Optimization Problem. NN test generation: fix the NN parameters, adjust the input, maximize coverage + diff. NN training: fix the input, adjust the NN parameters, minimize the difference between output and label. Very similar problems! So we can reuse gradient descent and back-propagation with a few modifications.
  23. Guided Testing as an Optimization Problem: modify the objective (loss function). NN test generation: fix the NN parameters, adjust the input, maximize coverage + diff; the loss is based on |y1 − y2| plus the outputs of inactivated neurons; maximize the loss (gradient ascent). NN training: fix the input, adjust the NN parameters; the loss is based on |y − ŷ| (output vs. label); minimize the loss.
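The objective on the slide above (output difference plus the outputs of not-yet-activated neurons) could look roughly like this sketch. PyTorch and the weighting factor `lam` are my choices for illustration and not the paper's exact formulation.

```python
import torch

def test_generation_objective(y1, y2, inactive_neuron_outputs, lam=1.0):
    """y1, y2: outputs of two DNNs on the same input x.
    inactive_neuron_outputs: outputs of neurons still below the activation threshold.
    Training minimizes |y - label|; test generation *maximizes* this value instead."""
    diff_term = torch.sum(torch.abs(y1 - y2))           # differential-guided part
    coverage_term = torch.sum(inactive_neuron_outputs)  # coverage-guided part
    return diff_term + lam * coverage_term
```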
  25. Guided Testing as an Optimization Problem: modify the gradient (what is differentiated). NN test generation: fix the NN parameters, adjust the input, maximize coverage + diff; differentiate w.r.t. the input and add a delta to the input. NN training: fix the input, adjust the NN parameters, minimize the difference between output and label; differentiate w.r.t. the weights and add a delta to the weights.
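To show the "differentiate w.r.t. the input, add a delta to the input" step, here is a small PyTorch sketch of gradient ascent on x with the weights frozen; the step count and step size are arbitrary illustration values.

```python
import torch

def ascend_on_input(x0, objective, steps=10, step_size=0.1):
    """Unlike training, the weights stay fixed and only the input x is adjusted,
    and the sign is flipped: the objective is *maximized* (gradient ascent)."""
    x = x0.clone().detach().requires_grad_(True)
    for _ in range(steps):
        obj = objective(x)           # e.g. test_generation_objective(...) evaluated at x
        obj.backward()               # gradient of the objective w.r.t. the input
        with torch.no_grad():
            x += step_size * x.grad  # '+' instead of '-': ascent, not descent
            x.grad.zero_()
    return x.detach()
```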
  27. Walk through the Algorithm. Input x is fed to two NNs. NN1 outputs: C1 → 0.1, C2 → 0.1, C3 → 0.8. NN2 outputs: C1 → 0.15, C2 → 0.1, C3 → 0.75. Objective function: maximize the diff + maximize the coverage (sum of outputs of inactive neurons).
  30. Walk through the Algorithm. Let's diff this NN from the others. NN1: C1 → 0.1, C2 → 0.1, C3 → 0.8. NN2: C1 → 0.15, C2 → 0.1, C3 → 0.75. Compute the gradient of the objective function with respect to the input: grad = ∂obj/∂x.
  31. Walk through the Algorithm. After stepping the input along the gradient, the outputs change: NN1: C1 → 0.3, C2 → 0.6, C3 → 0.1. NN2: C1 → 0.05, C2 → 0.05, C3 → 0.9. The NNs now predict different classes (C2 vs. C3), so a difference-inducing input has been found.
  32. Dataset: Contagio/VirusTotal. A dataset of benign and malicious PDF documents: 5,000 benign PDF files and 12,205 malicious PDF files. 135 static features are extracted as the DNN input. DNN model variations: 1 input layer, 2-4 fully connected layers, 1 softmax output layer (benign or malicious).
  33. Dataset: Drebin. A dataset of benign and malicious Android apps: 123,453 benign apps and 5,560 malicious apps. 545,333 binary features are extracted as the DNN input. DNN model variations: …
  34. Domain-specific Constraints. Goal: generate more realistic inputs.
  35. Domain-specific Constraints. Goal: generate more realistic inputs. Before the gradient is applied to input x, it is passed through a domain-specific constraint: grad = constraint(grad).
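One way to read the grad = constraint(grad) step: the raw gradient is projected onto realistic perturbations before it is applied. A minimal sketch, where the `constraint` callable stands in for the domain-specific rules on the following slides:

```python
def constrained_ascent_step(x, grad, constraint, step_size=0.1):
    """Project the raw gradient onto the set of realistic input changes,
    then take one ascent step on the input."""
    return x + step_size * constraint(grad)
```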
  36. #1 Simulate Different Lighting Conditions. grad = constraint(grad).
  37. #1 Simulate Different Lighting Conditions. grad = mean(grad).
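My reading of grad = mean(grad): for the lighting constraint, every pixel must change by the same amount, so the image is only brightened or darkened as a whole and only the mean of the gradient survives. A numpy sketch under that assumption:

```python
import numpy as np

def lighting_constraint(grad):
    """Replace every entry of the gradient by its mean, so the ascent step
    shifts all pixels uniformly, simulating a change in lighting."""
    return np.full_like(grad, grad.mean())
```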
  39. #2 Simulate Attacks by Masking. Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, Dawn Song: Robust Physical-World Attacks on Machine Learning Models. CoRR abs/1707.08945 (2017).
  40. #2 Simulate Attacks by Masking Cameras. Pick an m × n rectangle at a random position and patch it.
  41. #2 Simulate Attacks by Masking Cameras. grad = constraint(grad).
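Combining the two slides above, a sketch of the masking constraint: the perturbation is restricted to a single m × n rectangle at a random position. The rectangle parameters and RNG handling are illustrative assumptions.

```python
import numpy as np

def occlusion_constraint(grad, m, n, rng=None):
    """Zero out the gradient everywhere except one randomly placed m x n
    rectangle, so only that patch of the image gets modified."""
    rng = rng or np.random.default_rng()
    h, w = grad.shape[:2]
    top = rng.integers(0, h - m + 1)
    left = rng.integers(0, w - n + 1)
    masked = np.zeros_like(grad)
    masked[top:top + m, left:left + n] = grad[top:top + m, left:left + n]
    return masked
```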
  42. #3 Simulate Dirt on Cameras. Pick an m × m rectangle at a random position and patch it if the mean of the gradient inside it is > 0.
  43. #3 Simulate Dirt on Cameras. grad = constraint(grad).
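The dirt rule above (random m × m rectangles, patched only when the mean gradient inside is positive) could be sketched as below; this is one interpretation of the slide, and the number of spots is an assumed parameter.

```python
import numpy as np

def dirt_constraint(grad, m, num_spots=5, rng=None):
    """Keep the gradient only inside a few small random m x m patches, and only
    when a patch's mean gradient is > 0, simulating specks of dirt on the lens."""
    rng = rng or np.random.default_rng()
    h, w = grad.shape[:2]
    out = np.zeros_like(grad)
    for _ in range(num_spots):
        top = rng.integers(0, h - m + 1)
        left = rng.integers(0, w - m + 1)
        patch = grad[top:top + m, left:left + m]
        if patch.mean() > 0:            # patch only if it pushes the objective up
            out[top:top + m, left:left + m] = patch
    return out
```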
  44. #4 Simulate Relaxing Permissions. grad = constraint(grad). For the Android/PDF malware datasets: only turn binary features from 0 to 1, i.e., add features (add permissions in the manifest files). Deleting features (1 → 0) is not allowed, which ensures no functionality is lost due to insufficient permissions.
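A minimal sketch of that projection for the malware feature vectors, assuming x is a 0/1 numpy array of binary features (the function name and return convention are mine):

```python
import numpy as np

def permission_constraint(grad, x):
    """Only features that are currently 0 and whose gradient pushes them up
    may be flipped to 1; deleting features (1 -> 0) is never allowed."""
    step = np.zeros_like(x, dtype=float)
    step[(x == 0) & (grad > 0)] = 1.0   # flip these features from 0 to 1
    return step
```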
  45. #4 Simulate Relaxing Permissions. Malware is incorrectly classified as benign after permissions are added to the input.
  46. Effects of Neuron Coverage (NC). [Charts comparing optimizing with NC vs. without NC: neuron coverage, number of difference-inducing inputs, and L1 distance between generated inputs; higher numbers are better.]
  47. Time to Get the 1st Counter-Example. [Chart over the number of seed inputs; lower numbers are better.]
  48. Counter-Examples for Improving DNN. [Chart; higher is better.]
  49. More Diff in the Models → Harder to Find Faults. MNIST training set (60,000 samples) and LeNet-1 trained for 10 epochs as the control group.