Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Covering a function using a Dynamic Symbolic Ex...

Covering a function using a Dynamic Symbolic Execution approach

This talk is about binary analysis and instrumentation. We will see how it's possible to target a specific function, snapshot the context memory/registers before the function, translate the instrumentation into an intermediate representation,apply a taint analysis based on this IR, build/keep formulas for a Dynamic Symbolic Execution (DSE), generate a concrete value to go through a specific path, restore the context memory/register and generate another concrete value to go through another path then repeat this operation until the target function is covered.

Jonathan Salwan

January 16, 2015
Tweet

More Decks by Jonathan Salwan

Other Decks in Research

Transcript

  1. Dynamic Binary Analysis and Instrumentation Covering a function using a

    DSE approach Jonathan Salwan [email protected] Security Day Lille – France January 16, 2015 Keywords : Program analysis, DBI, Pin, concrete execution, symbolic execution, DSE, taint analysis, context snapshot and Z3 theorem prover.
  2. 2 • I am a junior security researcher at Quarkslab

    • I have a strong interest in all low level computing and program analysis • I like to play with weird things even though it's useless Who am I?
  3. 3 • Introduction – Dynamic binary instrumentation – Data flow

    analysis – Symbolic execution – Theorem prover • Code coverage using a DSE approach – Objective – Snapshot – Registers and memory references – Rebuild the trace with the backward analysis roadmap of this talk • DSE example • Demo • Some words about vulnerability finding • Conclusion
  4. 6 • A DBI is a way to execute an

    external code before or/and after each instruction/routine • With a DBI you can: – Analyze the binary execution step-by-step • Context memory • Context registers – Only analyze the executed code Dynamic Binary Instrumentation
  5. 7 • How does it work in a nutshell? Dynamic

    Binary Instrumentation initial_instruction_1 initial_instruction_2 initial_instruction_3 initial_instruction_4 jmp_call_back_before initial_instruction_1 jmp_call_back_after jmp_call_back_before initial_instruction_2 jmp_call_back_after jmp_call_back_before initial_instruction_3 jmp_call_back_after jmp_call_back_before initial_instruction_4 jmp_call_back_after
  6. 8 • Developed by Intel – Pin is a dynamic

    binary instrumentation framework for the IA-32 and x86-64 instruction-set architectures – The tools created using Pin, called Pintools, can be used to perform program analysis on user space applications in Linux and Windows Pin
  7. 9 • Example of a provided tool: ManualExamples/inscount1.cpp – Count

    the number of instructions executed Pin tool - example VOID docount(UINT32 c) { icount += c; } VOID Trace(TRACE trace, VOID *v) { for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)){ BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount, IARG_UINT32, BBL_NumIns(bbl), IARG_END); } int main(int argc, char * argv[]) { ... TRACE_AddInstrumentFunction(Trace, 0); }
  8. 10 • Dynamic binary instrumentation overloads the initial execution –

    The overload is even more if we send the context in our callback Dynamic Binary Instrumentation /usr/bin/id /usr/bin/uname 0 2 4 6 8 10 12 0.01 0.01 0.57 0.35 10.52 6.66 DBI Overloads with Pin Initial execution DBI without context DBI with context Programs Time (s)
  9. 11 • Instrumenting a binary in its totality is unpractical

    due to the overloads – That's why we need to target our instrumentation • On a specific area • On a specific function and its subroutines – Don't instrument something that you don't want • Ex: A routine in a library – strlen, strcpy, … • We already know these semantics and can predict the return value with the input value Dynamic Binary Instrumentation
  10. 12 • Target the areas which need to be instrumented

    Dynamic Binary Instrumentation Control flow graph Call graph ρin ρ1 ρ2 ρout ρ3 fin f2 f1 f3 f5 f4 fout
  11. 14 • Gather information about the possible set of values

    calculated at various points • Follow the spread of variables through the execution • There are several kinds of data flow analysis: – Liveness analysis – Range analysis – Taint analysis • Determine which bytes in memory can be controlled by the user ( ) Data Flow Analysis Memory
  12. 15 • Define areas which need to be tagged as

    controllable – For us, this is the environment Pin and Data Flow Analysis int main(int argc, const char *argv[], const char *env[]) {...} – And syscalls read(fd, buffer, size) Example with sys_read() → For all “byte” in [buffer, buffer+size-1] (Taint(byte))
  13. 16 Pin and Data Flow Analysis • Then, spread the

    taint by monitoring all instructions which read (LOAD) or write (STORE) in the tainted area if (INS_MemoryOperandIsRead(ins, 0) && INS_OperandIsReg(ins, 0)){ INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)ReadMem, IARG_MEMORYOP_EA, 0, IARG_UINT32,INS_MemoryReadSize(ins), IARG_END); } if (INS_MemoryOperandIsWritten(ins, 0)){ INS_InsertCall( ins, IPOINT_BEFORE,( (AFUNPTR)WriteMem, IARG_MEMORYOP_EA, 0, IARG_UINT32,INS_MemoryWriteSize(ins), IARG_END); } mov regA, [regB] mov [regA], regB.
  14. 17 Pin and Data Flow Analysis • Tainting the memory

    areas is not enough, we must also taint the registers. – More accuracy by tainting the bits • Increases the analysis's time
  15. 18 Pin and Data Flow Analysis • So, by monitoring

    all STORE/LOAD and GET/PUT instructions, we know at every program points, which registers or memory areas are controlled by the user
  16. 20 • Symbolic Execution • Symbolic execution is the execution

    of the program using symbolic variables instead of concrete values • Symbolic execution translates the program's semantic into a logical formula • Symbolic execution can build and keep a path formula – By solving the formula, we can take all paths and “cover” a code • Instead of concrete execution which takes only one path • Then a symbolic expression is given to a theorem prover to generate a concrete value
  17. 21 • Symbolic Execution • There exists two kinds of

    symbolic execution – Static Symbolic Execution (SSE) • Translates program statements into formulae – Mainly used to check if a statement represents the desired property – Dynamic Symbolic Execution (DSE) • Builds the formula at runtime step-by-step – Mainly used to know how a branch can be taken – Analyze only one path at a time
  18. 22 • Symbolic Execution Control flow graph ρout = φ1

    ( ∧ (pc φ ∧ 2 φ ∧ 3 ) ∨ ( ¬pc φ ∧ 2 ' φ ∧ 4 ) ) • Path formula – This control flow graph can take 2 different paths • What is the path formula for the ρout node? φ = statement / operation pc = path condition (guard) ρin ρ1 ρ2 ρ3 ρout φ1 φ2 ' φ2 φ3 φ4 pc (guard) ρout = ?
  19. 23 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } • SSE path formula and statement property
  20. 24 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: {True} [x1 = i1] • SSE path formula and statement property
  21. 25 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: {True} [x1 = i1, y1 = i2] • SSE path formula and statement property
  22. 26 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: { x1 > 80 ?} [x1 = i1, y1 = i2] • SSE path formula and statement property
  23. 27 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: { x1 > 80} [x2 = y1 * 2, y1 = i2] • SSE path formula and statement property
  24. 28 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: { x1 > 80} [x2 = y1 * 2, y2 = 0] • SSE path formula and statement property
  25. 29 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: { x1 > 80 x2 == 256 ?} [x2 = y1 * 2, y2 = 0] ∧ • SSE path formula and statement property
  26. 30 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: { x1 > 80 x2 == 256} [x2 = y1 * 2, y2 = 0] ∧ At this point φk can be taken iff (x1 > 80) (x2 == 256) ∧ • SSE path formula and statement property
  27. 31 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: { x1 <= 80} [x1 = i1, y1 = i2] • SSE path formula and statement property
  28. 32 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: { x1 <= 80} [x2 = 0, y1 = i2] • SSE path formula and statement property
  29. 33 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: { x1 <= 80} [x2 = 0, y2 = 0] • SSE path formula and statement property
  30. 34 • Symbolic Execution int foo(int i1, int i2) {

    int x = i1; int y = i2; if (x > 80){ x = y * 2; y = 0; if (x == 256) return True; } else{ x = 0; y = 0; } /* ... */ return False; } PC: { (x1 <= 80) ∨ ((x1 > 80) (x2 != 256)) ∧ } [ (x2 = 0, y2 = 0) ∨ (x2 = y1 * 2, y2 = 0) ] • SSE path formula and statement property
  31. 35 • Symbolic Execution • With the DSE approach, we

    can only go through one single path at a time. Paths discovered at the 1st iteration
  32. 36 • Symbolic Execution • With the DSE approach, we

    can only go through one single path at a time. Paths discovered at the 2nd iteration
  33. 37 • Symbolic Execution • With the DSE approach, we

    can only go through one single path at a time. Paths discovered at the 3th iteration
  34. 38 • Symbolic Execution • With the DSE approach, we

    can only go through one single path at a time. Paths discovered at the 4th iteration
  35. 39 • Symbolic Execution • In this example, the DSE

    approach will iterate 3 times and keep the formula for all paths {T} {x <= 80} Out Iteration 1
  36. 40 • Symbolic Execution • In this example, the DSE

    approach will iterate 3 times and keep the formula for all paths {T} {x > 80} {(x > 80) (x != 256) ∧ } Out Iteration 2 {T} {x <= 80} Out Iteration 1
  37. 41 • Symbolic Execution • In this example, the DSE

    approach will iterate 3 times and keep the formula for all paths {T} {x > 80} {(x > 80) (x != 256) ∧ } Out Iteration 2 {T} {x <= 80} Out Iteration 1 {T} {x > 80} {(x > 80) (x == 256) ∧ } Out Iteration 3
  38. 43 • Used to prove if an equation is satisfiable

    or not – Example with a simple equation with two unknown values • Theorem Prover $ cat ./ex.py from z3 import * x = BitVec('x', 32) y = BitVec('y', 32) s = Solver() s.add((x ^ 0x55) + (3 - (y * 12)) == 0x30) s.check() print s.model() $ ./ex.py [x = 184, y = 16] • Check Axel's previous talk for more information about z3 and theorem prover
  39. 44 • Why in our case do we use a

    theorem prover? – To check if a path constraint (PC) can be solved and with which model – Example with the previous code (slide 22) • What value can hold the variable 'x' to take the “return false” path? • Theorem Prover >>> from z3 import * >>> x = BitVec('x', 32) >>> s = Solver() >>> s.add(Or(x <= 80, And(x > 80, x != 256))) >>> s.check() sat >>> s.model() [x = 0]
  40. 46 • Objective: Cover a function using a DSE approach

    • To do that, we will: 1. Target a function in memory 2. Setup the context snapshot on this function 3. Execute this function symbolically 4. Restore the context and take another path 5. Repeat this operation until the function is covered Objective?
  41. 47 • The objective is to cover the check_password function

    – Does covering the function mean finding the good password? • Yes, we can reach the return 0 only if we go through all loop iterations Objective? char *serial = "\x31\x3e\x3d\x26\x31"; int check_password(char *ptr) { int i = 0; while (i < 5){ if (((ptr[i] - 1) ^ 0x55) != serial[i]) return 1; /* bad password */ i++; } return 0; /* good password */ }
  42. 48 • Save the memory context and the register states

    (snapshot) • Taint the ptr argument (It is basically our 'x' of the formula) • Spread the taint and build the path constraints – An operation/statement is an instruction (noted φi ) • At the branch instruction, use a theorem prover to take the true or the false branch – In our case, the goal is to take the false branch (not the return 1) • Restore the memory context and the register states to take another path Roadmap
  43. 50 • Take a context snapshot at the beginning of

    the function • When the function returns, restore the initial context snapshot and go through another path • Repeat this operation until all the paths are taken Snapshot Restore context Snapshot Restore Snapshot Target function / bbl Dynamic Symbolic Execution Possible paths in the target
  44. 51 • Use PIN_SaveContext() to deal with the register states

    – Save_Context() only saves register states, not memory • We must monitor I/O memory Snapshot std::cout << "[snapshot]" << std::endl; PIN_SaveContext(ctx, &snapshot); std::cout << "[restore snapshot]" << std::endl; PIN_SaveContext(&snapshot, ctx); restoreMemory(); PIN_ExecuteAt(ctx); – Save context – Restore context
  45. 52 • The “restore memory” function looks like this: Snapshot

    VOID restoreMemory(void) { list<struct memoryInput>::iterator i; for(i = memInput.begin(); i != memInput.end(); ++i){ *(reinterpret_cast<ADDRINT*>(i->address)) = i->value; } memInput.clear(); } • The memoryInput list is filled by monitoring all the STORE instructions if (INS_OperandCount(ins) > 1 && INS_MemoryOperandIsWritten(ins, 0)){ INS_InsertCall( ins, IPOINT_BEFORE, (AFUNPTR)WriteMem, IARG_ADDRINT, INS_Address(ins), IARG_PTR, new string(INS_Disassemble(ins)), IARG_UINT32, INS_OperandCount(ins), IARG_UINT32, INS_OperandReg(ins, 1), IARG_MEMORYOP_EA, 0, IARG_END); }
  46. 54 • A symbolic trace is a sequence of semantic

    expressions T = ( E ⟦ 1 ⟧ ∧ E ⟦ 2 ⟧ ∧ E ⟦ 3 ⟧ ∧ E ⟦ 4 ⟧ ∧ … ∧ E ⟦ i ) ⟧ • Each expression E ⟦ i → SE ⟧ i (Symbolic Expression) • Each SE is translated like this: REFout = semantic – Where : • REFout := unique ID • Semantic := | REF ℤ in | <<op>> • A register points on its last reference. Basically, it is close to SSA (Single Static Assignment) but with semantics • Register references
  47. 55 • Register references mov eax, 1 add eax, 2

    mov ebx, eax Example: // All refs initialized to -1 Register Reference Table { EAX : -1, EBX : -1, ECX : -1, ... } // Empty set Symbolic Expression Set { }
  48. 56 • Register references mov eax, 1 φ0 = 1

    add eax, 2 mov ebx, eax Example: // All refs initialized to -1 Register Reference Table { EAX : φ0, EBX : -1, ECX : -1, ... } // Empty set Symbolic Expression Set { <φ0, 1> }
  49. 57 • Register references mov eax, 1 φ0 = 1

    add eax, 2 φ1 = add(φ0, 2) mov ebx, eax Example: // All refs initialized to -1 Register Reference Table { EAX : φ1, EBX : -1, ECX : -1, ... } // Empty set Symbolic Expression Set { <φ1, add(φ0, 2)>, <φ0, 1> }
  50. 58 • Register references mov eax, 1 φ0 = 1

    add eax, 2 φ1 = add(φ0, 2) mov ebx, eax φ2 = φ1 Example: // All refs initialized to -1 Register Reference Table { EAX : φ1, EBX : φ2, ECX : -1, ... } // Empty set Symbolic Expression Set { <φ2, φ1>, <φ1, add(φ0, 2)>, <φ0, 1> }
  51. 59 • Rebuild the trace with backward analysis mov eax,

    1 add eax, 2 mov ebx, eax Example: // All refs initialized to -1 Register Reference Table { EAX : φ1, EBX : φ2, ECX : -1, ... } // Empty set Symbolic Expression Set { <φ2, φ1>, <φ1, add(φ0, 2)>, <φ0, 1> } What is the semantic trace of EBX ?
  52. 60 • Rebuild the trace with backward analysis mov eax,

    1 add eax, 2 mov ebx, eax Example: // All refs initialized to -1 Register Reference Table { EAX : φ1, EBX : φ2, ECX : -1, ... } // Empty set Symbolic Expression Set { <φ2, φ1>, <φ1, add(φ0, 2)>, <φ0, 1> } What is the semantic trace of EBX ? EBX holds the reference φ2
  53. 61 • Rebuild the trace with backward analysis mov eax,

    1 add eax, 2 mov ebx, eax Example: // All refs initialized to -1 Register Reference Table { EAX : φ1, EBX : φ2, ECX : -1, ... } // Empty set Symbolic Expression Set { <φ2, φ1>, <φ1, add(φ0, 2)>, <φ0, 1> } What is the semantic trace of EBX ? EBX holds the reference φ2 What is φ2 ?
  54. 62 • Rebuild the trace with backward analysis mov eax,

    1 add eax, 2 mov ebx, eax Example: // All refs initialized to -1 Register Reference Table { EAX : φ1, EBX : φ2, ECX : -1, ... } // Empty set Symbolic Expression Set { <φ2, φ1>, <φ1, add(φ0, 2)>, <φ0, 1> } What is the semantic trace of EBX ? EBX holds the reference φ2 What is φ2 ? Reconstruction: EBX = φ2
  55. 63 • Rebuild the trace with backward analysis mov eax,

    1 add eax, 2 mov ebx, eax Example: // All refs initialized to -1 Register Reference Table { EAX : φ1, EBX : φ2, ECX : -1, ... } // Empty set Symbolic Expression Set { <φ2, φ1>, <φ1, add(φ0, 2)>, <φ0, 1> } What is the semantic trace of EBX ? EBX holds the reference φ2 What is φ2 ? Reconstruction: EBX = φ1
  56. 64 • Rebuild the trace with backward analysis mov eax,

    1 add eax, 2 mov ebx, eax Example: // All refs initialized to -1 Register Reference Table { EAX : φ1, EBX : φ2, ECX : -1, ... } // Empty set Symbolic Expression Set { <φ2, φ1>, <φ1, add(φ0, 2)>, <φ0, 1> } What is the semantic trace of EBX ? EBX holds the reference φ2 What is φ2 ? Reconstruction: EBX = add(φ0, 2)
  57. 65 • Rebuild the trace with backward analysis mov eax,

    1 add eax, 2 mov ebx, eax Example: // All refs initialized to -1 Register Reference Table { EAX : φ1, EBX : φ2, ECX : -1, ... } // Empty set Symbolic Expression Set { <φ2, φ1>, <φ1, add(φ0, 2)>, <φ0, 1> } What is the semantic trace of EBX ? EBX holds the reference φ2 What is φ2 ? Reconstruction: EBX = add(1, 2)
  58. 66 • Assigning a reference for each register is not

    enough, we must also add references on memory • Follow references over memory mov dword ptr [rbp-0x4], 0x0 ... mov eax, dword ptr [rbp-0x4] push eax ... pop ebx What do we want to know? eax = 0 ebx = eax References φ1 = 0x0 ... φx = φ1 φ2 = φlast_eax_ref ... φx = φ2
  59. 68 • Let's do a DSE on our example •

    This is the CFG of the function check_password • RDI holds the first argument of this function. So, RDI points to a tainted area – We will follow and build our path constraints only on the taint propagation • Let's zoom only on the body loop
  60. 69 • Let's do a DSE on our example •

    DSE path formula construction φ1 = offset Symbolic Expression Set Empty set
  61. 70 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1)
  62. 71 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr
  63. 72 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2)
  64. 73 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X)
  65. 74 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1)
  66. 75 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55)
  67. 76 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7
  68. 77 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr
  69. 78 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset
  70. 79 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9)
  71. 80 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant
  72. 81 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12
  73. 82 • Let's do a DSE on our example •

    DSE path formula construction Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13)
  74. 83 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13)
  75. 84 • Let's do a DSE on our example Symbolic

    Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) X is controllable • OK. Now, what the user can control?
  76. 85 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Spread
  77. 86 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Spread
  78. 87 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Spread
  79. 88 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Controllable
  80. 89 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Controllable Formula reconstruction: cmp(φ8, φ13)
  81. 90 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Controllable Formula reconstruction: cmp(φ7, φ13)
  82. 91 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Controllable Formula reconstruction: cmp(xor(φ6, 0x55), φ13)
  83. 92 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Controllable Formula reconstruction: cmp(xor(sub(φ5, 1), 0x55), φ13)
  84. 93 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Controllable Formula reconstruction: cmp(xor(sub(ZeroExt(X), 1), 0x55), φ13)
  85. 94 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = ZeroExt(X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Formula reconstruction: cmp(xor(sub(ZeroExt(X), 1), 0x55), φ12) Reconstruction
  86. 95 • Let's do a DSE on our example •

    OK. Now, what the user can control? Symbolic Expression Set φ1 = offset (constant) φ2 = SignExt(φ1) φ3 = ptr (constant) φ4 = add(φ3, φ2) φ5 = (X) (controlled) φ6 = sub(φ5, 1) φ7 = xor(φ6, 0x55) φ8 = φ7 φ9 = ptr (constant) φ10 = offset φ11 = add(φ10, φ9) φ12 = constant φ13 = φ12 φ14 = cmp(φ8, φ13) Reconstruction Formula reconstruction: cmp(xor(sub(ZeroExt(X), 1), 0x55), constant)
  87. 96 • Formula reconstruction: cmp(xor(sub(ZeroExt(X) 1), 0x55), constant) – The

    constant is known at runtime : 0x31 is the constant for the first iteration • It is time to use Z3 >>> from z3 import * >>> x = BitVec('x', 8) >>> s = Solver() >>> s.add(((ZeroExt(32, x) - 1) ^ 0x55) == 0x31) >>> s.check() Sat >>> s.model() [x = 101] >>> chr(101) 'e' • To take the true branch the first character of the password must be 'e'. Formula reconstruction
  88. 97 • At this point we got the choice to

    take the true or the false branch by inverting the formula False = ((x – 1) ⊻ 0x55) != 0x31 True = ((x – 1) ⊻ 0x55) == 0x31 • In our case we must take the true branch to go through the second loop iteration – Then, we repeat the same operation until the loop is over What path to chose ?
  89. 98 We repeat this operation until all the paths are

    covered (((x1 – 1) 0x55) != 0x31) ⊻ 1 1 1 1 1 1
  90. 99 We repeat this operation until all the paths are

    covered (((x1 – 1) 0x55) != 0x31) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) != 0x3e) ⊻ 2 2 1 2 2 2 2 2
  91. 100 We repeat this operation until all the paths are

    covered (((x1 – 1) 0x55) != 0x31) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) != 0x3e) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) != 0x3d) ⊻ 3 3 1 3 3 3 3 2 3
  92. 101 We repeat this operation until all the paths are

    covered (((x1 – 1) 0x55) != 0x31) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) != 0x3e) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) != 0x3d) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) == 0x3d) ⊻ ∧ (((x4 – 1) 0x55) != 0x26)) ⊻ 4 4 1 4 4 4 4 2 3 4
  93. 102 We repeat this operation until all the paths are

    covered (((x1 – 1) 0x55) != 0x31) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) != 0x3e) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) != 0x3d) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) == 0x3d) ⊻ ∧ (((x4 – 1) 0x55) != 0x26)) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) == 0x3d) ⊻ ∧ (((x4 – 1) 0x55) == 0x26) ⊻ ∧ (((x5 – 1) 0x55) != 0x31) ⊻ 5 5 1 5 5 5 5 2 3 4 5
  94. 103 We repeat this operation until all the paths are

    covered (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) == 0x3d) ⊻ ∧ (((x4 – 1) 0x55) == 0x26) ⊻ ∧ (((x5 – 1) 0x55) == 0x31) ⊻ (((x1 – 1) 0x55) != 0x31) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) != 0x3e) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) != 0x3d) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) == 0x3d) ⊻ ∧ (((x4 – 1) 0x55) != 0x26)) ⊻ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) == 0x3d) ⊻ ∧ (((x4 – 1) 0x55) == 0x26) ⊻ ∧ (((x5 – 1) 0x55) != 0x31) ⊻ 6 6 1 6 6 6 6 2 3 4 5 6
  95. 104 • The complete formula to return 0 is: –

    βi = ((((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) ⊻ 0x55) == 0x3d) ∧ (((x4 – 1) 0x55) == 0x26) ⊻ ∧ (((x5 – 1) 0x55) == 0x31)) ⊻ • Where x1, x2, x3, x4 and x5 are five variables controlled by the user inputs • The complete formula to return 1 is: – β(i+1) = ((((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) == 0x3d) ⊻ ∧ (((x4 – 1) 0x55) == 0x26) ⊻ ∧ (((x5 – 1) 0x55) != 0x31) ⊻ ∨ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) ⊻ 0x55) == 0x3d) ∧ (((x4 – 1) 0x55) != 0x26)) ⊻ ((( ∨ x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ((( ⊻ ∧ x3 – 1) 0x55) != 0x3d) ((( ⊻ ∨ x1 – 1) 0x55) ⊻ == 0x31) ((( ∧ x2 – 1) 0x55) != 0x3e) ((( ⊻ ∨ x1 – 1) 0x55) != 0x31)) ⊻ • Where x1, x2, x3, x4 and x5 are five variables controlled by the user inputs Formula to return 0 or 1 'Dafuq Slide ?!
  96. 105 • The complete formula to return 0 is: –

    βi = ((((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) ⊻ 0x55) == 0x3d) ∧ (((x4 – 1) 0x55) == 0x26) ⊻ ∧ (((x5 – 1) 0x55) == 0x31)) ⊻ • The concrete value generation using z3 >>> from z3 import * >>> x1, x2, x3, x4, x5 = BitVecs('x1 x2 x3 x4 x5', 8) >>> s = Solver() >>> s.add(And((((x1 - 1) ^ 0x55) == 0x31), (((x2 - 1) ^ 0x55) == 0x3e), (((x3 - 1) ^ 0x55) == 0x3d), (((x4 - 1) ^ 0x55) == 0x26), (((x5 - 1) ^ 0x55) == 0x31))) >>> s.check() sat >>> s.model() [x3 = 105, x2 = 108, x1 = 101, x4 = 116, x5 = 101] >>> print chr(101), chr(108), chr(105), chr(116), chr(101) e l i t e >>> Generate a concrete Value to return 0 'Dafuq Slide ?!
  97. 106 • The complete formula to return 1 is: –

    β(i+1) = ((((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) 0x55) == 0x3d) ⊻ ∧ (((x4 – 1) 0x55) == 0x26) ⊻ ∧ (((x5 – 1) 0x55) != 0x31) ⊻ ∨ (((x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ⊻ ∧ (((x3 – 1) ⊻ 0x55) == 0x3d) ∧ (((x4 – 1) 0x55) != 0x26)) ⊻ ((( ∨ x1 – 1) 0x55) == 0x31) ⊻ ∧ (((x2 – 1) 0x55) == 0x3e) ((( ⊻ ∧ x3 – 1) 0x55) != 0x3d) ((( ⊻ ∨ x1 – 1) 0x55) ⊻ == 0x31) ((( ∧ x2 – 1) 0x55) != 0x3e) ((( ⊻ ∨ x1 – 1) 0x55) != 0x31)) ⊻ • The concrete value generation using z3 >>> s.add(Or(And((((x1 - 1) ^ 0x55) == 0x31), (((x2 - 1) ^ 0x55) == 0x3e), (((x3 - 1) ^ 0x55) == 0x3d), (((x4 - 1) ^ 0x55) == 0x26), (((x5 - 1) ^ 0x55) != 0x31)), And((((x1 - 1) ^ 0x55) == 0x31),(((x2 - 1) ^ 0x55) == 0x3e),(((x3 - 1) ^ 0x55) == 0x3d),(((x4 - 1) ^ 0x55) != 0x26)), And((((x1 - 1) ^ 0x55) == 0x31),(((x2 - 1) ^ 0x55) == 0x3e),(((x3 - 1) ^ 0x55) != 0x3d)), And((((x1 - 1) ^ 0x55) == 0x31), (((x2 - 1) ^ 0x55) != 0x3e)),(((x1 - 1) ^ 0x55) != 0x31))) >>> s.check() sat >>> s.model() [x3 = 128, x2 = 128, x1 = 8, x5 = 128, x4 = 128] Generate a concrete Value to return 1 'Dafuq Slide ?!
  98. 107 • P represents the set of all the possible

    paths • β represents a symbolic path expression • To cover the function check_password we must generate a concrete value for each β in the set P. Formula to cover the function check_password P = {βi , βi+1 , βi+k } ∀ β P : ∈ E(G(β)) Where E is the execution and G the generation of a concrete value from the symbolic expression β.
  99. 110 Ignored values Ignored values • Is covering all the

    paths enough to find vulnerabilities? Values set Execution (time) Set of values ignored in the path Set of possible values used in the path State of variables during the execution Vulnerable zone • No! A variable can hold several possible values during the execution and some of these may not trigger any bugs. • We must generate all concrete values that a path can hold to cover all the possible states. – Imply a lot of overload in the worst case • Below, a Cousot style graph which represents some possible states of a variable during the execution in a path.
  100. 111 Ignored values Ignored values • A bug may not

    make the program crash Values set Execution (time) Set of values ignored in the path Set of possible values used in the path State of the variable during the execution Vulnerable zone • Another important point is that a bug may not make the program crash – We must implement specific analysis to find specific bugs • More detail about these kinds of analysis at my next talk at St'Hack 2015 May not cause a crash
  101. 113 • Recap: – It is possible to cover a

    targeted function in memory using a DSE approach and memory snapshots. • It is also possible to cover all the states of the function but it implies a lot of overload in the worst case • Future work: – Improve the Pin IR – Add runtime analysis to find bugs without crashes • I will talk about that at the St'Hack 2015 event – Simplify an obfuscated trace Conclusion
  102. 114 • Contact – Mail: [email protected] – Twitter: @JonathanSalwan •

    Thanks – I would like to thank the security day staff for their invitation and specially Jérémy Fetiveau for the hard work! – Then, a big thanks to Ninon Eyrolles, Axel Souchet, Serge Guelton, Jean-Baptiste Bédrune and Aurélien Wailly for their feedbacks. • Social event – Don't forget the doar-e social event after the talks, there are some free beers! Thanks for your attention