upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements “ (C99 standard)
// run until overflow while (a < a + 1) { a++; } } What’s the compilation output of Clang/GCC? 1. The function works as expected by the programmer 2. The function body is optimized away 3. The function results in an endless loop 4. It depends on the optimization level
Hello world! Static compilers: optimize code based on Undefined Behavior Bug-finding tools: find bugs assuming that violations are visible side effects (Wang et al. 2012, D'Silva 2015)
= new long[3]; arr[4] = … long *arr = malloc(3 * sizeof(long)); arr[4] = … Map to Java Code The semantics of an out-of- bounds access are well specified
= new long[3]; arr[4] = … long *arr = malloc(3 * sizeof(long)); arr[4] = … Map to Java Code ArrayIndexOutOfBoundsException The semantics of an out-of- bounds access are well specified
= new long[3]; arr[4] = … long *arr = malloc(3 * sizeof(long)); arr[4] = … Map to Java Code ArrayIndexOutOfBoundsException The semantics of an out-of- bounds access are well specified Automatic bounds checks that cannot be optimized away
Clang C C++ GCC Fortran Other LLVM frontend ... (Lattner et al. 2004) Targeting LLVM IR allows executing multiple unsafe languages [Languages other than C?]
Clang C C++ GCC Fortran Other LLVM frontend ... (Lattner et al. 2004) Targeting LLVM IR allows executing multiple unsafe languages [Languages other than C?]
IR Graal JVM [How does the compilation work?] [Array bounds check elimination] [Optimizations Overview] [Completenesss vs. Soundness] [Languages other than C?]
IR Graal JVM [How does the compilation work?] [Array bounds check elimination] [Optimizations Overview] [Completenesss vs. Soundness] [Languages other than C?]
IR Graal JVM (Würthinger et al. 2012, 2017) [How does the compilation work?] [Array bounds check elimination] [Optimizations Overview] [Completenesss vs. Soundness] [Languages other than C?]
IR Graal JVM (Würthinger et al. 2012, 2017) Using Truffle and Graal, we can minimize the instrumentation overhead [How does the compilation work?] [Array bounds check elimination] [Optimizations Overview] [Completenesss vs. Soundness] [Languages other than C?]
IR Graal JVM (Würthinger et al. 2012, 2017) [How does the compilation work?] [Array bounds check elimination] [Optimizations Overview] [Completenesss vs. Soundness] [Languages other than C?]
IR Graal JVM (Würthinger et al. 2012, 2017) [How does the compilation work?] [Array bounds check elimination] [Optimizations Overview] Safe Sulong can rely on the underlying JVM • Automatic checks • Safe optimizations • Abstraction from the underlying machine and OS [Completenesss vs. Soundness] [Languages other than C?]
Prevent Out-Of-Bounds Accesses 37 long *arr = malloc(3 * sizeof(long)); [How do we know the type?] [What other errors can Safe Sulong detect?] [Pointer to an integer?] [Array bounds check elimination] [Strict-aliasing rule]
free(arr); {0, 0, 0} Address offset = 0 data I64Array contents [What other errors can Safe Sulong detect?] [Pointer to an integer?] [Strict-aliasing rule]
free(arr); arr[0] = … Address offset = 0 data I64Array contents=null [What other errors can Safe Sulong detect?] [Pointer to an integer?] [Strict-aliasing rule]
class LLVMI32LiteralNode extends LLVMExpressionNode { final int literal; public LLVMI32LiteralNode(int literal) { this.literal = literal; } @Override public int executeI32(VirtualFrame frame) { return literal; } } Executable AST node Implementation of Operations
class LLVMI32LiteralNode extends LLVMExpressionNode { final int literal; public LLVMI32LiteralNode(int literal) { this.literal = literal; } @Override public int executeI32(VirtualFrame frame) { return literal; } } Executable AST node Implementation of Operations
class LLVMI32LiteralNode extends LLVMExpressionNode { final int literal; public LLVMI32LiteralNode(int literal) { this.literal = literal; } @Override public int executeI32(VirtualFrame frame) { return literal; } } Executable AST node Nodes return their result in an execute() method Implementation of Operations (Würthinger et al. 2012)
{ @Specialization protected int executeI32(int left, int right) { return left + right; } } Executable AST node write %2 add read %i.0 1 A DSL allows a declarative style of specifying and executing nodes Implementation of Operations (Humer et al. 2015)
final FrameSlot slot; public LLVMWriteI32Node(FrameSlot slot) { this.slot = slot; } @Specialization public void writeI32(VirtualFrame frame, int value) { frame.setInt(slot, value); } } Executable AST node write %2 add read %i.0 1 Local variables are represented by an array-like VirtualFrame object Implementation of Operations
the errors • 8 errors not found by LLVM’s AddressSanitizer (and Valgrind) • Compiler optimizations (ASan –O3) prevented the detection of 4 additional bugs 59 [What are the other errors?] [Completenesss vs. Soundness] [Comparison tools]
argv) { printf("%d %s\n", argc, argv[5]); } Out-of-bounds accesses to argv are not instrumented by ASan [What are the other errors?] [Comparison tools]
by LLVM’s AddressSanitizer and Valgrind 62 int main(int argc, char** argv) { printf("%d %s\n", argc, argv[5]); } In Safe Sulong instrumentation cannot be omitted by design [What are the other errors?] [Completenesss vs. Soundness] [Comparison tools]
65 Instrumentation- based bug-finding tools Safe languages Safe Sulong improves upon aspects of existing bug-finding tools • Safe optimizations • Abstraction from the native execution model
shouldn’t 12% No, but there might well be 29% No, that would be crazy 16% Don’t know 8% [Do you know code that uses] relational comparison (with <, >, <=, or >=) of two pointers to separately allocated objects (of compatible object types)? (Memarian et al. 2016) Code Relies on Undefined Behavior
Integer Representation: Lenient C 76 Breaks antisymmetry as different objects might have the same hash code integer_rep(a) = (long) System.identityHashCode(a.pointee) << 32 | offset;
Mitigate Use-after-Free Errors contents[0] = … 79 long *arr = malloc(3 * sizeof(long)); free(arr); arr[0] = … The GC will collect the object when it is no longer referenced
; int *ptr = &(arr[4]); printf ("%ld\n", size_right(ptr)); // prints 24 _size_right() sizeof(int) * 10 We also designed introspection functions for other meta data
= 0; while (*str != '\0') { len++; str++; } return len; } P r o g r a m m i n g ... ... ==16497==ERROR: AddressSanitizer: stack-buffer- overflow on address 0x7ffc59c0ef63 READ of size 1 at 0x7ffc59c0ef63 thread T0 #0 0x4e7442 in strlen /home/manuel/test.c:10:12 #1 0x4e7392 in main /home/manuel/test.c:5:5
( size_right(str) > 0 && *str != '\0') { len++; str++; } return len; } Example: strlen() 90 11 P r o g r a m m i n g ... ... We enhanced a libc to deal with unterminated strings
Josef Eisl Christian Häubl Matthias Grimmer Thomas Pointhuber Daniel Pekarek Chris Seaton Lukas Stadler Florian Angerer David Gnedt https://github.com/graalvm/sulong/graphs/contributors Swapnil Gaikwad
Josef Eisl Christian Häubl Matthias Grimmer Thomas Pointhuber Daniel Pekarek Chris Seaton Lukas Stadler Florian Angerer David Gnedt Swapnil Gaikwad EuroLLVM 2019 Talk LLVM IR in GraalVM: Multi-Level, Polyglot Debugging with Sulong https://github.com/graalvm/sulong/graphs/contributors
Josef Eisl Christian Häubl Matthias Grimmer Thomas Pointhuber Daniel Pekarek Chris Seaton Lukas Stadler Florian Angerer David Gnedt Swapnil Gaikwad EuroLLVM 2019 Talk Sulong: An experience report of using the "other end" of LLVM in GraalVM. https://github.com/graalvm/sulong/graphs/contributors