• Bug detection and other forms of static analysis are pattern matching on increasingly precise semantics
• Most static bug detectors find only a subset of bugs (Habib and Pradel, ASE 2018)
• Humans need to identify the patterns
• As the semantics relax, static analysis becomes unsound
• Almost impossible for dynamic languages (“stringly typed”)
to exploit the natural language information channel to help with tasks such as:
• Bug finding
• Type annotations
• Inconsistencies
• Source code summarisation
• …
software corpora have similar statistical properties to natural language corpora; and these properties can be exploited to build better software engineering tools.”
Hindle et al., On the naturalness of software, ICSE 2012
Name-based Bug Detection. OOPSLA 2018
How to produce buggy code?
• Swap function arguments: foo(a, b) -> foo(b, a)
• Replace binary operators: i <= length -> i % length
• Replace binary operand: i <= length -> i <= foo
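The three mutations above can be sketched with Python's standard `ast` module. This is only an illustration of how negative training examples might be generated, not the tooling used in the OOPSLA 2018 paper. The class names (`SwapArguments`, `ReplaceOperator`, `ReplaceRightOperand`) and the `mutate` helper are hypothetical, and `ast.unparse` needs Python 3.9 or later.

```python
import ast


class SwapArguments(ast.NodeTransformer):
    """Swap the two positional arguments of every two-argument call."""

    def visit_Call(self, node):
        self.generic_visit(node)
        if len(node.args) == 2:
            node.args.reverse()
        return node


class ReplaceOperator(ast.NodeTransformer):
    """Turn a single comparison (e.g. i <= length) into i % length."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        if len(node.ops) == 1:
            return ast.BinOp(left=node.left, op=ast.Mod(),
                             right=node.comparators[0])
        return node


class ReplaceRightOperand(ast.NodeTransformer):
    """Replace the right-hand operand of a comparison with another name."""

    def __init__(self, new_name):
        self.new_name = new_name

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.comparators[-1] = ast.Name(id=self.new_name, ctx=ast.Load())
        return node


def mutate(source, transformer):
    """Parse source, apply one mutation, and return the mutated code."""
    tree = transformer.visit(ast.parse(source))
    return ast.unparse(tree)


print(mutate("foo(a, b)", SwapArguments()))               # foo(b, a)
print(mutate("i <= length", ReplaceOperator()))           # i % length
print(mutate("i <= length", ReplaceRightOperand("foo")))  # i <= foo
```

Each original snippet paired with its mutated version gives one (correct, buggy) training example for a classifier.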
Inconsistent Method Names. ICSE 2019
1. Build embeddings of function names and function bodies
2. For each function body:
   1. Find functions close to it in vector space
   2. Check their respective name distance
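A minimal sketch of the two steps above, assuming the name and body embeddings already exist. The 2-d vectors, the `find_inconsistent` helper, the neighbourhood size `k` and the `threshold` are all made up for illustration; the ICSE 2019 paper learns its embeddings and uses its own distance measures.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_inconsistent(methods, k=2, threshold=0.3):
    """Flag methods whose name is far from the names of similar bodies.

    methods maps a method name to a (name_vector, body_vector) pair.
    """
    flagged = []
    for name, (name_vec, body_vec) in methods.items():
        # Step 2.1: find the k methods whose bodies are closest to this one.
        scored = [(cosine(body_vec, other_body), other)
                  for other, (_, other_body) in methods.items()
                  if other != name]
        neighbours = [other for _, other in sorted(scored, reverse=True)[:k]]
        # Step 2.2: compare this name with the names of those neighbours.
        avg = sum(cosine(name_vec, methods[n][0])
                  for n in neighbours) / len(neighbours)
        if avg < threshold:
            flagged.append(name)
    return flagged


# Toy embeddings (assumed): axis 0 is "getter-like", axis 1 is "setter-like".
methods = {
    "getName":  ((1.0, 0.0),  (1.0, 0.1)),
    "getTitle": ((0.9, 0.1),  (1.0, 0.0)),
    "setName":  ((0.0, 1.0),  (0.0, 1.0)),
    "setTitle": ((0.1, 0.9),  (0.1, 1.0)),
    "getValue": ((1.0, 0.05), (0.05, 1.0)),  # getter name, setter-like body
}

print(find_inconsistent(methods))  # ['getValue']
```

Only `getValue` is flagged: its body sits next to the setters, but its name sits next to the getters.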
vocabularies are 10x the size of NL ones
• Compression techniques (e.g. BPE) to the rescue
• How to feed code to a network without losing info from either the NL or the semantics channel?
• Code2Vec, TreeLSTMs, GGNNs,…
• Keeping up with evolution
• Making tools, not just research papers
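To make the BPE bullet concrete, here is a small sketch of byte-pair encoding applied to identifiers. It follows the usual "merge the most frequent adjacent pair" scheme. The function names and the toy identifier list are assumptions for illustration, and real tokenisers handle word boundaries and tie-breaking more carefully.

```python
from collections import Counter


def merge_pair(word, pair):
    """Merge every adjacent occurrence of pair inside a tuple of symbols."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)


def learn_bpe(tokens, num_merges):
    """Learn up to num_merges merge rules from a list of tokens."""
    # Every token starts out as a sequence of single characters.
    vocab = Counter(tuple(tok) for tok in tokens)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            new_vocab[merge_pair(word, best)] += freq
        vocab = new_vocab
    return merges


def segment(token, merges):
    """Split a (possibly unseen) token into subwords using learned merges."""
    word = tuple(token)
    for pair in merges:
        word = merge_pair(word, pair)
    return list(word)


# Toy corpus of identifiers (assumed for illustration).
identifiers = ["getName"] * 3 + ["getSize"] * 2 + ["setName"]
merges = learn_bpe(identifiers, num_merges=10)

print(merges[0])                    # ('e', 't')
print(segment("getTitle", merges))  # unseen identifier, split into known subwords
```

An unseen identifier such as `getTitle` is never out of vocabulary: in the worst case it falls back to single characters, so the vocabulary stays bounded however many new names appear.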