May 2019, HITB2019AMS
https://conference.hitb.org/hitbsecconf2019ams/materials/D1T2%20-%20fn_fuzzy%20-%20Fast%20Multiple%20Binary%20Diffing%20Triage%20-%20Takahiro%20Haruyama.pdf
https://www.youtube.com/watch?v=kkvNebE9amY
https://github.com/TakahiroHaruyama/ida_haru/tree/master/fn_fuzzy
IDA Pro is the de facto standard disassembler for malware reverse engineers. It saves their findings, such as function names, in a corresponding database file (IDB). When analyzing a new malware variant, analysts can import those findings by comparing it against previously analyzed IDBs, which lets them focus on the new functions.
However, when there are many IDBs, importing from them is not straightforward. Experienced reverse engineers have hundreds if not thousands of IDBs and typically don't remember code that they analyzed a few years ago. They therefore need a tool that quickly identifies the most similar previously analyzed IDBs.
The new tool, called fn_fuzzy, calculates two kinds of fuzzy hashes for each function in an IDB.
1. ssdeep hash value of code bytes
Relocation (fixup) bytes, direct memory reference data, and other ignorable bytes are excluded from the calculation.
2. Machoc hash value of call flow graph
The Machoc value is used to correct the ssdeep result when a function has few code bytes or its code is generated polymorphically.
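As a rough illustration of the inputs to the two hashes described above, here is a minimal Python sketch. It is not fn_fuzzy's actual code: the function names, the `ignorable` offset set, and the block dictionary layout are hypothetical, and a truncated SHA-1 digest stands in for the MurmurHash3 that Machoc uses. The signature format loosely follows the Machoc idea of numbering basic blocks by address and marking blocks that contain a call.

```python
import hashlib


def mask_ignorable_bytes(code, ignorable):
    """Drop the bytes at the given offsets (e.g. relocation/fixup bytes).

    The remaining bytes are what would be fed to ssdeep.
    """
    return bytes(b for i, b in enumerate(code) if i not in ignorable)


def cfg_signature(blocks):
    """Build a Machoc-style string from a function's flow graph.

    `blocks` (hypothetical layout) maps a block start address to
    {"succs": [successor addresses], "has_call": bool}.
    """
    # Number the blocks in address order, starting at 1, so the
    # signature does not depend on where the function is loaded.
    order = {addr: n for n, addr in enumerate(sorted(blocks), start=1)}
    parts = []
    for addr in sorted(blocks):
        info = blocks[addr]
        items = ["c"] if info["has_call"] else []
        items += [str(order[s]) for s in sorted(info["succs"])]
        parts.append("%d:%s;" % (order[addr], ",".join(items)))
    return "".join(parts)


def cfg_hash(blocks):
    """Short digest of the signature (SHA-1 stand-in, an assumption)."""
    return hashlib.sha1(cfg_signature(blocks).encode()).hexdigest()[:8]


# Example: `call rel32; ret` with the 4-byte call operand masked out.
code = bytes.fromhex("e811223344c3")
masked = mask_ignorable_bytes(code, {1, 2, 3, 4})

# Example: three blocks, the second one contains a call.
example_blocks = {
    0x1000: {"succs": [0x1010, 0x1020], "has_call": False},
    0x1010: {"succs": [0x1020], "has_call": True},
    0x1020: {"succs": [], "has_call": False},
}
signature = cfg_signature(example_blocks)
```

Because the signature only records block order and edges, two copies of a function with different code bytes but the same graph shape produce the same value, which is why a graph-based hash can back up a byte-based one.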
All hashes are saved into a single database file, which is then used for comparison. Function names and prototypes can be imported from all IDBs into the target at once.
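The single-database idea can be sketched as follows, assuming an SQLite file. The table schema, the function names, and the thresholds `min_score` and `small` are all hypothetical choices for illustration and are not taken from fn_fuzzy. The `similarity` argument is any callable that scores two code hashes from 0 to 100; the rule of falling back to the graph hash only for small functions is a simplification of the correction step described earlier.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the shared hash database. Schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS functions ("
        "idb TEXT, name TEXT, size INTEGER, code_hash TEXT, cfg_hash TEXT)"
    )
    return db


def add_function(db, idb, name, size, code_hash, cfg_hash):
    """Record one analyzed function exported from an IDB."""
    db.execute(
        "INSERT INTO functions VALUES (?, ?, ?, ?, ?)",
        (idb, name, size, code_hash, cfg_hash),
    )


def find_names(db, size, code_hash, cfg_hash, similarity,
               min_score=50, small=64):
    """Return (idb, name) candidates for one function of the target."""
    hits = []
    rows = db.execute(
        "SELECT idb, name, size, code_hash, cfg_hash FROM functions"
    )
    for idb, name, known_size, known_code, known_cfg in rows:
        if similarity(code_hash, known_code) >= min_score:
            # The byte-level fuzzy hashes are similar enough.
            hits.append((idb, name))
        elif min(size, known_size) < small and cfg_hash == known_cfg:
            # Small function: fall back to an exact graph-hash match.
            hits.append((idb, name))
    return hits
```

With the `ssdeep` Python package installed, `ssdeep.compare` could be passed as `similarity`; any other 0 to 100 scoring function works the same way in this sketch.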
I will explain how fn_fuzzy is implemented and used, with a demonstration. I will also show the performance and accuracy of its comparisons against BinDiff and its automation tool. fn_fuzzy does not cover every similarity that BinDiff finds, but it can find ones that BinDiff misses, and its comparison is much faster (comparing about 700 IDBs took just 20-180 seconds in a VM). I believe fn_fuzzy is useful for fast triage when diffing multiple binaries.