
The Art of Malware C2 Scanning - How to Reverse and Emulate Protocol Obfuscated by Compiler

2024, REcon
https://cfp.recon.cx/recon2024/talk/GRV7EX/
https://gitlab.com/eshard/d810/-/merge_requests/3
https://github.com/TakahiroHaruyama/ida_haru/tree/master/callstrings

Internet-wide malware command-and-control (C2) server scanning based on protocol emulation is a game-changing technique and one of the most proactive threat detection approaches. It allows real-time blocking of malicious communications from a variety of known malware families. On the other hand, protocol reversing is a challenging task, especially when the code is obfuscated at the compiler level.

In this presentation, I will detail how to reverse the C2 protocol of the malware used by one of the PRC-linked cyberespionage threat actors. The malware was obfuscated with multiple methods, likely applied at compile time. To identify the protocol format and its encryption algorithm, I not only extended an existing tool to defeat more control flow flattening (CFF) and mixed boolean arithmetic (MBA) expression cases, but also implemented another tool to decode strings constructed polymorphically in the stack area under CFF.

I will also explain how to emulate the C2 protocol. I validated the request/response data by implementing a fake C2 server and catching a real one. Then I developed a PoC scanner that narrows down true positives based on multiple clues such as TLS handshake errors, JARM fingerprints, and HTTP header values that the C2 uses to authenticate clients. I will demonstrate the scanner in the presentation.

The presented research techniques and findings will be beneficial to those who need deep malware RE.

Takahiro Haruyama

July 01, 2024

Transcript

  1. THE ART OF MALWARE C2 SCANNING - HOW TO REVERSE AND EMULATE PROTOCOL OBFUSCATED BY COMPILER
    TAKAHIRO HARUYAMA, BINARLY
  2. WHO AM I?
    • Takahiro Haruyama (@cci_forensics)
    • Principal Security Researcher at Binarly
    • Previously Staff Threat Researcher at Carbon Black TAU
    • Past research
      • Scalable RE automation (e.g., hunting vulnerable drivers)
      • Anti-forensics (e.g., firmware acquisition MitM attack)
      • Malware analysis (e.g., Internet-wide C2 scanning)
  3. WHY MALWARE C2 SCANNING?
    • IP reputation is not effective for catching fresh C2s
    • Internet-wide C2 scanning is beneficial from both detection and threat intel perspectives
  4. HOW MALWARE C2 SCANNING?
    • Protocol reversing
      • Identify the data format
      • Identify the encoding/encryption algorithm
    • Protocol emulation
      • Develop a PoC scanner
      • Validate requests/responses with a fake/real C2
  5. CASE: PLUGX
    • Long used, but still many variants in the wild
    • Most variants have almost the same C2 protocol except for the packet encoding algorithm
    • The “Hodur” variants (aka MiniPlug) were obfuscated with multiple methods, likely applied at compile time
    • EclecticIQ and Check Point reported the latest variants last year, but no one had described the updated C2 protocol details
    • I focus on the Hodur de-obfuscation, then briefly explain the protocol reversing and emulation
  6. WHAT'S CONTROL FLOW FLATTENING?
    • Control flow flattening (CFF) transforms a program's control flow to make it much harder to understand, while preserving the original functionality
    • http://tigress.cs.arizona.edu/transformPage/docs/flatten/index.html
    (Figure: first block(s), control flow dispatcher(s), and flattened blocks)
  7. HOW CFF WORKS
    • Control flow dispatchers decide which block to execute next based on a state variable
    • The state variable is updated in the first/flattened blocks
  8. CONTROL FLOW UNFLATTENING: BASIC STRATEGY
    1. Identify control flow dispatchers and state variables
    2. Trace back the state variable values from the end of the flattened blocks
    3. Associate the values with the block IDs
    4. Re-order the code flow based on the associations
    • I use IDA Pro microcode for the unflattening task
      • Intermediate representation used by the Hex-Rays decompiler
      • We can implement the algorithm in the optblock_t callback
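To make steps 2-4 concrete, here is a toy Python model (not IDA microcode and not D-810 code). It assumes step 1 is already done, i.e., the dispatcher has been reduced to a mapping from state values to block IDs, and that step 2 produced the state value(s) written at the end of each flattened block. Block names and state constants are made up for illustration.

```python
from typing import Dict, List, Tuple, Union

# A block's exit is None (the block returns), a single state value
# (unconditional), or (condition, state_if_true, state_if_false).
Exit = Union[None, int, Tuple[str, int, int]]


def unflatten(entry: str,
              entry_state: int,
              dispatcher: Dict[int, str],
              exits: Dict[str, Exit]) -> Dict[str, List[str]]:
    """Replace every 'jump back to the dispatcher' edge with a direct edge.

    dispatcher: state value -> block ID it selects (step 3)
    exits:      block ID -> state value(s) traced back from the block's end (step 2)
    Returns the re-ordered control flow: block ID -> successor block IDs (step 4).
    """
    edges: Dict[str, List[str]] = {entry: [dispatcher[entry_state]]}
    for block, exit_ in exits.items():
        if exit_ is None:
            edges[block] = []                              # returning block
        elif isinstance(exit_, int):
            edges[block] = [dispatcher[exit_]]             # direct jump
        else:
            _, state_true, state_false = exit_
            edges[block] = [dispatcher[state_true],        # two-way branch
                            dispatcher[state_false]]
    return edges


# Usage: a flattened if/else whose blocks all jump back to the dispatcher.
dispatcher = {0x1111: "A", 0x2222: "B", 0x3333: "C", 0x4444: "RET"}
exits = {"A": ("x > 0", 0x2222, 0x3333), "B": 0x4444, "C": 0x4444, "RET": None}
print(unflatten("first", 0x1111, dispatcher, exits))
# {'first': ['A'], 'A': ['B', 'C'], 'B': ['RET'], 'C': ['RET'], 'RET': []}
```

In real microcode the hard parts are steps 1 and 2; once the value-to-block association exists, re-ordering is a direct lookup like this.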
  10. CONTROL FLOW UNFLATTENING: IDA MICROCODE TOOL HISTORY
    • HexRaysDeob (2018)
      • The first implementation breaking CFF
      • Ported to IDAPython by Hex-Rays (2019)
      • Tested on only one binary, so several modified versions were implemented: APT10 ANEL (2019), Emotet (2022)
    • D-810 (2020)
      • Effective not only for OLLVM but also for Tigress Flatten
      • Works reliably with different binaries
  11. D-810 ISSUES
    • D-810 worked for most functions of the Hodur samples, but some key functions related to the C2 protocol were still flattened
      • Additional CFF settings?
    • Two issues
      1. The control flow dispatcher detection failed
      2. The block state variable tracking failed
  12. ISSUE 1: CONTROL FLOW DISPATCHER DETECTION FAILURE
    • The dispatcher detection algorithm misses dispatchers whose predecessors are conditional jumps on the state variable
    • The genmc plugin was useful for troubleshooting
    (Figure: a dispatcher and its predecessor)
  13. ISSUE 1: FIX
    • I added another dispatcher detection algorithm
      • The algorithm simply guesses that the dispatcher is the block with the largest number of predecessors
      • The guessed dispatcher is validated based on the entropy of the state variable values (only effective for OLLVM)
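A minimal sketch of the guess-and-validate idea. The CFG is a plain successor dictionary, and the entropy measure (Shannon entropy of the bits of the constants the state variable is compared against, normalized to 0..1) and the 0.9 threshold are assumptions for illustration; D-810's actual check may be computed differently. The intuition is that OLLVM-style state values look random, while small sequential IDs do not.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def guess_dispatcher(successors: Dict[int, List[int]]) -> int:
    """Guess the dispatcher as the block with the most predecessors."""
    preds: Counter = Counter()
    for dsts in successors.values():
        preds.update(dsts)
    return max(preds, key=lambda block: (preds[block], -block))  # tie: lowest ID


def bit_entropy(constants: Iterable[int], width: int = 32) -> float:
    """Shannon entropy (0..1) of the bits of the compared state constants."""
    values = [c & ((1 << width) - 1) for c in constants]
    total = len(values) * width
    if total == 0:
        return 0.0
    p = sum(bin(v).count("1") for v in values) / total
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def plausible_ollvm_dispatcher(constants: Iterable[int], threshold: float = 0.9) -> bool:
    # Random-looking 32-bit state values have roughly half of their bits set
    return bit_entropy(constants) >= threshold


# Usage: block 1 is reached from the entry and from every flattened block.
cfg = {0: [1], 1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}
print(guess_dispatcher(cfg))  # 1
```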
  15. ISSUE 2: BLOCK STATE VARIABLE TRACKING FAILURE
    • The state variable tracking fails if the value is assigned in the first blocks
    • D-810 only traces within the flattened blocks and doesn't recognize that the dispatcher has been reached, so the tracking loops
    (Figure annotations: "The value is assigned", "Tracking fails")
      D810.emulator - WARNING - Can't evaluate instruction: ..Variable '%var_depend_on_a10_1.4{24}' is not defined
      D810.tracker - DEBUG - Computing: ['ebx.4'] for path [8, 22, 44, 45, 46, 47, 48, 49, 50, 8, 9, 35, 36, 109, 110, 111, 112]
  16. ISSUE 2: FIX
    • The added code detects dispatchers during tracking and resumes the tracking from the end of the first blocks
    • The unflattening performance is also improved
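The fix can be pictured with another toy model. Block IDs and values are made up, and each block is assumed to have a single predecessor on the tracked path. When the backward walk reaches the dispatcher, it jumps to the end of the first blocks instead of following the dispatcher's predecessors (the flattened blocks themselves), which is what produced the revisited block 8 in the log above.

```python
from typing import Dict, Optional


def track_state_value(start: int,
                      pred: Dict[int, int],
                      assigned: Dict[int, int],
                      dispatcher: int,
                      first_blocks_end: int) -> Optional[int]:
    """Walk backward from `start` to the last write of the state variable.

    pred:     predecessor of each block on the tracked path (toy: one per block)
    assigned: block ID -> value that block writes to the state variable
    """
    seen = set()
    block: Optional[int] = start
    while block is not None and block not in seen:
        seen.add(block)
        if block in assigned:
            return assigned[block]
        if block == dispatcher:
            # The fix: dispatcher reached, so resume from the end of the first blocks
            block = first_blocks_end
        else:
            block = pred.get(block)
    return None  # no assignment found, or a loop was detected


# Usage: block 0 (first block) writes the state; 2 is the dispatcher; 7 loops back to it.
pred = {5: 4, 4: 2, 2: 7, 7: 2, 1: 0}
print(hex(track_state_value(5, pred, {0: 0x1234}, dispatcher=2, first_blocks_end=1)))  # 0x1234
```

Passing a dispatcher ID that never matches (e.g., -1) mimics the original behavior: the walk cycles between 2 and 7 and gives up with None.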
  18. Mixed Boolean Arithmetic (MBA) expressions transform a simple expression into a complex but semantically equivalent form
    (Figure: the same encoded string is decoded by different expressions)
  19. SIMPLIFYING MBA EXPRESSIONS
    1. Find an obfuscation pattern and hypothesize a simplification
    2. Validate the hypothesis by equivalence checking (e.g., using Z3 or Arybo)
    3. Replace the pattern with the simplified one
    Arybo:
      $ iarybo 8
      In [1]: ~(x ^ ~y) == x ^ y
      Out[1]: True
    Z3 (asserting that the two forms differ; unsat means no 8-bit counterexample exists, so they are equivalent):
      $ ipython
      In [1]: import z3
      In [2]: x, y = z3.BitVecs("x y", 8)
      In [3]: s = z3.SolverFor("QF_BV")
      In [4]: s.add((~(x ^ ~y)) != (x ^ y))
      In [5]: s.check()
      Out[5]: unsat
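For 8-bit operands, as in the sessions above, equivalence can also be checked by brute force over all 65,536 input pairs, which is handy when no solver is installed. This is a swapped-in technique (exhaustive enumeration, not Z3 or Arybo) and it does not scale to 32/64-bit operands, where an SMT solver is the practical choice; a pass at 8 bits is evidence, not proof, for wider widths.

```python
import itertools
from typing import Callable, Optional, Tuple

BinOp = Callable[[int, int], int]


def equivalent(f: BinOp, g: BinOp, bits: int = 8) -> Tuple[bool, Optional[Tuple[int, int]]]:
    """Exhaustively compare two expressions over all bits-wide inputs.

    Returns (True, None) or (False, first counterexample (x, y)).
    Python ints are unbounded, so results are masked to emulate bits-wide registers.
    """
    mask = (1 << bits) - 1
    for x, y in itertools.product(range(1 << bits), repeat=2):
        if (f(x, y) & mask) != (g(x, y) & mask):
            return False, (x, y)
    return True, None


print(equivalent(lambda x, y: ~(x ^ ~y), lambda x, y: x ^ y))               # (True, None)
print(equivalent(lambda x, y: (x ^ y) + 2 * (x & y), lambda x, y: x + y))   # (True, None)
print(equivalent(lambda x, y: x + y, lambda x, y: x | y))                   # (False, (1, 1))
```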
  20. SIMPLIFICATION ON IDA + D-810
    • D-810 uses a custom AstNode class to represent an (abstract) microcode instruction
    • I could easily define several new replacement patterns
    • genmc is useful for showing microcode instruction structures
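The replacement-pattern idea can be illustrated with a tiny standalone rewriter. This is a hypothetical sketch, not D-810's AstNode API: Node, match, substitute and simplify are illustrative names, "var" leaves in a pattern act as placeholders that bind any subtree, and rewriting runs bottom-up until no rule matches. The single rule is the ~(x ^ ~y) == x ^ y identity from slide 19.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    op: str                          # "xor", "bnot", "var", ...
    args: Tuple["Node", ...] = ()
    name: str = ""                   # variable name for "var" leaves


def var(name: str) -> Node:
    return Node("var", (), name)


def xor(a: Node, b: Node) -> Node:
    return Node("xor", (a, b))


def bnot(a: Node) -> Node:
    return Node("bnot", (a,))


def match(pattern: Node, node: Node, env: Dict[str, Node]) -> bool:
    """In a pattern, "var" leaves are placeholders that bind any subtree."""
    if pattern.op == "var":
        if pattern.name in env:
            return env[pattern.name] == node     # same placeholder, same subtree
        env[pattern.name] = node
        return True
    if pattern.op != node.op or len(pattern.args) != len(node.args):
        return False
    return all(match(p, n, env) for p, n in zip(pattern.args, node.args))


def substitute(template: Node, env: Dict[str, Node]) -> Node:
    if template.op == "var":
        return env[template.name]
    return Node(template.op, tuple(substitute(a, env) for a in template.args))


Rule = Tuple[Node, Node]


def simplify(node: Node, rules: List[Rule]) -> Node:
    """Rewrite bottom-up until no rule matches."""
    node = Node(node.op, tuple(simplify(a, rules) for a in node.args), node.name)
    for pattern, replacement in rules:
        env: Dict[str, Node] = {}
        if match(pattern, node, env):
            return simplify(substitute(replacement, env), rules)
    return node


RULES = [(bnot(xor(var("x"), bnot(var("y")))), xor(var("x"), var("y")))]

expr = xor(bnot(xor(var("a"), bnot(var("b")))), var("c"))
print(simplify(expr, RULES) == xor(xor(var("a"), var("b")), var("c")))  # True
```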
  22. LIMITATION
    • More functions, more complicated patterns
    • It was difficult to defeat all MBA expressions perfectly
    • I only handled interesting patterns, especially those related to the string decoding used by the samples
  23. STACK STRINGS
    • All strings are constructed and decoded in the stack area
    • After defeating CFF and MBA expressions, the decoding algorithm was identified
      • enc[i] ^= (i + Const) ^ Const
      • The constant value is different per function
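The identified routine is simple to reproduce for static decoding. A minimal sketch, assuming the per-function Const and the string length have already been recovered; the strings and constants in the usage lines are made up.

```python
def decode_stack_string(enc: bytes, const: int) -> bytes:
    """Apply enc[i] ^= (i + Const) ^ Const to every byte.

    Only the low 8 bits of the key matter for a byte XOR, and the low 8 bits of
    (i + Const) ^ Const do not depend on the operand width, so masking is safe.
    """
    return bytes(b ^ (((i + const) ^ const) & 0xFF) for i, b in enumerate(enc))


# XOR is its own inverse, so the same function also encodes.
enc = decode_stack_string(b"kernel32.dll", 0x5A)
print(decode_stack_string(enc, 0x5A))  # b'kernel32.dll'
```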
  24. COPYING THE ENCODED STRING BYTES INTO STACK
    • Sometimes the Hex-Rays decompiler only partially recognizes the copy or only shows the assignments
    • For static decoding, we need to
      • Construct the bytes from the assigned variables
      • Detect the length and constant value used in the decoding algorithm
    (Figure annotations: length and constant value; combination of a global variable and hard-coded bytes)
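When the decompiler only shows assignments such as v5 = 0x6E72656B, the encoded bytes can be rebuilt by laying the values out at their stack offsets in little-endian order (x86). A minimal sketch, assuming the (offset, value, size) triples have already been extracted from the decompiler output (extraction is not shown). The usage values are made up and left unencoded so the layout is easy to check; bytes copied from a global variable would simply be extra triples.

```python
from typing import Iterable, Tuple


def build_stack_buffer(assignments: Iterable[Tuple[int, int, int]]) -> bytes:
    """Lay out (stack_offset, value, size_in_bytes) writes as a contiguous buffer.

    Values are stored little-endian (x86); later writes overwrite earlier ones,
    negative (signed) values are masked to their width, and gaps stay zero.
    """
    items = list(assignments)
    base = min(off for off, _, _ in items)
    end = max(off + size for off, _, size in items)
    buf = bytearray(end - base)
    for off, value, size in items:
        start = off - base
        masked = value & ((1 << (8 * size)) - 1)
        buf[start:start + size] = masked.to_bytes(size, "little")
    return bytes(buf)


print(build_stack_buffer([(-0x20, 0x6E72656B, 4), (-0x1C, 0x32336C65, 4), (-0x18, 0, 1)]))
# b'kernel32\x00'
```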
  25. VARIOUS ACCESS PATTERNS
    • Referencing another variable (enc is decoded)
    • Additional XORs before decoding
    • Defeating MBA expressions is not perfect, so I decided to take an emulation approach
  26. EMULATION ISSUE IN GENERAL
    • The Unicorn-based flare-emu library provides users with a flexible interface for scripting emulation tasks in IDA
    • The iterateAllPaths API emulates all basic block paths in a function
      • Looked useful for de-obfuscating stack strings (e.g., ironstrings)
    • This API emulates each basic block only once
      • I modified the code to reproduce the xor loops detected by CAPA
  27. EMULATION ISSUE IN THIS SAMPLE
    • The flare-emu API takes only one path in CFF functions
      • The code simply tracks basic block successors
      • The search ends when it revisits the CFF dispatchers
    • Microcode-based solutions
      • Emulate x86 code in the unflattened microcode block order
      • Extend the D-810 microcode emulation functionality
      • I tried both a little, but realized that neither is straightforward
  28. SOLUTION
    • I used another flare-emu API (emulateRange) that emulates the code as is, without changing the code flow
      • Some quick hacks were added to flare-emu (e.g., LoadLibrary/GetProcAddress hooks, infinite loop detection)
      • The resulting script worked for 58% of the tested functions
    • I also implemented a script based on the IDA debugger hook class (DBG_Hooks) to handle the failed functions
    • Not elegant, but the combination covers most strings quickly
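The infinite loop detection hack is not described in detail. One simple way to do it, sketched here as an assumption, is to count executions per instruction address and stop once any address exceeds a threshold. The LoopGuard class and its 10,000 default are hypothetical; in a Unicorn-based emulator you would call on_instruction() from the per-instruction hook and, when it returns False, stop emulation (for example with Unicorn's emu_stop()).

```python
from collections import Counter


class LoopGuard:
    """Hypothetical helper: flags an address executed more than max_hits times."""

    def __init__(self, max_hits: int = 10_000) -> None:
        self.max_hits = max_hits
        self.hits: Counter = Counter()

    def on_instruction(self, address: int) -> bool:
        """Call once per emulated instruction; False means 'stop emulation'."""
        self.hits[address] += 1
        return self.hits[address] <= self.max_hits


guard = LoopGuard(max_hits=3)
print([guard.on_instruction(0x401000) for _ in range(4)])  # [True, True, True, False]
```

A per-address budget still lets legitimate decoding loops run many iterations, unlike a global instruction cap that can cut long but finite functions short.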
  29. SOLUTION (CONT.)
    • Both scripts recover argument strings at call instructions during emulation/debugging
      • Information such as the calling convention and argument types is obtained through the Hex-Rays decompiler APIs
    • The sample dynamically resolves all API addresses except GetProcAddress after decoding the API name strings
      • When an address assignment is detected, the script applies the API function type to the local variable pointer
      • GetTypeSignature() written by Rolf Rolles
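At a call instruction, a pointer argument still has to be turned back into text, and the argument type obtained from Hex-Rays (e.g., a char* versus a wchar_t* parameter) decides how. The helper below is a hypothetical sketch that works on a plain bytes snapshot of emulator or debugger memory; it is not the callstrings code, and the buffer in the usage lines is made up.

```python
def read_arg_string(mem: bytes, offset: int, wide: bool, max_chars: int = 512) -> str:
    """Read a NUL-terminated string argument from a memory snapshot.

    wide=True for UTF-16LE (wchar_t*) arguments, False for 8-bit (char*) ones;
    in the scripts this decision would come from the Hex-Rays argument type.
    """
    if wide:
        units = []
        limit = min(len(mem) - 1, offset + 2 * max_chars)
        for i in range(offset, limit, 2):
            unit = mem[i:i + 2]
            if unit == b"\x00\x00":
                break
            units.append(unit)
        return b"".join(units).decode("utf-16-le", errors="replace")
    end = mem.find(b"\x00", offset, offset + max_chars)
    if end == -1:
        end = min(len(mem), offset + max_chars)
    return mem[offset:end].decode("latin-1")


mem = b"xxLoadLibraryA\x00" + "ws2_32.dll".encode("utf-16-le") + b"\x00\x00"
print(read_arg_string(mem, 2, wide=False), read_arg_string(mem, 15, wide=True))
# LoadLibraryA ws2_32.dll
```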
  30. (Screenshot annotations)
    • Set the type of the local variable with ida_hexrays.modify_user_lvars()
    • Set the type of the call instruction's operand with ida_nalt.set_op_tinfo()
  31. SOLUTION (CONT.)
    • The scripts still don't cover all strings
    • A semi-automatic script handles the remaining minor cases individually
      • flare-emu emulateSelection + static decoding
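  32. IDA_CALLSTRINGS SCRIPTS
    Comparison of the used library/API approaches:
    • Static decoding: automated: yes; effective for other malware: no; effective in CFF functions: yes; API function type set: no; limitation: strings used by memcpy
    • flare-emu iterateAllPaths: automated: yes; effective for other malware: yes; effective in CFF functions: no; API function type set: yes; limitation: modifications needed to flare-emu and CAPA
    • flare-emu emulateRange: automated: yes; effective for other malware: yes; effective in CFF functions: yes; API function type set: yes; limitation: not all execution paths are covered
    • flare-emu emulateSelection: automated: no; effective for other malware: no; effective in CFF functions: n/a; API function type set: no; limitation: manual selection required
    • IDA DBG_Hooks: automated: yes; effective for other malware: yes; effective in CFF functions: yes; API function type set: yes; limitation: strings used during debugging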
  33. PROTOCOL OVERVIEW
    • The latest Hodur samples only support HTTP/HTTPS
    • Two header values (Sec-Dest/Sec-Site) are used to authenticate clients
    • GET request for the initial handshake
      • An RC4 key is returned
    • Periodic POST requests to receive C2 commands after the handshake
      • The request/response data are encrypted with the key
  34. AUTHENTICATION HEADERS
    • Sec-Dest: %2.2X%ws (e.g., “7BnqmmCg”)
      • A random byte (0x64-0x99): 0x64 + 0-0x35 derived from QueryPerformanceCounter
      • 6 random characters whose byte sum (& 0xff) is a checksum that depends on the method: GET = 99, POST = 88
    • Sec-Site: %2.2X%2.2X%ws (e.g., “896B2AC144C9E2E09836”)
      • Two random bytes (0x64-0x99)
      • 8-byte victim ID generated by time-related APIs
    Checksum of the example:
      In [2]: sum(b for b in b'nqmmCg') & 0xff
      Out[2]: 99
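A sketch of generating and checking the two headers, under assumptions drawn from this slide and the fake C2 log: the 6 random characters of Sec-Dest are ASCII letters (as in the examples) and are accepted when their byte sum masked with 0xFF equals 99 for GET or 88 for POST, and Sec-Site carries the 8-byte victim ID as 16 uppercase hex digits. Rejection sampling and Python's random module are stand-ins for illustration; the sample itself derives the prefix from QueryPerformanceCounter.

```python
import random
import string

# Required byte sum (& 0xFF) of the 6 random characters, per HTTP method
CHECKSUM = {"GET": 99, "POST": 88}


def _prefix_byte(rng: random.Random) -> int:
    # 0x64 + (0..0x35) = 0x64-0x99; a PRNG stands in for QueryPerformanceCounter
    return 0x64 + rng.randint(0, 0x35)


def make_sec_dest(method: str, rng: random.Random) -> str:
    target = CHECKSUM[method]
    while True:  # rejection sampling: roughly 1 in 256 candidates passes
        chars = "".join(rng.choice(string.ascii_letters) for _ in range(6))
        if sum(chars.encode()) & 0xFF == target:
            return f"{_prefix_byte(rng):02X}{chars}"


def make_sec_site(victim_id: bytes, rng: random.Random) -> str:
    # Two prefix bytes followed by the 8-byte victim ID as uppercase hex
    return f"{_prefix_byte(rng):02X}{_prefix_byte(rng):02X}{victim_id.hex().upper()}"


def check_sec_dest(value: str, method: str) -> bool:
    prefix, chars = value[:2], value[2:]
    if len(chars) != 6 or not all(c in string.hexdigits for c in prefix):
        return False
    return 0x64 <= int(prefix, 16) <= 0x99 and sum(chars.encode()) & 0xFF == CHECKSUM[method]


print(check_sec_dest("7BnqmmCg", "GET"), check_sec_dest("7BnqmmCg", "POST"))  # True False
```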
  35. INITIAL HANDSHAKE
    • GET request with the authentication headers
      • An RC4 key is returned if the header values are valid
      • If not valid, no content is returned
    • The Hodur sample code checks whether the Content-Type is application/octet-stream
    • The Content-Length was unknown during static analysis but was revealed during the scanner development
  36. AFTER HANDSHAKE
    • The sample receives C2 commands via POST requests
    • The POST request and response data are encrypted using RC4
    • The POST data header is the same as in the PlugX variants, but the head key is not used
    • The C2 response body also has the same header
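RC4 itself is standard, so a validation tool or fake C2 can reproduce the encryption with a few lines. How the handshake's key string is turned into key bytes is not detailed in the slides; passing the returned body bytes directly as the key is an assumption. The known-answer check uses the widely published test vector (key "Key", plaintext "Plaintext").

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: key scheduling (KSA) followed by keystream generation (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):                       # KSA
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:                          # PRGA, XOR with the keystream
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


print(rc4(b"Key", b"Plaintext").hex().upper())  # BBF316E8D940AF0AD3
```

Encryption and decryption are the same operation, so one function covers both the POST request and the C2 response.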
  37. FAKE C2 SERVER FOR VALIDATION
    • Developed a fake C2 server to validate the request data of the PoC scanner and other recent samples
      • fakenet (IP diverter) + Python HTTPS server
    POST request validation:
      [*] Validating Sec-Dest..
      [+] Prefix number 0x95 is valid
      [+] The hash of the random bytes b'xbsYpB' matches 88
      [*] Validating Sec-Site..
      [+] Prefix numbers 0x7f/0x8e is valid
      [+] victim_id='F4EB6EF3A8882016'
      ..
      [+] The decrypted POST data is saved as dec_post_data.bin
      [*] Responding with PlugX custom header data.. (C2 command = 0x7002)
  38. HUNTING RECENT SAMPLES
    • VT-retrohunted using yara_fn
      { 55 8B EC 6A ?? 68 ?? ?? ?? ?? 64 A1 ?? ?? ?? ?? 50 81 EC ?? ?? ?? ?? 53 56 57 A1 ?? ?? ?? ?? 33 C5 50 8D 45 ?? 64 A3 ?? ?? ?? ?? 89 65 ?? 8B 45 ?? 50 8D 8D ?? ?? ?? ?? E8 }
    (Figure legend: o_imm, fixup, o_mem, o_displ, o_near)
  39. HUNTING RECENT SAMPLES (CONT.)
    • One of the rules hit the latest sample in December last year
      • CFF was not applied to the sample
    • The C2 included in the sample was active
      • I could check the Content-Length and the format of the GET response
  40. APPROACH BASED ON VALIDATION
    • All recent samples had exactly the same C2 protocol encryption and data format
    • Every sample's C2 protocol/port is HTTPS/443
    • No need to send the POST request after the handshake
      • The C2 likely responds without content until operators specify commands
    • I started to implement a scanner that just checks the difference between GET requests with and without the authentication headers
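The scanner's core test can be expressed as a pure function over two already-fetched responses, which keeps it independent of the TLS library used to fetch them. This is a sketch under assumptions: the Response type and the rule "a non-empty application/octet-stream body only when the headers are sent" are illustrative, and the Content-Length check mentioned on slide 35 is omitted because its value is not given in the text.

```python
from dataclasses import dataclass


@dataclass
class Response:
    content_type: str
    body: bytes


def is_octet_stream(resp: Response) -> bool:
    return resp.content_type.split(";")[0].strip().lower() == "application/octet-stream"


def looks_like_hodur_c2(with_auth: Response, without_auth: Response) -> bool:
    """True if only the authenticated GET yields an octet-stream body (the RC4 key)."""
    keyed = is_octet_stream(with_auth) and len(with_auth.body) > 0
    return keyed and len(without_auth.body) == 0


print(looks_like_hodur_c2(Response("application/octet-stream", b"0123456789abcdef"),
                          Response("text/html", b"")))  # True
```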
  41. TLS HANDSHAKE ISSUE
    • OpenSSL caused an internal error during the TLS handshake
      * TLSv1.0 (OUT), TLS header, Certificate Status (22):
      * TLSv1.3 (OUT), TLS handshake, Client hello (1):
      * TLSv1.2 (IN), TLS header, Certificate Status (22):
      * TLSv1.3 (IN), TLS handshake, Server hello (2):
      * TLSv1.2 (IN), TLS handshake, Certificate (11):
      * TLSv1.2 (IN), TLS handshake, Server key exchange (12):
      * TLSv1.2 (IN), TLS handshake, Server finished (14):
      * TLSv1.2 (OUT), TLS header, Unknown (21):
      * TLSv1.2 (OUT), TLS alert, internal error (592):
      * error:0800006A:elliptic curve routines::point at infinity
      * Closing connection 0
      curl: (35) error:0800006A:elliptic curve routines::point at infinity
  42. TLS HANDSHAKE ISSUE (CONT.)
    • I tested major open-source TLS clients
    • Only LibreSSL (pylibtls) worked for the TLS handshake
      • OpenSSL (tested 1.1.1k, 3.0.2, 3.2.0): no
      • Mbed TLS (python-mbedtls, tested 2.28.6): no
      • wolfSSL (wolfssl-py, tested 5.6.0): no
      • LibreSSL (pylibtls, tested 3.8.2): yes
  43. DETECTION BY THIRD-PARTY SCANS
    • Shodan hasn't been able to recognize the port since at least last December
    • Censys can detect the port, but the protocol is UNKNOWN (not HTTPS)
  44. INTERNET-WIDE SCANNING WORKFLOW
    • Automated with Python (asynchronous I/O for the OpenSSL/JARM scans)
    • Exclude as many hosts as possible before the pylibtls scan
      1. ZMap: get the list of hosts open on TCP/443
      2. OpenSSL: try the TLS handshake. Does it cause an internal error?
      3. JARM: does the JARM fingerprint match that of the Hodur C2?
      4. pylibtls: send GET requests with/without the auth headers. Is an RC4 key-like string returned only when the headers are sent?
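A minimal asyncio sketch of the "cheap filters first" workflow. Each stage is an async predicate run with bounded concurrency, and only survivors reach the next, more expensive stage. The OpenSSL stage uses Python's ssl module (which wraps OpenSSL) and is an untested sketch: it assumes the "point at infinity" reason text surfaces in the SSLError message, which may vary by Python/OpenSSL build. The JARM and pylibtls stages are not shown; they would be plugged in as further predicates.

```python
import asyncio
import ssl
from typing import Awaitable, Callable, Iterable, List, Tuple

Check = Callable[[str], Awaitable[bool]]


async def run_stage(hosts: List[str], check: Check, concurrency: int) -> List[str]:
    """Run one filter stage concurrently and keep accepted hosts in input order."""
    sem = asyncio.Semaphore(concurrency)

    async def guarded(host: str) -> bool:
        async with sem:
            try:
                return await check(host)
            except Exception:          # a failing probe simply drops the host
                return False

    verdicts = await asyncio.gather(*(guarded(h) for h in hosts))
    return [h for h, ok in zip(hosts, verdicts) if ok]


async def scan(hosts: Iterable[str], stages: List[Tuple[Check, int]]) -> List[str]:
    """Apply cheap stages first so the expensive last stage sees few hosts."""
    survivors = list(hosts)
    for check, concurrency in stages:
        if not survivors:
            break
        survivors = await run_stage(survivors, check, concurrency)
    return survivors


async def openssl_point_at_infinity(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """OpenSSL stage (untested sketch): does the handshake fail with the EC error?"""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port, ssl=ctx), timeout)
    except ssl.SSLError as exc:
        # Assumption: the OpenSSL reason text appears in the exception message
        return "point at infinity" in str(exc).lower().replace("_", " ")
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    return False
```

Usage: `asyncio.run(scan(zmap_hosts, [(openssl_point_at_infinity, 500), (jarm_matches, 100), (auth_diff, 10)]))`, where zmap_hosts, jarm_matches and auth_diff are hypothetical placeholders for the ZMap output and the other two stages.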
  45. RESULT
    • Two C2 servers were found in late December last year
      • 149[.]104.12.64 and 45[.]83.236.105
    • Two months later, Trend Micro referred to the C2s in a blog post
      • But they are still active
  46. WRAP-UP
    • Defeating compiler-level obfuscations is easier than before
      • 2-3 months for APT10 ANEL -> 3-4 weeks for Hodur
    • We still need to improve or create tools when RE requires de-obfuscating code precisely
      • Code will be available online after the conference
    • The developed scanner keeps tracking the malware C2s on the Internet
      • We can respond proactively using the intel