
hackeT
October 10, 2023

Practical msticpy use ~ rainbow bridge to SIEM for advanced threat hunting ~

English version


Transcript


  2. $WHOAMI

    • Threat Hunter / App Developer / Threat Researcher • OSS contributor: msticpy, unprotect, atomic-red-team, cuckoo, CAPEv2, ... • Qualifications: 7 GIACs, CISSP, CISA • SNS: HN: hackeT; X: @T_8ase • Career (diagram): Threat Researcher/binarian, Incident Handler, Forensic Service Dev/Operation, SOC Analyst, MSSP, CSIRT, AI Anti-Virus, Full-stack Engineer • Fighting unjust attacks around the world!
  3. $more GoAhead Inc.

    • Data analysis company established in 2017 • CEO: Mitsuhiro Nakamura • Splunk.conf 2017 @ USA • Splunk Champion • Splunk is our strength for security challenges • Aim for maximum effectiveness with minimum resources • Offerings: KOBANZAME (IP Whois DB), heuristic logic, data visualization • Free Splunk apps/add-ons by GoAhead: https://splunkbase.splunk.com/apps?keyword=goahead
  4. Agenda

    • Invariable Operation with SIEM • msticpy 101: Overview and Basics • msticpy 201: Jupyter Notebook (pros | cons) • msticpy 301: Practical Use Case • Take Away
  5. Background and Issues

    • Old-fashioned: analysis by human-wave tactics on raw logs; monitoring by email alerts • Nowadays: analysis by multi-axis search of formatted logs; monitoring via visualized dashboards and alerts from the SIEM • Never-ending development and operations tasks: modifying and adding analytical logic to keep up with new threats; tuning thresholds to both the internal situation and the threat situation worldwide; fixing bugs, adding panels, writing documentation, modifying and updating SIEM functions and existing dashboards • Biases sometimes constrain the freedom of analysis
  6. Objective

    Threat Hunting • Proactive detection of and response to signs of malicious activity or threats • Investigation using threat intelligence, not-yet-applied IOCs, and anomaly detection • Iteration between hypothesis and verification
    Advanced Threat Hunting • Identifying undetected threats from raw data: check the raw data too, and look for omissions in processing and detection by security products • Inherently free (ad hoc) data analysis: uniquely conceived analytical logic, unrestricted external collaboration, unconventional visualization, and presentation that is easy for readers to understand • Continuous update operations • Machine Learning & Deep Learning (ML/DL) • Automation
  7. Security Information and Event Management (SIEM)

    • SIEM products: Splunk, Microsoft Sentinel, IBM QRadar, Exabeam, Sumo Logic, Elastic, etc. • SIEMs offered by security vendors: can collect, extract, search, analyze, visualize, detect, and respond; have their own threat hunting functions; have ML/DL extensions • Generations (diagram): First generation (Gartner, 2005): integration of log and event management • Second generation: correlation analysis with CTI, big data processing • Third generation (Gartner, 2017): addition of UEBA and SOAR (source: Gartner Inc., 2022 Magic Quadrant)
  8. SIEM's Advantages

    • Rapid search through indexing and field normalization (CIM, ASIM) • Statistical calculations are easy thanks to its search language • Can store threat intelligence • Multiple analysts can see the same data and analysis results • SIEM vendors also provide a lot of detection logic
  9. SIEM's Pitfalls

    • Rapid search through indexing and field normalization (CIM, ASIM) ☞ If field extraction fails, the event drops out of the search from the start, or out of the analysis along the way. • Statistical calculations are easy thanks to its search language ☞ There is some processing it is not good at, and learning the search language takes effort. • Can store threat intelligence ☞ Most of the intelligence has to be prepared and operated by ourselves. • Multiple analysts can see the same data and analysis results ☞ Shared resources impose various limitations. • SIEM vendors also provide a lot of detection logic ☞ Is it necessary and sufficient? No!
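The extraction pitfall above can be sketched with the standard library. The log format is a hypothetical key=value layout in which one line writes `src:` instead of `src=`. Rather than letting that line vanish silently, as it would from a SIEM search on the `src` field, the helper also returns the indices it could not parse. The format, field names, and the `extract_field` helper are illustrative assumptions, not SIEM or msticpy behavior.

```python
import re

# Hypothetical key=value log format, e.g. "user=alice src=10.0.0.5 action=login"
FIELD = re.compile(r"(\w+)=(\S+)")


def extract_field(lines, field):
    """Extract one field from every line and record the lines where extraction failed."""
    values, misses = [], []
    for n, line in enumerate(lines):
        fields = dict(FIELD.findall(line))
        if field in fields:
            values.append(fields[field])
        else:
            # A search filtered on this field would silently skip this line
            misses.append(n)
    return values, misses
```

Reporting `misses` alongside the values turns a silent omission into something an analyst can check.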
  10. Don't Rely Too Much on SIEM Analysis!

    • When the SIEM fails, no one can analyze anything until it recovers. • Over-reliance on the SIEM search language alone makes us forget how to analyze raw data. • Who ensures the integrity of the data and search results in the SIEM? • Limitations of SIEM: default upper limits on subsearches and multivalue fields (results are truncated); default upper limit on the number of points plotted in a chart (truncated); search omissions caused by misconfiguration are hard to notice • Don't rely solely on the logic provided by the SIEM vendor: "Enterprise SIEMs Miss 76 Percent of MITRE ATT&CK Techniques" (source: CardinalOps, "2023 Report on State of SIEM Detection Risk")
  11. For Advanced Threat Hunting

    (Diagram) SIEM + msticpy: time series analysis, automation, consistent I/O, data validation, machine learning, infinite visualization
  12. Microsoft Threat Intelligence Center (MSTIC) on Python and Jupyter Notebooks

    • MSTICpy: OSS library developed by Microsoft's MSTIC • Written in Python, usually used in Jupyter Notebooks • Extensive functionality for compromise investigation and threat hunting • Since March 2019, 200k+ downloads: https://github.com/microsoft/msticpy • Presented at Black Hat USA 2020 • Frequently updated recently and continues to evolve • Still few users and blog articles in Asia and Japan • Its functions broadly fall into four processes: data acquisition, data processing, analysis (including ML), and visualization • Because it is a library, you can use only the functions you need, piecemeal
  13. msticpy's Documentation & Resources

    • "MSTICpy" is written "msticpy" in this presentation • Official documentation: https://msticpy.readthedocs.io (100k+ words, 80+ RST files, 40+ sample Jupyter Notebooks) • Past training resources: msticpy-lab and msticpy-training GitHub repos • Official blog: https://msticpy.medium.com • Learning from such a huge body of resources is time-consuming...
  14. msticpy Capabilities

    (Diagram) Querying Logs (Acquisition) • Data Enrichment (Enrichment) • Data Visualization (Visualization) • Security Analysis, Pivot, and Utility (Analysis) • msticpyconfig.yaml • Source: https://twitter.com/fr0gger_/status/1623209441146593281?s=61&t=v8tLnMcFFdnsiT38CeGBcg
  15. msticpy Data Flow Diagram

    (Diagram) Data sources: SIEM, data lake (SIEM), raw data, local files ☞ Jupyter Notebook: Acquisition, Enrichment, Analysis, Visualization (rich) • Enrichment via the Internet: threat intel lookup, Whois, GeoIP • Analysis: decode, extract, ML • Upload
  16. msticpy: Data Acquisition (1)

    • Create an instance of a Query Provider • Select from the available data sources (picture on the left) • LocalData: reads .pkl files in the ./data directory • Splunk: connects to the Splunk REST port using the settings in msticpyconfig.yaml • The communication channel is NOT encrypted by any msticpy-specific mechanism ☞ HTTPS (SSL) is necessary
  17. msticpy: Data Acquisition (2)

    • Return value: pandas DataFrame • Ad hoc query function: exec_query() runs an arbitrary query • Built-in query functions: select from a list that varies by data source
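msticpy's Splunk driver handles the connection itself, reading its settings from msticpyconfig.yaml. As a rough illustration of what "HTTPS is necessary" means on the wire, here is a stdlib sketch of a hypothetical helper, `build_export_request`. It builds, but does not send, a POST to Splunk's REST `/services/search/jobs/export` endpoint (default management port 8089) and refuses plain HTTP. This is not msticpy code. The host name and token are placeholders.

```python
import urllib.parse
import urllib.request


def build_export_request(base_url: str, spl: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Splunk's REST search export endpoint.

    Hypothetical helper for illustration; it refuses any base URL that is not HTTPS.
    """
    if urllib.parse.urlparse(base_url).scheme != "https":
        raise ValueError("refusing to send SIEM queries or tokens over plain HTTP")
    # The export endpoint expects a full search string, so add the implicit "search" command
    if not spl.lstrip().startswith(("search ", "|")):
        spl = "search " + spl
    body = urllib.parse.urlencode({"search": spl, "output_mode": "json"}).encode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/services/search/jobs/export",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},  # Splunk authentication token
    )
```

Failing closed on `http://` keeps a misconfigured URL from silently leaking both the query and the token.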
  18. msticpy: Enrichment

    • Threat intel lookup: Pivot TI functions (Jupyter Notebook only); TILookup class (also available in plain Python programs) • GeoIP (MaxMind GeoLite2, IPStack) • IPWhois (Cymru, RADB, RDAP)
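Before TI, GeoIP, or Whois lookups go out to Internet services, it is worth keeping internal addresses from leaving the network and from wasting API quota. This is a minimal sketch that assumes the candidate indicators are plain strings. `split_for_lookup` is a hypothetical helper, not part of msticpy, and relies on the standard `ipaddress` module's `is_global` flag.

```python
import ipaddress


def split_for_lookup(candidates):
    """Return (public IPs worth an external lookup, everything else)."""
    external, skipped = [], []
    for raw in candidates:
        text = raw.strip()
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            skipped.append(text)  # not an IP address at all
            continue
        # is_global is False for private, loopback, link-local, documentation ranges, etc.
        (external if addr.is_global else skipped).append(str(addr))
    return external, skipped
```

Only the first list would be passed on to an external lookup.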
  19. msticpy: Analysis (Pivot)

    • Pivot functions generally need to be loaded by calling "init_notebook()" • They wrap msticpy functions and classes for ease of discovery and use • Standardized function parameters, syntax, and output format • ".mp_pivot." calls can be piped in multiple stages
  20. msticpy: Analysis (Security)

    • Event Clustering: classification of process and logon events on the host machine • Time Series Analysis: anomaly detection in time series data that accounts for seasonal variation • Outlier Identification: outlier detection using decision trees • Anomalous Session: detection of unusual, low-likelihood sequences of rare events, using each event's command name, its parameter names, and their values
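The Anomalous Session idea scores how likely a sequence of events is and flags low-likelihood sessions. It can be sketched as a first-order Markov chain over command names with add-one smoothing. This is a simplified stand-in written for illustration, not msticpy's implementation, which per the slide also models parameter names and values. The start/end tokens and the `train` / `avg_log_likelihood` helpers are assumptions.

```python
import math
from collections import Counter, defaultdict

START, END = "##START##", "##END##"  # hypothetical session boundary tokens


def train(sessions):
    """Count command-to-command transitions across sessions, including start/end."""
    counts = defaultdict(Counter)
    vocab = {END}
    for session in sessions:
        seq = [START, *session, END]
        vocab.update(session)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def avg_log_likelihood(session, counts, vocab, alpha=1.0):
    """Mean log-probability per transition, with add-alpha (Laplace) smoothing."""
    seq = [START, *session, END]
    v = len(vocab)
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts.get(prev, Counter())
        total += math.log((row[nxt] + alpha) / (sum(row.values()) + alpha * v))
    return total / (len(seq) - 1)
```

Averaging per transition keeps long and short sessions comparable. The lowest-scoring sessions are the candidates to inspect.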
  21. msticpy: Visualization

    • Implemented with BokehJS • Charts implemented in msticpy: timeline, process tree, Folium map, matrix plot, entity/network graph, etc. • Additional charts can be created with MorphCharts
  22. Benefits of Analyzing with Jupyter Notebook

    • Reproducibility of data, with intermediate results available as output • Easy combination/integration with external sources • Easy use of ML/DL frameworks • An extensive visualization library at your disposal • Gain applied skills as a data scientist
  23. Ideal Relationship between Jupyter Notebook and SIEM

    (Diagram) SIEM: rough noise reduction • msticpy: deep analysis on denoised data • Together with intelligence and knowledge ☞ Advanced Threat Hunting
  24. msticpy's pros: Seasonal-Trend decomposition using LOESS (STL)

    • Also covered in the book "Machine Learning for Security Engineers", Chapter 6 "Anomaly Detection"
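msticpy's time series analysis uses STL from statsmodels. To show the decompose-then-threshold idea without dependencies, the sketch below swaps in a much simpler technique. It builds a classical seasonal profile (the mean of all points sharing the same phase, such as hour of day) with no LOESS smoothing and no trend term, then flags residuals more than `threshold` standard deviations from their mean. Treat it as an illustration of the concept, not as STL. The function name is hypothetical.

```python
import statistics


def seasonal_anomalies(values, period, threshold=3.0):
    """Return indices whose de-seasonalised residual lies more than `threshold` std devs from the mean.

    Simplified classical decomposition (seasonal profile only), NOT STL/LOESS.
    """
    # Seasonal profile: mean of every observation sharing the same phase (e.g. hour of day)
    profile = [statistics.fmean(values[p::period]) for p in range(period)]
    # Residual = observation minus its seasonal expectation (no separate trend term)
    residuals = [v - profile[i % period] for i, v in enumerate(values)]
    mu = statistics.fmean(residuals)
    sigma = statistics.pstdev(residuals)
    if sigma == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r - mu) / sigma > threshold]
```

For example, a week of hourly counts that rise during business hours has one large spike at index 100. Only that spike is flagged, because the business-hours rise is absorbed by the seasonal profile.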
  25. msticpy's pros: Consistent I/O

    • Sending with the Data Uploader function (transfer) • Only Azure Sentinel and Splunk are supported as of Aug 2023 • Can upload a DataFrame, a file, or a folder • (Diagram) OSINT (Internet) ☞ msticpy ☞ SIEM: enriching the SIEM! • Visualization charts cannot be transferred; however, similar visualizations can be drawn in the SIEM from the transferred results.
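The Data Uploader pushes results back into the SIEM. For comparison, another common way to push enriched records into Splunk is the HTTP Event Collector (HEC). The sketch below builds, but does not send, such a request. HEC is a different mechanism from msticpy's uploader. The helper name, the sourcetype `msticpy:enrichment`, the index name, and port 8088 are assumptions for illustration.

```python
import json
import urllib.request


def build_hec_request(base_url, token, records, sourcetype="msticpy:enrichment", index=None):
    """Build (not send) a Splunk HTTP Event Collector request carrying enriched records."""
    if not base_url.startswith("https://"):
        raise ValueError("HEC tokens and data should only travel over HTTPS")
    events = []
    for rec in records:
        payload = {"event": rec, "sourcetype": sourcetype}  # hypothetical sourcetype
        if index:
            payload["index"] = index
        events.append(json.dumps(payload))
    # HEC accepts several JSON event objects in one request body
    body = "\n".join(events).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/services/collector/event",
        data=body,
        method="POST",
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
    )
```

Pushing TI results into the SIEM (the "push direction") avoids pulling sensitive SIEM data out to the notebook.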
  26. Jupyter & msticpy's pros: Data Validation

    • Check DataFrame results step by step • Guard against accidental overwriting with the copy() function • Value type conversion and stripping of null values • Easy validation of character encodings • GUI for selecting time ranges • Pre-confirm the actual query sent through the Query Provider with the "print" option (screenshot: query to be searched)
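The validation steps listed above (work on a copy, strip nulls, convert types, check character encodings) look roughly like this in plain Python. With pandas you would reach for `DataFrame.copy()`, `dropna()`, and `astype()` instead. The event fields `bytes` and `cmd` and the `clean_events` helper are hypothetical.

```python
import copy


def clean_events(raw_events):
    """Validate a list of event dicts without touching the caller's data."""
    events = copy.deepcopy(raw_events)  # like DataFrame.copy(): never overwrite the raw result
    cleaned, problems = [], []
    for i, ev in enumerate(events):
        # Strip null values (None or empty string)
        for key in [k for k, v in ev.items() if v is None or v == ""]:
            del ev[key]
        # Value type conversion: the byte count arrives as a string
        try:
            ev["bytes"] = int(ev.get("bytes", 0))
        except (TypeError, ValueError):
            problems.append((i, "bytes is not an integer"))
            continue
        # Character-code check: raw bytes must decode as UTF-8
        if isinstance(ev.get("cmd"), bytes):
            try:
                ev["cmd"] = ev["cmd"].decode("utf-8")
            except UnicodeDecodeError:
                problems.append((i, "cmd is not valid UTF-8"))
                continue
        cleaned.append(ev)
    return cleaned, problems
```

Returning the problem list instead of raising keeps the whole batch visible for review in the next notebook cell.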
  27. Jupyter's pros: Use of Many ML/DL Libraries

    • Only a few ML models are built into msticpy: Event Clustering ☞ DBSCAN in scikit-learn; Time Series Analysis and Anomalies ☞ STL in statsmodels; Outlier Identification ☞ IsolationForest in scikit-learn • Little parameter tuning is required, since they are specialized for common threat hunting applications • Flexibility to use Python's rich ML/DL libraries (NLP, ML, DL)
  28. Jupyter's pros: Infinite Visualization

    • Maximum number of plotted data points (by default): Splunk 10,000 • MS Sentinel 10,000 • Jupyter ♾ (infinity) • This data was truncated in Splunk!
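With a 10,000-point default cap, anything past the cap never reaches the chart. The small sketch below shows both sides of that trade-off. `plot_coverage` reports what a capped chart would drop. `minmax_downsample` keeps a late spike visible while staying under a cap by taking the min and max of each bucket, a common downsampling technique chosen here for illustration and not taken from Splunk, Sentinel, or msticpy. Both helpers are hypothetical.

```python
import math


def plot_coverage(n_points, cap=10_000):
    """Report how many points a chart with a default point cap would silently drop."""
    shown = min(n_points, cap)
    return {"points": n_points, "shown": shown, "dropped": n_points - shown, "truncated": n_points > cap}


def minmax_downsample(values, max_points):
    """Shrink a series to at most max_points by keeping each bucket's min and max,
    so isolated spikes survive (order inside a bucket is not preserved)."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    if len(values) <= max_points:
        return list(values)
    size = math.ceil(len(values) / (max_points // 2))  # bucket width
    out = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        out.extend((min(chunk), max(chunk)))
    return out
```

In a notebook you can simply plot every point. The downsampler matters only when a tool enforces a cap.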
  29. [FYI] Changing the Upper Limit in the Dashboard Options

    • In Splunk, the limit can be changed with the dashboard option "charting.data.count", but...
  30. Jupyter's pros: Automation with papermill

    • Python library (usable from the CLI or from Python) • Batch execution of notebook files with different parameters • Introduced in the "Put it into Operation" section at the end of msticpy's training materials • Parameters are overwritten in the output notebook
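A papermill batch run can be planned in plain Python and then executed with `papermill.execute_notebook(input_path, output_path, parameters=...)`. The sketch assumes the notebook has a cell tagged `parameters`, into which papermill injects the overriding values in the output notebook. The parameter names `host`, `start`, and `end`, the `runs` directory, and both helper functions are hypothetical.

```python
from pathlib import Path


def plan_runs(notebook, hosts, start, end, out_dir="runs"):
    """Build one (output_path, parameters) pair per host for a batch of notebook runs."""
    stem = Path(notebook).stem
    return [
        (str(Path(out_dir) / f"{stem}_{host}.ipynb"), {"host": host, "start": start, "end": end})
        for host in hosts
    ]


def run_all(notebook, plans):
    """Execute each planned run with papermill (requires `pip install papermill`)."""
    import papermill as pm  # imported lazily so planning works without papermill installed

    for output_path, params in plans:
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        pm.execute_notebook(notebook, output_path, parameters=params)
```

Keeping planning separate from execution makes it easy to review the batch before anything runs against the SIEM.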
  31. Jupyter's cons: Security Concerns about Data Transfer

    • Sensitive data may be transferred from the SIEM to an external Jupyter environment • Handling it with the SIEM's ACLs may be the only safeguard • Eavesdropping/MITM attacks during data transfer to Jupyter • Dependence on SSL security on the SIEM side • More complicated security design • (Diagram) SIEM ⇄ msticpy (Jupyter): transferring threat intelligence data to the SIEM is relatively clear-cut
  32. Toward Practical msticpy Use

    • The push direction is fine: intelligence is collected from external sources, analyzed and processed, then transferred to the SIEM • The pull direction has the data-transfer security concern • Planning a new security design from scratch for msticpy alone is a hurdle • Use the SIEM vendor's advanced analytics mechanisms built around Jupyter: MS Sentinel ☞ "Microsoft Azure Machine Learning Workspace": everything is completed within Azure; store the credential strings in "Azure Key Vault" and load them from there • Splunk ☞ "Splunk App for Data Science and Deep Learning (DSDL)": machine resources such as Docker containers are prepared externally; data is exchanged between the containers and Splunk; msticpy is installed on the container side
  33. $more Splunk App for DSDL

    • Deployment: single-instance | side-by-side • Implemented data security features: use of your own SSL certificates, custom password settings for Jupyter, fine-grained ACL design with Splunk access tokens • Splunk MLTK commands can interact with the containers: | fit (training to create a model); | apply (applying the trained model to data for identification)
  34. Use Case: PowerShell Process Command Line (1)

    • Flow: search in Splunk ☞ "powershell -enc" ☞ decode Base64 ☞ delete null bytes (\x00) ☞ extract IoCs ☞ enrich IoCs ☞ return to Splunk, via | fit (required the first time, for model creation) and | apply • This mechanism was originally designed for ML/DL algorithms, so I developed a custom model incorporating msticpy: https://github.com/Tatsuya-hasegawa/MSTICPy_utils/blob/main/splunk_dsdl/msticpy_powershell_ioc.ipynb • Executing the fit command creates one .py file in the app/model directory, consisting of the functions exported from the .ipynb
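The decode, null-byte deletion, and IoC extraction steps of this use case can be sketched with the standard library. msticpy offers its own Base64 decoding and IoC extraction (`IoCExtract`), and the linked notebook presumably uses those. The regexes and helper names below are simplified assumptions that cover only URLs and IPv4 addresses, and the DSDL `| fit` / `| apply` wiring is not shown.

```python
import base64
import re

# Deliberately simple patterns for illustration; real IoC extraction covers many more types
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL = re.compile(r"https?://[^\s'\"]+")
ENC_ARG = re.compile(r"-e(?:nc(?:odedcommand)?)?\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)


def decode_encoded_command(cmdline: str) -> str:
    """Decode the Base64 payload after -e/-enc/-EncodedCommand in a PowerShell command line."""
    m = ENC_ARG.search(cmdline)
    if not m:
        return ""
    raw = base64.b64decode(m.group(1))
    # The payload is UTF-16LE, so ASCII text has a null byte after every character;
    # deleting them works for ASCII scripts (decode("utf-16-le") would also handle non-ASCII)
    return raw.replace(b"\x00", b"").decode("utf-8", errors="replace")


def extract_iocs(text: str) -> dict:
    """Pull URLs and plausible IPv4 addresses out of decoded script text."""
    ips = [ip for ip in IPV4.findall(text) if all(int(o) <= 255 for o in ip.split("."))]
    return {"url": sorted(set(URL.findall(text))), "ipv4": sorted(set(ips))}
```

The extracted IoCs are what would then go to enrichment (TI lookup, Whois, GeoIP) before the results return to Splunk.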
  35. Take Away

    • Don't rely too much on SIEM analysis! • msticpy's missionary work: happy to see more APAC users • Let's analyze and code in Jupyter Notebook to hone your skills! • Let's build on existing mechanisms to address data security concerns! • Let's become contributors to our favorite OSS. Happy msticpying!
  36. Quotations & References

    • msticpy docs: https://msticpy.readthedocs.io/en/latest/
    • msticpy-training: https://github.com/microsoft/msticpy-training
    • msticpy-lab: https://github.com/microsoft/msticpy-lab
    • Splunk DSDL docs: https://docs.splunk.com/Documentation/DSDL/5.1.0/User/IntroDSDL
    • Splunk botsv2 dataset: https://github.com/splunk/botsv2
    • Microsoft Sentinel Notebook and msticpy: https://learn.microsoft.com/en-us/azure/sentinel/notebook-get-started
    • papermill docs: https://papermill.readthedocs.io/en/latest/
    • Macnica, SIEM introduction (Exabeam): https://www.macnica.co.jp/business/security/manufacturers/exabeam/feature_07.html
    • My Qiita blog about msticpy: https://qiita.com/hackeT
    • Machine Learning for Security Engineers: https://www.oreilly.co.jp/books/9784873119076/
    • awesome-detection-engineering: https://github.com/infosecB/awesome-detection-engineering
    • CardinalOps, 2023 Report on State of SIEM Detection Risk: https://cardinalops.com/whitepapers/2023-report-on-state-of-siem-detection-risk/