TPCiC 2021: ElasticSearch Exploration in Your Terminal

You've seen the pretty graphs. Visuals are great for signaling there is a problem somewhere in your system. How do you, a command line guru, go from pretty graphs to root cause analysis? Most likely you'll be reaching for paradigms from the command line: composability and a flexible, compact syntax to ask your questions. I'd like to talk more about integrating ElasticSearch-based dashboards back to the command line workflows I love.

This talk is an overview of a tool I developed while working at Booking.com to drastically reduce the time and complexity of performing incident response against rich, structured data in ElasticSearch. It was developed with the help of the security and fraud teams to perform ad-hoc queries critical for incident response. The tool served the team well and it's been under active development ever since. It continues to grow in capabilities aimed at making ad-hoc analysis simple, easy, and accessible to hardened command line jockeys and command line newbies alike.

Join me to learn how to bring the logging data you love back to your terminal!

Brad Lhotsky

June 09, 2021

Transcript

  1. ELASTICSEARCH DATA EXPLORATION IN YOUR TERMINAL
     Things you never knew you needed until it was too late.
     With your host, Brad Lhotsky
  2. MY BROWSER IS NOT AN IDE
     ➤ Which browser?
     ➤ Are you using privacy extensions?
     ➤ What happens when I hit “Backspace”
     ➤ Don’t get me started on “gestures”
     ➤ Bloaty and slow
     ➤ Prone to distraction
     @reyjrar
  3. THE CLI IS A WORKSPACE
     ➤ Which shell?
     ➤ Tab Autocomplete
     ➤ ReadLine
     ➤ dot files
     ➤ Access to a plethora of interoperable tools
     ➤ OK, I could MUD from my terminal
     ➤ Otherwise, fairly purpose built
  4. “There are a finite number of key strokes before you die, use them wisely.”
     - A Wise Programmer
  5. UNIX PHILOSOPHY
     ➤ Do One Thing Well
     ➤ Assume output will be used as input and vice versa
     ➤ Favor the creation of tools or scripts, even for seemingly one-off jobs
  6. PERL
     ➤ Easy things easy, hard things possible
     ➤ Sloppy and unpredictable uses just like natural languages
     ➤ Grow with you
     ➤ DWIM
     ➤ TIMTOWTDI
     ➤ The CPAN
  7. ES-UTILS
     ➤ Monitoring
     ➤ Maintenance
     ➤ Status and Informational Tools
     ➤ Built as a Reusable Perl functional library
     ➤ ES Version Agnostic
     ➤ Assumes index-%Y.%m.%d index names
     ➤ And then came...
     ➤ es-search.pl
  8. OPTIONAL: INSTALL PERLBREW
     # Install perlbrew
     curl -L https://install.perlbrew.pl \
       | bash
     # Setup perlbrew
     perlbrew install -j8 -n 5.34.0
     perlbrew switch 5.34.0
     perlbrew install-cpanm
  9. GETTING STARTED: CONNECTING
     # Defaults
     es-search.pl --host localhost --port 9200
     # Connect to es-node01
     es-search.pl --host es-node01
  10. GETTING STARTED: CONNECT PREFERENCES
      cat ~/.es-utils.yml
      ---
      host: es-gateway.corp.company.com
      port: 443
      proto: https
      http-username: bob
      password-exec: ~/bin/get-es-password.sh
  11. SOME HELPFUL NOTES
      ➤ Searches are constrained by the calendar date in the index name
      ➤ Searches use “index base names” via --base logstash
      ➤ Use --days 7 for opening scope to 7 days
      ➤ Searches will stop once they receive --size 20 results
      ➤ Use --all to get all results across full timespan
      ➤ Sort order is descending, override with --asc
      ➤ Target a specific index or alias with --index logstash-2019.10.21
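The calendar-date constraint is easier to picture with a small sketch. It assumes daily indices named base-%Y.%m.%d (the pattern from the ES-UTILS slide) and GNU date for the -d option. The function name index_names and its optional reference-date argument are made up for illustration and are not part of es-search.pl.

```shell
#!/bin/sh
# Hypothetical helper, not part of es-utils: print the daily index names
# that a search over the last N days would have to cover.
# Assumes GNU date (for -d) and base-%Y.%m.%d index names.
index_names() {
    base=$1
    days=$2
    today=${3:-$(date -u +%Y-%m-%d)}   # optional reference date, YYYY-MM-DD
    i=0
    while [ "$i" -lt "$days" ]; do
        date -u -d "$today $i days ago" "+$base-%Y.%m.%d"
        i=$((i + 1))
    done
}

# Three days ending on 2019-10-21, newest first
index_names logstash 3 2019-10-21
```

Read this way, --days 7 widens the search to seven such names, while --index pins it to exactly one.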
  12. GETTING STARTED: INDEX SELECTION
      # List index basenames
      $ es-search.pl --bases
      Bases available for search:
      access
      security
      syslog
      # Bases: 3 from a combined 61 indices.
  13. GETTING STARTED: SET A DEFAULT "BASE"
      cat ~/.es-utils.yml
      ---
      host: es-gateway.corp.company.com
      base: log
      days: 1
  14. GETTING STARTED: TIMESTAMP DETECTION
      # Specify timestamp
      es-search.pl --base log \
        --timestamp timestamp
      cat ~/.es-utils.yml
      ---
      base: log
      timestamp: timestamp
  15. GETTING STARTED: TIMESTAMP PREFERENCES
      cat ~/.es-utils.yml
      ---
      base: log
      # Global default timestamp field
      timestamp: timestamp
      # Per base settings
      meta:
        logstash:
          timestamp: '@timestamp'
  16. GETTING STARTED: SHOW ME MONEY DATA MORE LIKE LOGS
      # Show just selected fields
      es-search.pl --show hostname,program,message
  17. GETTING STARTED: SHOW ME MATCHING DOCS
      # Show sshd logs
      es-search.pl program:sshd \
        --show hostname,program,message
  18. GETTING STARTED: MULTIPLE SEARCH PARAMETERS
      # Multiple parameters, AND'd
      es-search.pl program:sshd \
        src_ip:181.206.20.11 \
        --show hostname,program,message
  19. OR MULTIPLE SEARCH PARAMETERS
      # Join dangling params with OR
      es-search.pl --or program:sshd \
        src_ip:181.206.20.11 \
        --show hostname,program,message
      # Join with OR explicitly
      es-search.pl program:sshd OR \
        src_ip:181.206.20.11 \
        --show hostname,program,message
  20. GETTING STARTED: I WANT TO USE JQ
      # Make output pipe friendly to jq
      es-search.pl program:sshd --exists src_ip \
        --jq \
        | jq -r .src_ip | sort | uniq -c
  21. QUERY STRING EXTENSIONS: BARE WORDS
      # and, or, not uppercased
      es-search.pl not program:sshd
  22. QUERY STRING EXTENSIONS: IP
      # Use CIDR Notation for IPs
      es-search.pl src_ip:102.0.0.0/8 \
        --show hostname,program,src_ip,src_geoip \
        --size 1
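For readers less used to CIDR notation, the following sketch shows which addresses a block such as 102.0.0.0/8 covers. It is plain shell arithmetic for IPv4 only and says nothing about how es-search.pl or ElasticSearch evaluate the query. The names ip_to_int and in_cidr are illustrative.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed when address $1 falls inside CIDR block $2 (e.g. 102.0.0.0/8).
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # Keep only the top $bits bits of a 32-bit value
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

in_cidr 102.14.3.9 102.0.0.0/8 && echo "102.14.3.9 matches"
in_cidr 181.206.20.11 102.0.0.0/8 || echo "181.206.20.11 does not match"
```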
  23. QUERY STRING EXTENSIONS: RANGE
      # Range and range combos
      es-search.pl dst_port:'<1024'
      es-search.pl status:'<500,>=400'
      es-search.pl crit:'>5' \
        --show hostname,program,crit,name,src_ip
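The range shorthand reads naturally as a list of comparisons. The sketch below spells out that reading for integers, assuming the comma-separated parts are AND'd, so '<500,>=400' means 400 to 499. It is an illustration of the notation, not the tool's parser, and in_range is a hypothetical name.

```shell
#!/bin/sh
# Succeed when integer $1 satisfies every comparison in spec $2,
# e.g. in_range 404 '<500,>=400'. Parts are AND'd (an assumption about
# the shorthand on the slide, not the tool's actual parser).
in_range() {
    value=$1
    old_ifs=$IFS
    IFS=,
    set -- $2          # split the spec on commas
    IFS=$old_ifs
    for part in "$@"; do
        case $part in
            '>='*) [ "$value" -ge "${part#>=}" ] || return 1 ;;
            '<='*) [ "$value" -le "${part#<=}" ] || return 1 ;;
            '>'*)  [ "$value" -gt "${part#>}" ]  || return 1 ;;
            '<'*)  [ "$value" -lt "${part#<}" ]  || return 1 ;;
            *)     echo "unsupported range: $part" >&2; return 2 ;;
        esac
    done
    return 0
}

in_range 404 '<500,>=400' && echo "404 is a 4xx status"
in_range 503 '<500,>=400' || echo "503 is not"
```

The single quotes around the spec matter on the command line: unquoted, the shell treats < and > as redirections before the program ever sees them.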
  24. QUERY STRING EXTENSIONS: TERMS PROMOTION
      # Don't stress the Lucene escapes
      es-search.pl =exe:'/usr/bin/yum update -y'
  25. QUERY STRING EXTENSIONS: TERMS IN A FILE
      # Build terms from a TSV file, last column
      es-search.pl src_ip:badguys.dat
      # Build terms from a TSV file, first column
      es-search.pl src_ip:badguys.dat[0]
      # Build terms from a CSV file, last column
      es-search.pl src_ip:badguys.csv
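Before feeding a file to a query, it can help to preview which column would become the terms. This sketch assumes a tab-separated file and a zero-based column index, as the [0] form on the slide suggests. The helper column_terms and the sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical preview helper: print one column of a TSV file.
# With no index, print the last column; otherwise use a zero-based index,
# mirroring the badguys.dat and badguys.dat[0] forms on the slide.
column_terms() {
    file=$1
    index=${2:-last}
    awk -F '\t' -v idx="$index" '
        NF == 0 { next }                       # skip blank lines
        idx == "last" { print $NF; next }
        { print $(idx + 1) }                   # awk fields are one-based
    ' "$file"
}

# Sample file: first-seen date, then the address
sample=$(mktemp)
printf '2021-06-01\t181.206.20.11\n2021-06-02\t102.14.3.9\n' > "$sample"

column_terms "$sample"      # last column: the addresses
column_terms "$sample" 0    # first column: the dates
rm -f "$sample"
```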
  26. QUERY STRING EXTENSIONS: TERMS IN A JSON DATA SET
      # Build terms from an NDJSON file .ip
      es-search.pl src_ip:threatfeed.json[ip]
      # Build terms from an NDJSON file nested field
      es-search.pl src_ip:threatfeed.json[actor.ip]
  27. AGGREGATION CAVEATS
      ➤ Supported during "facets" and ES 0.17
      ➤ Early versions of ES, up to v2.x were splodey
      ➤ Some limitations which I'm slowly rolling back
      ➤ per day
      ➤ Top aggregation must be a bucket
      ➤ Limited to 2 levels deep
      ➤ Well, 3 in a certain instance
  28. AGGREGATIONS: TOP THING
      # Top 20 programs
      es-search.pl --top program
      # Top 50 programs
      es-search.pl --top program --size 50
      es-search.pl --top program --limit 50
      es-search.pl --top program -n 50
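Conceptually, a "top thing" report is a count of documents per distinct value, sorted by that count. The sketch below is a local analogue over lines on stdin, using awk, sort and head. It is not how es-search.pl computes the result (ElasticSearch does the counting server-side), and top_n is an illustrative name.

```shell
#!/bin/sh
# Count identical input lines and print the N most frequent as
# "count<TAB>value", highest count first (ties broken by value).
top_n() {
    n=${1:-20}
    tab=$(printf '\t')
    awk '{ count[$0]++ } END { for (v in count) printf "%d\t%s\n", count[v], v }' |
        sort -t "$tab" -k1,1nr -k2,2 |
        head -n "$n"
}

# Six sample "program" values, keep the top two
printf '%s\n' sshd cron sshd sudo sshd cron | top_n 2
```

The default of 20 mirrors the slide, where --top program with no size reports the top 20.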
  29. AGGREGATIONS: TOP THING PER HOUR
      # Top programs with a src_ip every 8 hours
      es-search.pl --top program _exists_:src_ip \
        --interval 8h
  30. AGGREGATIONS: TOP THING WITH ANOTHER THING
      # Top action with the top 3 countries
      es-search.pl --top action \
        --with src_geoip.country
      # Top action with the top 10 countries
      es-search.pl --top action \
        --with src_geoip.country:10
  31. AGGREGATIONS: TOP THING BY SOMETHING OTHER THAN DOC COUNT
      # Top src_ip by distinct dst countries
      es-search.pl --top src_ip \
        --by cardinality:dst_geoip.country
      # Top dst_ip by the total traffic
      es-search.pl --top dst_ip \
        --by sum:out_bytes
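Ranking by cardinality means ordering the keys by how many distinct values each one has, rather than by how many documents mention it. Here is a local sketch of that idea over "key<TAB>value" lines. The function name and sample data are invented for illustration.

```shell
#!/bin/sh
# Read "key<TAB>value" lines and rank keys by their number of distinct
# values, printing "distinct_count<TAB>key", highest first.
distinct_by_key() {
    tab=$(printf '\t')
    awk -F '\t' '
        !seen[$1 FS $2]++ { distinct[$1]++ }   # count each key/value pair once
        END { for (k in distinct) printf "%d\t%s\n", distinct[k], k }
    ' | sort -t "$tab" -k1,1nr -k2,2
}

# src_ip <TAB> destination country; 10.0.0.1 reaches two distinct countries
printf '%s\t%s\n' \
    10.0.0.1 US  10.0.0.1 DE  10.0.0.1 US \
    10.0.0.2 FR |
    distinct_by_key
```

This local count is exact, whereas ElasticSearch's cardinality aggregation is an approximation, so counts for very high-cardinality fields may differ slightly.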
  32. AGGREGATIONS: WHERE'S MY DATA GOING
      # Top src_ip by the total traffic
      # With top dst_ip
      es-search.pl --top src_ip \
        --by sum:out_bytes \
        --with dst_ip:1
  33. AGGREGATIONS: STATISTICS ANYONE?
      # Top program by average parse time
      # with a statistical summary
      es-search.pl --top program \
        --by avg:total_time \
        --with stats:total_time
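ElasticSearch's stats aggregation summarises a numeric field as count, min, max, avg and sum. The same summary can be sketched locally with awk over one number per line, which is handy for sanity-checking a small export. The function name stats and the output layout are choices made for this example.

```shell
#!/bin/sh
# Print count, min, max, avg and sum for one number per input line.
stats() {
    awk '
        { x = $1 + 0 }                         # force numeric comparison
        NR == 1 { min = max = x }
        { sum += x; if (x < min) min = x; if (x > max) max = x }
        END {
            if (NR == 0) { print "count=0"; exit }
            printf "count=%d min=%g max=%g avg=%g sum=%g\n", NR, min, max, sum / NR, sum
        }
    '
}

# Four sample total_time values
printf '%s\n' 12 7 30 11 | stats
```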
  34. AGGREGATIONS: PERCENTILES, TOO
      # Top programs by average parse time
      # With median, 90, and 99th percentile
      es-search.pl --top program \
        --by avg:total_time \
        --with percentiles:total_time:50,90,99
  35. AGGREGATIONS: I GOT YOUR HISTOGRAMS
      # Top 20 uri by average render time
      # with histogram of 100ms
      es-search.pl --top program \
        --by avg:total_time \
        --with histogram:total_time:0.01
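A fixed-width histogram assigns each value to the bucket that starts at floor(value / width) * width. The sketch below applies that rule locally to non-negative numbers on stdin. It uses an integer width to keep the arithmetic exact, and the function name histogram is illustrative.

```shell
#!/bin/sh
# Bucket non-negative numbers (one per line) into fixed-width bins and
# print "bucket_start<TAB>count", using floor(value / width) * width.
histogram() {
    width=$1
    awk -v w="$width" '
        { bucket = int($1 / w) * w; count[bucket]++ }   # int() floors values >= 0
        END { for (b in count) printf "%g\t%d\n", b, count[b] }
    ' | sort -n
}

# Six sample timings in milliseconds, 100 ms buckets
printf '%s\n' 12 95 130 180 199 420 | histogram 100
```

Empty buckets (200 and 300 here) are simply absent from the output.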
  36. AGGREGATIONS: I'M ALL ABOUT SIGNIFICANCE
      # Top 20 significant uri for search
      es-search.pl --top significant_terms:uri \
        render_ms:'>1000' src_country:US
      # Top 20 significant uri for search,
      # Background is only US
      es-search.pl --top significant_terms:uri \
        render_ms:'>1000' src_country:US \
        --bg-filter src_country:US
  37. BUILT WITH CLI::HELPERS
      ➤ General purpose, functional library for developing command line utilities in Perl
      ➤ Handles input
      ➤ Provides output customization including color support, --color
      ➤ Allow output tagged as data to be redirected into a file, --data-file=output.dat
      ➤ NoPaste support via App::NoPaste and --no-paste
  38. NOTES ON APP::NOPASTE
      ➤ CLI::Helpers will only paste to a service flagged as "public" if you specify --no-paste-public
      ➤ Subclass an App::NoPaste::Service object for your internal paste service, it's pretty simple
      ➤ Easily share things with colleagues directly from the command line
  39. PUTTING SOME THINGS TOGETHER
      # Let's say we have a list of bad ip
      es-search.pl --top src_ip \
        _prefix_:path:\/admin \
        status:'<400' \
        src_ip:threatfeed.json[ip] \
        --data-file=insidethehouse.dat
  40. PUTTING SOME THINGS TOGETHER
      # Dump a full log of what they've done
      es-search.pl src_ip:insidethehouse.dat --show src_ip,src_user,uri,out_bytes \
        --all
      # Share with your colleagues
      es-search.pl src_ip:insidethehouse.dat --show src_ip,src_user,uri,out_bytes \
        --all --no-paste
  41. STEAL THIS CODE
      ➤ Modules make it easy for you to interact with ES
      ➤ App::ElasticSearch::Utilities::QueryString provides all the fun query extensions
      ➤ App::ElasticSearch::Query provides a simple interface to execute queries
      ➤ All of these draw on the config file and command line switches of App::ElasticSearch::Utilities
  42. FUTURE PLANS
      ➤ Arbitrary levels of nested aggregations
      ➤ JSON output for aggregations
      ➤ Better support for nested documents
      ➤ Arbitrary data joins at query time: rdns, whois, db lookups, etc.
      ➤ <your idea here>