What's New In Elasticland?

Elasticsearch Inc
March 17, 2016

Following Elastic{ON}16, this is a recap of the key announcements, followed by brief summaries of the talks on BM25 and FireEye's TAP security platform.

Transcript

  1. Elastic{ON} 16 – Pier 48, San Francisco, CA, February 17 – 19, 2016

    Attendees: 1,802. Days: 3. Talks: 28. All recordings on https://www.elastic.co/elasticon/conf/2016/sf
  7. Pipeline Aggregations

    [Chart: Data Value, Smooth Average, and Upper Control Limit, plotted by day over August]
  11. It’s complicated

    es: 1.4 (Nov 5, 2014), 1.5 (May 23, 2015), 1.6 (Jun 9, 2015), 1.7 (Jul 16, 2015)
    kibana: 4.0 (Feb 19, 2015), 4.1 (Jun 10, 2015)
    ls: 1.5 (May 14, 2015)
    beats: 1.0 Beta 1 (May 27, 2015), 1.0 Beta 2 (Jul 13, 2015), 1.0 Beta 3 (Sep 4, 2015)
  14. Simple API that combines Search and Graph Techniques

    GET /wikipedia/_graph/explore
    {
      "query": { "query_string": { "query": "Jack Johnson" } },
      "vertices": [{ "field": "artists.raw" }],
      "connections": { "vertices": [{ "field": "artists.raw" }] }
    }
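The request body on this slide can also be assembled programmatically. The following is a minimal sketch in Python that assumes only the body shape shown on the slide; `build_explore_body` is a hypothetical helper (not part of any client library), and actually sending the request is left out.

```python
import json


def build_explore_body(query_text, field):
    """Build a graph explore body shaped like the slide's example.

    Hypothetical helper for illustration; not an official client API.
    """
    return {
        # Seed query: selects the documents the exploration starts from.
        "query": {"query_string": {"query": query_text}},
        # Vertices: terms of `field` taken from the matching documents.
        "vertices": [{"field": field}],
        # Connections: terms of `field` related to those vertices.
        "connections": {"vertices": [{"field": field}]},
    }


body = build_explore_body("Jack Johnson", "artists.raw")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent to `/wikipedia/_graph/explore`; the index name `wikipedia` and the field `artists.raw` come from the slide.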
  15. Lucene Practical Scoring Function

    $$\mathrm{score}(q, d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q, d) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{norm}(t, d) \cdot t.\mathrm{getBoost}() \right)$$

    • Query normalization factor - Normalize query to compare with results from other queries
    • Coordination factor - Reward documents with more individual query terms
    • Term frequency - How often does the term appear in the doc?
    • Inverse document frequency - How often does the term appear in all docs?
    • Term boost
    • Field length norm - How long is the field?
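  16. TF/IDF

    $$\mathrm{score}(q, d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q, d) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{norm}(t, d) \cdot t.\mathrm{getBoost}() \right)$$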
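  17. TF/IDF and BM25

    $$\mathrm{score}(q, d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q, d) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{norm}(t, d) \cdot t.\mathrm{getBoost}() \right)$$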
  18. Why BM25?

    • Less influence of common words
    • Short fields not auto-boosted
    • Tweakable parameters (beware)
    • Is BM25 better?
      ‒ Literature suggests so
      ‒ Challenges suggest so (TREC, ...)
      ‒ Users say so
      ‒ Lucene developers say so
      ‒ Konrad Beiske says so: Blog “BM25 vs Lucene Default Similarity”
    • But: It depends on the features of your corpus
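To make the first and third bullets concrete, here is a minimal sketch in Python comparing the term-frequency component of classic TF/IDF (square root of the frequency, as in Lucene's classic similarity) with the standard BM25 term-frequency component. The function names are made up for this illustration, and `k1 = 1.2` and `b = 0.75` are Lucene's default BM25 parameters; the document lengths in the loop are arbitrary example values.

```python
import math

K1 = 1.2   # BM25 term-frequency saturation parameter (Lucene default)
B = 0.75   # BM25 field-length normalization parameter (Lucene default)


def classic_tf(freq):
    """TF component of classic TF/IDF: grows without bound as sqrt(freq)."""
    return math.sqrt(freq)


def bm25_tf(freq, field_len, avg_field_len, k1=K1, b=B):
    """TF component of BM25: saturates, never exceeding k1 + 1."""
    # Length normalization is relative to the average field length.
    length_norm = 1 - b + b * field_len / avg_field_len
    return freq * (k1 + 1) / (freq + k1 * length_norm)


# Compare both components for a field of exactly average length.
for freq in (1, 2, 5, 20, 100):
    print(freq, round(classic_tf(freq), 3), round(bm25_tf(freq, 100, 100), 3))
```

As the frequency grows, the classic component keeps rising while the BM25 component flattens out below `k1 + 1`, which is one way a very frequent word ends up with less influence. Setting `b=0` in this sketch switches field-length normalization off entirely.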
  19. You’ll see. If you don’t like it, change it to any of:

    TF/IDF, DFR, DFI, IB, LM
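Changing the similarity is done through index settings and field mappings. The following Python sketch shows what such a configuration could look like for DFR, assuming the custom-similarity settings format of the Elasticsearch similarity module. The similarity name `my_dfr`, the field name `title`, the `text` field type, the typeless mapping layout, and the DFR parameter values are all illustrative assumptions, and the exact layout depends on the Elasticsearch version.

```python
import json

# Hypothetical names: "my_dfr" (custom similarity) and "title" (field).
index_body = {
    "settings": {
        "index": {
            "similarity": {
                "my_dfr": {
                    "type": "DFR",
                    # Example DFR parameters; pick values suited to the corpus.
                    "basic_model": "g",
                    "after_effect": "l",
                    "normalization": "h2",
                    "normalization.h2.c": "3.0",
                }
            }
        }
    },
    # Mapping layout and field type vary between Elasticsearch versions.
    "mappings": {
        "properties": {
            "title": {"type": "text", "similarity": "my_dfr"}
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The printed JSON would be the body of an index-creation request; the field then scores with the named similarity instead of the default.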
  20. Analyst asks TAP:

    Citrix connections originating from Russia, China, Ireland, grouped by duration, received bytes, and destination port
  21. Elasticsearch DSL

    {"query": {"filtered": {"filter": {"and": [
      {"range": {"meta_ts": {"gte": "2016-01-26T17:00:00.000Z", "lte": "2016-02-02T17:10:20.478Z"}}},
      {"term": {"class": "bro_conn"}},
      {"terms": {"dstipv4": {"index": "lists", "type": "indicator", "id": "external_citrix_servers", "path": "values", "cache": false}}},
      {"terms": {"srccountrycode": ["ru", "cn", "ir"], "execution": "or"}},
      {"term": {"connstate": "sf"}},
      {"limit": {"value": 208333}}
    ]}}},
    "aggs": {"groupby:duration_rcvdipbytes_dstport": {"terms": {
      "lang": "native", "script": "join",
      "params": {"fields": ["duration", "rcvdipbytes", "dstport"], "separator": ", "},
      "size": 200, "min_doc_count": 1, "order": {"_count": "desc"}}}},
    "size": 10, "from": 0, "timeout": 120000}
  22. Production Footprint

    • Raw Storage: 3.6P, across ~40 production clusters
    • Indexed Events: 700B, in 400+ nodes
    • EPS (events per second indexed to production): 300K
    • Peak: 20B/day
  23. Show me credit card data!

    {"query": {"filtered": {"filter": {"and": [
      {"range": {"meta_ts": {"gte": "2015-10-25T13:00:00.000Z", "lte": "2015-10-26T13:37:07.554Z"}}},
      {"query": {"common": {"metaclass": {"query": "http_proxy", "low_freq_operator": "and", "high_freq_operator": "and", "cutoff_frequency": 0.001, "analyzer": "standard"}}}},
      {"script": {"script": "regexp", "lang": "native", "params": {
        "regexp": ".*encoding\\\\=.*\\\\&t\\\\=.*\\\\&cc\\\\=.*\\\\&process\\\\=.*\\\\&track\\\\=/",
        "field": "uri", "limit": -1}}}
    ]}}},
    "size": 10, "from": 0, "timeout": 120000}
  24. Eggs Fried Per Query

    1. Thermal mass for a single egg is 274 J/°C
    2. Integrated temperature from 4 to 80 °C gives us total heat of: 274 J/°C * (80 - 4 °C) = 20,812 J
    3. D2 series uses Haswell Intel Xeon E5-2673v3 processors. Thermal Design Power: 120W. We used 8 cores of the 12 cores total, for .75 * 120W = 90W; 90W * 135 Procs = 12,150 W total
    4. Query execution time in seconds: 83 min x 60 s = 4,980 s
    5. Total Energy = 12,150W * 4980 seconds (length of query) = 60.5 MJ
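The arithmetic on this slide can be replayed step by step. The sketch below, in Python, uses only the figures quoted on the slide; the final division of total energy by heat per egg is an assumption about how the slide's title figure is derived, and the variable names are made up for this illustration. With the rounded thermal mass of 274 J/°C the heat per egg comes out slightly above the slide's 20,812 J.

```python
# Inputs as quoted on the slide.
EGG_THERMAL_MASS_J_PER_C = 274   # thermal mass of a single egg, J/°C
START_C, END_C = 4, 80           # egg heated from 4 °C to 80 °C
TDP_W = 120                      # thermal design power per processor, W
CORE_FRACTION = 0.75             # slide's factor for the cores in use
PROCESSORS = 135
QUERY_MINUTES = 83

# Step 2: heat needed to fry one egg.
heat_per_egg_j = EGG_THERMAL_MASS_J_PER_C * (END_C - START_C)

# Step 3: total power drawn by the processors running the query.
power_w = CORE_FRACTION * TDP_W * PROCESSORS

# Step 4: query duration in seconds.
query_s = QUERY_MINUTES * 60

# Step 5: total energy of the query in joules.
energy_j = power_w * query_s

# Assumed final step: eggs fried per query.
eggs_per_query = energy_j / heat_per_egg_j

print(f"heat per egg: {heat_per_egg_j} J")
print(f"power: {power_w} W over {query_s} s")
print(f"energy: {energy_j / 1e6:.1f} MJ")
print(f"eggs per query: {eggs_per_query:.0f}")
```

By these inputs a single 83-minute query carries enough energy for roughly 2,900 eggs.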
  25. Elasticsearch Kids

    [Chart: Wakeups per week (axis 0 to 8) by Elasticsearch version (Number of kids): 0.9 (3), 1.1 (3), 1.2 (3), 1.3 (3), 1.5 (3), 1.7 (4)]
  26. ALSO WATCH

    • Graph Capabilities in the Elastic Stack
    • All Quiet on the Digital Front: Security Analytics @ USAA
    • OpenSource Connections: The Ghost in the Search Machine
    • Grid Monitoring at CERN with the Elastic Stack
    • Contributing to Elasticsearch: How to Get Started

    All recordings: https://www.elastic.co/elasticon/conf/2016/sf/
  27. Public Training

    • Core Elasticsearch: Operations, STOCKHOLM, Sweden, April 25
    • Core Elasticsearch: Developer, STOCKHOLM, Sweden, April 26 - 27

    training.elastic.co