
From ELK to Elastic: Modern logging and monitoring

My presentation from Velocity Amsterdam 2016.

Tudor Golubenco

November 07, 2016

Transcript

  1. Logstash new features
     • Core and main plugins rewritten in Java: performance improvements
     • Persistent Queue (WIP): no drops when killed
     • Monitoring APIs: no longer a black box
  2. It’s complicated: a release timeline for Elasticsearch (es), Kibana,
     Logstash (ls), and Beats across 2014–2015, each project on its own
     versioning and schedule (Elasticsearch 1.6/1.7, Kibana 4.0/4.1,
     Logstash 1.4/1.5, Beats 1.0 Beta 1–3).
  3. Reading cgroup data from /proc/
     • Doesn’t require access to the Docker API (which can be a security issue)
     • Works with any container runtime (Docker, rkt, runC, LXD, etc.)
     • Automatically enriches process data with cgroup information
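The approach on this slide can be sketched in a few lines of Python: parse `/proc/<pid>/cgroup` directly, with no Docker API involved. The helper names are illustrative, and the format assumed is the cgroup-v1 `<hierarchy-id>:<controllers>:<path>` layout.

```python
def parse_cgroup_lines(lines):
    """Parse cgroup-v1 style lines '<id>:<controllers>:<path>' into a
    {controller: path} dict. The controller field may list several
    controllers separated by commas (e.g. 'cpu,cpuacct')."""
    cgroups = {}
    for line in lines:
        parts = line.rstrip("\n").split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        _, controllers, path = parts
        for ctrl in controllers.split(","):
            cgroups[ctrl] = path
    return cgroups


def read_cgroups(pid="self"):
    """Read /proc/<pid>/cgroup; works for any container runtime, since
    the kernel exposes this regardless of Docker, rkt, runC, LXD, etc."""
    with open("/proc/%s/cgroup" % pid) as f:
        return parse_cgroup_lines(f)
```

For a process running in a Docker container, the paths typically look like `/docker/<container-id>`, which is how process data can be enriched with container information.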
  4. Querying the Docker API
     • Dedicated Docker module
     • Has access to container names and labels
     • Somewhat easier to set up
  5. Elasticsearch BKD trees
     • Added for geo-points
     • Faster to index
     • Faster to query
     • More disk-efficient
     • More memory-efficient
  6. Float values
     • Chart: on-disk usage (kb) for points and doc_values across float,
       half float, scaled float (factor = 4000), and scaled float (factor = 100)
     • half floats
     • scaled floats: great for things like percentage points
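As a sketch, an index mapping using these numeric types might look like the following (index and field names are illustrative; on 5.x-era Elasticsearch the `properties` block sits under a mapping type name). A `scaled_float` is stored as `round(value * scaling_factor)` in a long, which is what makes it compact for values like percentages:

```json
{
  "mappings": {
    "properties": {
      "cpu_pct":     { "type": "scaled_float", "scaling_factor": 100 },
      "temperature": { "type": "half_float" }
    }
  }
}
```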
  7. Why Elasticsearch for time series
     • Horizontal scalability; mature and battle-tested cluster support
     • Flexible aggregations (incl. moving averages and Holt-Winters)
     • One system for both logs and metrics
     • Timelion UI, Grafana
     • Great ecosystem: e.g. alerting tools
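As an illustration of the aggregations mentioned above, a `date_histogram` with a `moving_avg` pipeline aggregation could smooth a metric over time; the field name, interval, and Holt-Winters settings below are illustrative, not prescriptive:

```json
{
  "aggs": {
    "per_minute": {
      "date_histogram": { "field": "@timestamp", "interval": "1m" },
      "aggs": {
        "avg_cpu":  { "avg": { "field": "system.cpu.user.pct" } },
        "smoothed": {
          "moving_avg": {
            "buckets_path": "avg_cpu",
            "model": "holt_winters",
            "window": 60
          }
        }
      }
    }
  }
}
```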
  8. Filebeat
     • Supports multiple file rotation strategies
     • “At least once” guarantees; handles backpressure
     • Extra powers: multiline, filtering, JSON decoding
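A 5.x-era Filebeat configuration exercising the multiline and JSON-decoding features might look roughly like this; the paths and the multiline pattern are illustrative:

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/myapp/*.log
    # Join indented continuation lines (e.g. stack traces) into one event
    multiline.pattern: '^[[:space:]]'
    multiline.negate: false
    multiline.match: after

  - input_type: log
    paths:
      - /var/log/myapp/structured/*.json
    # Decode each line as JSON and lift the keys to the top level
    json.keys_under_root: true
```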
  9. Synchronous sending
     • Diagram: a stream of log lines is read, sent as a batch of messages,
       and only after the ack comes back is the position marked “read acked”
       in the registry file.
  10. When things go wrong
      • Diagram: ack 0 (still alive), ack 50%, ack 100%; the registry only
        advances past lines that were actually acknowledged.
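The registry idea from these slides can be sketched as follows: persist the last *acknowledged* offset, so that after a crash reading resumes from there, which gives "at least once" delivery. The class and file format here are illustrative, not Filebeat's actual registry format.

```python
import json
import os


class Registry:
    """Persist the last acknowledged read position to a small file."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return 0  # never shipped anything: start from the beginning
        with open(self.path) as f:
            return json.load(f)["offset"]

    def save(self, offset):
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"offset": offset}, f)
        os.replace(tmp, self.path)  # atomic rename: no torn registry file


def ship(lines, send_batch, registry):
    """Send unshipped lines as one batch; advance the registry only
    after send_batch returns, i.e. after the batch was acknowledged."""
    offset = registry.load()
    batch = lines[offset:]
    if batch:
        send_batch(batch)  # blocks until the batch is acked
        registry.save(offset + len(batch))
```

If the process dies between `send_batch` and `registry.save`, the batch is re-sent on restart: nothing is lost, but duplicates are possible, which is exactly the trade-off the later slides discuss.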
  11. This means…
      • Filebeat automatically adapts its speed to whatever the next stage can ingest
      • Similar to the “pull” model
      • But: be aware of this when benchmarking
  12. When the next stage is down…
      • Filebeat patiently waits
      • Log lines are not lost
      • It doesn’t allocate memory and doesn’t buffer things on disk
  13. Duplicates!
      • Diagram: a batch of messages is sent but the ack is lost, so the
        same batch is re-sent, producing duplicates.
  14. Potential strategy to reduce dupes
      • Filebeat generates a UUID for each log line
      • When indexing into Elasticsearch, use the create API
      • Deduplication happens in Elasticsearch
      • But:
        ✦ Duplicates can still happen on Filebeat crashes
        ✦ Performance penalty at index time
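The strategy above can be sketched with the Elasticsearch bulk API: assign a UUID per line once, at read time, then use the bulk `create` action so that a retried line with the same `_id` is rejected as a conflict instead of being duplicated. The helper names and index name are illustrative:

```python
import json
import uuid


def assign_ids(lines):
    """Generate a UUID per log line once, at read time; the same
    (id, line) pairs must be reused on every retry for dedup to work."""
    return [(str(uuid.uuid4()), line) for line in lines]


def bulk_create_payload(entries, index="logs"):
    """Build a bulk-API body using the `create` action. On a re-send,
    entries whose _id already exists fail with a conflict rather than
    creating a duplicate. Bulk bodies are newline-delimited JSON."""
    out = []
    for doc_id, line in entries:
        out.append(json.dumps({"create": {"_index": index, "_id": doc_id}}))
        out.append(json.dumps({"message": line}))
    return "\n".join(out) + "\n"
```

The same `entries` list always produces an identical payload, so replaying it after a lost ack is safe on the Elasticsearch side (at the index-time cost the slide mentions).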
  15. Classic logging

      logging.debug("User '{}' (id: {}) successfully logged in. "
                    "Session id: {}".format(user["name"], user["id"], session_id))

      which results in:

      DEBUG:root:User 'arthur' (id: 42) successfully logged in. Session id: 91e5b9d
  16. Structured logging
      • Use a logging library that allows code like:

        log = log.bind(user='arthur', id=42, verified=False)
        log.msg('logged_in')

      • Which creates log lines like:

        {"verified": false, "user": "arthur", "session_id": "91e5b9d", "id": 42, "event": "logged_in"}
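The bind/msg API shown on the slide can be mimicked with a minimal stdlib-only sketch; real structured-logging libraries (structlog, for instance) work along these lines but offer much more:

```python
import json


class StructLogger:
    """Minimal sketch of a bind/msg structured logger: context is
    accumulated via bind() and every event is emitted as one JSON line."""

    def __init__(self, context=None):
        self._context = dict(context or {})

    def bind(self, **kwargs):
        # bind() returns a *new* logger carrying the extra context,
        # leaving the original untouched
        return StructLogger(dict(self._context, **kwargs))

    def msg(self, event):
        record = dict(self._context, event=event)
        line = json.dumps(record, sort_keys=True)
        print(line)
        return line
```

Because every field is a key in the JSON document, the resulting lines can be indexed by Elasticsearch as-is, with no grok-style parsing of free-form messages.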