The hidden cost of instrumentation at Conf42 DevOps 2023

Prathamesh Sonpatki

January 27, 2023

Transcript

  1. Instrumentation? 🤔 🤨
    - How do you know your application is running as expected?
  2. Instrumentation? 🤔 🤨
    - How do you know your application is running as expected?
    - Service Level Agreements (SLA)
  3. Instrumentation? 🤔 🤨
    - How do you know your application is running as expected?
    - Service Level Agreements (SLA)
    - Good night’s sleep 😴 💤
  4. 🌈 Landscape of the Instrumentation
    - Your application is not standalone
    - It’s actually a 🍔
  5. 🌈 Landscape of the Instrumentation
    - Your application is not standalone
    - It’s actually a 🍔
    - The Bun (Cloud/VM)
    - Patty (application)
    - Along with Mayo sauce (RDS/DB)
    - And Ketchup (Third party services)
  6. 🌈 Landscape of the Instrumentation
    - Your application is not standalone
    - It’s actually a 🍔
    - The Bun (Cloud/VM)
    - Patty (application)
    - Along with Mayo sauce (RDS/DB)
    - And Ketchup (Third party services)
    “Full stack observability” FTW!
  7. 💡 Modern applications are like living organisms that grow and shrink in all possible directions. They also communicate with their friends!
  8. Bow in the Temple of Observability
    - Logs
    - Metrics
    - Traces
    - Profiling
    - Events (External)
    - Exceptions
    https://medium.com/@YuriShkuro/temple-six-pillars-of-observability-4ac3e3deb402
  9. Bow in the Temple of Observability
    - Logs
    - Metrics
    - Traces
    - Profiling
    - Events (External)
    - Exceptions
    How many people use more than 3 of these at the same time?
  10. Cardinality/Churn
    - Capturing monitoring data is easier than ever today.
    - A 3-node Kubernetes cluster with Prometheus will ship around 40k active series by default!
  11. Operations
    - Run, manage and operate the instrumentation of the entire stack.
    - One more thing to operate besides the app.
  12. Scale
    - Make sure your instrumentation scales, not just your app.
  13. Distraction!
    - Reduce the Datadog monitoring cost, it is getting out of hand.
    - Our logs have been piling up for the last 2 days. Can you please treat this as a P0 and contain them? Otherwise the vendor will charge us double.
  14. Distraction!
    - Reduce the Datadog monitoring cost, it is getting out of hand.
    - Our logs have been piling up for the last 2 days. Can you please treat this as a P0 and contain them? Otherwise the vendor will charge us double.
    - Today is New Year’s Day and our Prometheus is not getting the required metrics. Ignore the product release and just fix this for now; we are blind otherwise.
  15. 💡 A modern systems engineer has to maintain not just their software but also the instrumentation of that software.
  16. Fatigue!
    - Too much information de-sensitises us.
    - Duplicate alarms.
    - Focus on getting more and more data rather than on why we are collecting it at all.
    - Debugging becomes difficult because there is just too much data; we don’t know where to start.
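One common way to cut down on duplicate alarms is to suppress repeats of the same alert within a time window. The sketch below illustrates that idea only; the `Alert` type, the `deduplicate` helper, and the 300-second window are assumptions made for this example and are not taken from the talk or from any specific alerting tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    labels: tuple      # sorted (key, value) pairs identifying the source
    timestamp: float   # seconds


def deduplicate(alerts, window_seconds=300.0):
    """Keep an alert only if the same (name, labels) pair was not
    already kept within the last `window_seconds`."""
    last_kept = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.name, alert.labels)
        previous = last_kept.get(key)
        if previous is None or alert.timestamp - previous >= window_seconds:
            kept.append(alert)
            last_kept[key] = alert.timestamp
    return kept


# Hypothetical alert stream: one noisy alert plus one unrelated alert.
disk = (("host", "db-1"),)
api = (("service", "checkout"),)
stream = [
    Alert("DiskAlmostFull", disk, 0.0),
    Alert("HighLatency", api, 30.0),
    Alert("DiskAlmostFull", disk, 60.0),
    Alert("DiskAlmostFull", disk, 120.0),
    Alert("DiskAlmostFull", disk, 400.0),
]
kept = deduplicate(stream)
print([(a.name, a.timestamp) for a in kept])
```

The window is measured from the last alert that was kept, so a continuously firing alert still resurfaces once per window rather than disappearing entirely.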
  17. What’s the way out? 🏆
    - Focus on data that gives early warnings with the least amount of data
  18. What’s the way out? 🏆
    - Focus on data that gives early warnings with the least amount of data
    - Think about an Apple Watch ⌚: only vitals such as heart rate or sleep metrics.
  19. What’s the way out? 🏆
    - Focus on data that gives early warnings with the least amount of data
    - Think about an Apple Watch ⌚: only vitals such as heart rate or sleep metrics.
    - Detailed X-ray scans and ECG reports 📰 come only once the vitals are off track.
  20. Plan of action
    - Plan what to measure and why, not how
    - Emit (only what you need)
    - Observe and Track (usage)
    - Prune (unused) aggressively
    - Store less, for less time.
    - Focus on what gives the best value for the money
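The "track usage, then prune" steps can be sketched as a comparison between what is emitted and what is actually queried. This is a minimal illustration under stated assumptions: the metric names, series counts, and the `prune_candidates` helper are all hypothetical, and how you obtain the set of queried metrics (dashboards, alert rules, query logs) depends on your setup.

```python
def prune_candidates(emitted_series: dict[str, int], queried: set[str]):
    """Return (metric, series_count) pairs for metrics that are emitted
    but never queried, largest first, so the biggest savings come first."""
    unused = {
        name: count
        for name, count in emitted_series.items()
        if name not in queried
    }
    return sorted(unused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inventory: metric name -> number of active series.
emitted = {
    "http_requests_total": 480,
    "go_gc_duration_seconds": 90,
    "debug_cache_events": 12000,
}
# Hypothetical set of metrics referenced by dashboards and alert rules.
queried = {"http_requests_total"}

candidates = prune_candidates(emitted, queried)
savings = sum(count for _, count in candidates)
print(candidates, savings)
```

Sorting by series count keeps the focus on value for money: dropping one unused high-cardinality metric can matter more than dropping dozens of small ones.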
  21. A Better Plan of action
    - Access Policies
    - Data storage policies
    - Standards
  22. Thanks
    Prathamesh Sonpatki, Last9.io
    Blog - https://prathamesh.tech
    Twitter - https://twitter.com/_cha1tanya
    Mastodon - https://hachyderm.io/@Prathamesh
    “Last9 of Reliability” Discord