Fixing Your Noisy Pager in 500 Easy Steps

You're not sure when it happened, but your pager suddenly seems noisy. You've started dreading your on-call shifts before they begin. You breathe a sigh of relief every time you sleep without interruption. Sound familiar?

Noisy on-call rotas sneak up on us one page at a time - an edge case in a new feature, an alert with too many false positives, processes that get stuck and need restarting. Each of these is easy to tolerate alone, but they quickly add up, leaving you swamped in alert noise and tired from missed sleep.

In this talk we'll explore techniques for digging ourselves out of the hole. We'll look at how to demonstrate the scale of the issue to our colleagues, what to do when the list of problems seems insurmountable, and how to get started with automated remediation in a low-risk way - I promise it's less scary than it sounds.

Chris Sinjakli

October 29, 2024

Transcript

  1. Hi

  2-4. Pages per month (2023–24)
    [Chart: pages per month from Aug to Mar, split into Daytime, Evening and Night; y-axis from 0 to 1,500]
  5-8. Compelling reasons
    - Pager fatigue: we miss real issues
    - Tiredness: people can’t do their best work
    - Learned helplessness: we don’t believe we can improve things
  9-12. Problem shape
    - Recurring problem: happens regularly
    - Reliable detection: highly correlated alert
    - Mechanical fix: on-caller follows runbook
  13-16. auto-repair (simplified)
    alerts = get("prom:9090/api/v1/alerts")
    issues = filter_fixable(alerts)
    for i in issues do
      // for most issues, restart process
      apply_fix(i)
    end
  17-20. 3 limitations
    Don’t restart:
    - Too many processes with the same issue
    - The same instance repeatedly
    - Processes that have already paged
  21. Image credits
    • Analog Alarm Clock in Morning Sunlight - Ruslan Sikunov - https://www.pexels.com/photo/analog-alarm-clock-in-morning-sunlight-19188894/
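
The auto-repair pseudocode and the three "don't restart" limitations from the slides can be combined into one small loop. What follows is a minimal Python sketch of that idea, not the speaker's implementation. It assumes the standard Prometheus `/api/v1/alerts` response shape (`data.alerts[]` with `labels` and `state`). Everything else is hypothetical and chosen for illustration: the `ProcessStuck` alert name, the thresholds, the `plan_restarts` and `run_once` helpers, and the way "already paged" instances are passed in as a set.

```python
import json
import time
import urllib.request
from collections import Counter

# Hypothetical values for illustration; none of them come from the talk.
PROMETHEUS_ALERTS_URL = "http://prom:9090/api/v1/alerts"
FIXABLE_ALERTS = {"ProcessStuck"}  # alerts whose runbook fix is mechanical
MAX_AFFECTED = 3                   # more than this suggests a wider problem
RESTART_COOLDOWN_SECONDS = 3600    # minimum gap between restarts of one instance


def fetch_alerts(url=PROMETHEUS_ALERTS_URL):
    """Return the active alerts from the Prometheus HTTP API."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["data"]["alerts"]


def filter_fixable(alerts):
    """Keep firing alerts whose name is on the allow-list."""
    return [
        a for a in alerts
        if a.get("state") == "firing"
        and a["labels"].get("alertname") in FIXABLE_ALERTS
    ]


def plan_restarts(alerts, last_restart, already_paged, now):
    """Apply the three limitations; return instances that are safe to restart."""
    issues = filter_fixable(alerts)
    per_alert = Counter(a["labels"]["alertname"] for a in issues)
    to_restart = []
    for alert in issues:
        name = alert["labels"]["alertname"]
        instance = alert["labels"].get("instance")
        if instance is None or instance in to_restart:
            continue  # nothing to target, or already planned in this pass
        if per_alert[name] > MAX_AFFECTED:
            continue  # limitation 1: too many processes with the same issue
        if now - last_restart.get(instance, float("-inf")) < RESTART_COOLDOWN_SECONDS:
            continue  # limitation 2: this instance was restarted recently
        if instance in already_paged:
            continue  # limitation 3: an on-caller was already paged for it
        to_restart.append(instance)
    return to_restart


def run_once(restart, last_restart, already_paged, alerts=None, now=None):
    """Run one pass of the loop; `restart` is a caller-supplied callable."""
    alerts = fetch_alerts() if alerts is None else alerts
    now = time.time() if now is None else now
    restarted = plan_restarts(alerts, last_restart, already_paged, now)
    for instance in restarted:
        restart(instance)
        last_restart[instance] = now  # remembered for the cooldown check
    return restarted
```

Keeping the decision (`plan_restarts`) separate from the action (`restart`) means the limits can be exercised against canned alert data, or run in a log-only mode by passing a `restart` callable that just records its argument, before the loop is trusted to restart anything.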