Tackling Alert Fatigue

Caitie McCaffrey

June 26, 2016

Transcript

  1. Tackling Alert Fatigue
    Monitorama 2016

  2. CaitieM.com
    Distributed Systems Engineer
    Caitie McCaffrey
    @caitie

  3. “When alerts are more often
    false than true, the on-call’s
    sense of urgency in
    responding to alerts is
    diminished … the simple burden
    of alerts desensitizes the on-call
    to alerts.”

  4. “When alarms are more often
    false than true, the nursing
    staff’s sense of urgency in
    responding to alarms is
    diminished … the simple burden
    of alerts desensitizes caregivers
    to alarms.”
    Novel Approach to Cardiac Alarm Management on Telemetry Units

  5. The High Cost of:
    Alert Fatigue
    Ignored Alerts
    Unreliable Systems
    Unhappy Customers

  6. The High Cost of:
    Alert Fatigue
    Unplanned Work
    Inability to Complete
    Planned Work
    Less Time to Focus
    on Core Business

  7. The High Cost of:
    Alert Fatigue
    Fatigue
    Firefighting
    Burnout

  8. Tackling Alert Fatigue in hospitals
    Increase thresholds for patient vitals
    Only Crisis Alarms would emit audible alerts
    Nursing staff required to tune false positive alerts
    Novel Approach to Cardiac Alarm Management on Telemetry Units

  9. Observability at Twitter
    [Architecture diagram. Components shown: Twitter Front End, Twitter Svc (repeated), Statsite, Scribe, Collection Agent, Relay Svc, Indexing Svc, Cuckoo-Read, Cuckoo-Write, Manhattan Database, HDFS, Public Cloud, Cmd Line Tool, Viz / Dashboard, Alerting Svc]

  10. Runbook & Alert Audits

  11. Runbook & Alert Audits

  12. Runbook & Alert Audits

  13. Runbook & Alert Audits

  14. Empower the On-Call
    Tune Alert Thresholds
    Disable or Delete Inactionable Alerts

  15. Business Hours Alerts
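
One way to read "business hours alerts" is that only critical alerts page the on-call at any time, while everything else waits for working hours. The sketch below illustrates that idea only. The 09:00-17:00 weekday window, the severity names, and the `route_alert` function are assumptions for illustration, not details from the talk.

```python
from datetime import datetime, time

BUSINESS_START = time(9, 0)   # assumed start of the working day
BUSINESS_END = time(17, 0)    # assumed end of the working day


def is_business_hours(now: datetime) -> bool:
    """True on Monday-Friday between BUSINESS_START and BUSINESS_END."""
    return now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END


def route_alert(severity: str, now: datetime) -> str:
    """Return "page", "notify", or "queue" for an alert firing at `now`."""
    if severity == "critical":
        return "page"  # critical alerts always page the on-call
    # Non-critical alerts never page; off-hours they wait for the next workday.
    return "notify" if is_business_hours(now) else "queue"
```

A non-critical alert firing at 03:00 would be queued rather than waking anyone up.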

  16. Weekly On-Call Retro
    Hand off ongoing issues
    Review alerts fired in the previous week
    Schedule work to improve on-call or reliability
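
The "review alerts fired in the previous week" step can be sketched as a simple tally that surfaces alerts which fired repeatedly. This is a minimal illustration: the alert names and the `repeat_offenders` helper are hypothetical, and a real review would read from whatever alert history the paging system exports.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def repeat_offenders(fired: Iterable[str], min_count: int = 2) -> List[Tuple[str, int]]:
    """Return (alert name, count) for alerts that fired at least
    `min_count` times, most frequent first."""
    counts = Counter(fired)
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


# Hypothetical week of fired alerts.
week = [
    "disk_full", "high_latency", "disk_full",
    "queue_depth", "disk_full", "high_latency",
]
for name, n in repeat_offenders(week):
    print(f"{name}: fired {n} times")
```

Alerts that show up here are candidates for tuning, deletion, or scheduled reliability work.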

  17. “The goal is not to
    never get paged, the
    goal is to never get
    paged for the same
    thing twice”
    –Astrid Atkinson
    Engineering for the Long Game

  18. 50% Reduction of Alerts
    In One Quarter

  19. On-call slept through the night
    More time to do scheduled
    work while on-call
    Faster to ramp up new teammates

  20. Alerts Per Service
    [Chart: alerts per service for Q1-Q3 2015, Q4 2015, and Q1 2016; annotation: Improved Visibility]

  21. Critical Alerts Need to
    Be Actionable

  22. Do Not
    Alert on
    Machine
    Specific
    Metrics
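
A common alternative to machine-specific alerts is to alert on a metric aggregated across the whole service. The sketch below assumes per-host error and request counts and a 5% threshold. The function names, host names, and threshold are all illustrative assumptions, not something stated in the talk.

```python
from typing import Mapping


def fleet_error_rate(errors: Mapping[str, int], requests: Mapping[str, int]) -> float:
    """Error rate across all hosts, weighted by request volume."""
    total_requests = sum(requests.values())
    if total_requests == 0:
        return 0.0  # no traffic, so nothing to alert on
    total_errors = sum(errors.get(host, 0) for host in requests)
    return total_errors / total_requests


def should_alert(errors: Mapping[str, int], requests: Mapping[str, int],
                 threshold: float = 0.05) -> bool:
    """Alert only when the service-wide error rate exceeds `threshold`."""
    return fleet_error_rate(errors, requests) > threshold


# Hypothetical fleet of ten hosts, each serving 1000 requests.
requests = {f"host-{i}": 1000 for i in range(10)}
one_bad_host = {"host-0": 200}                # 20% errors on a single machine
all_hosts_bad = {host: 100 for host in requests}  # 10% errors everywhere
```

With these numbers a single unhealthy machine stays below the service-wide threshold, while a fleet-wide regression crosses it.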

  23. The Tech Lead or Engineering
    Manager should be on-call

  24. Cultural Change

  25. The goal is to build
    systems that can scale
    linearly with machines &
    sub-linearly with people

  26. Benefits of:
    Tackling Alert Fatigue
    More Reliable Systems
    Less Unplanned Work
    Happier Developers

  27. Thank you!
    @caitie
    References: https://github.com/CaitieM20/Monitorama2016
