Improving Incident Response using Incident Key Metrics

Presentation slides from SRE Kaigi 2025. TTX metrics are the main topic.

https://2025.srekaigi.net/

Narimichi Takamura

January 26, 2025

Transcript

  1. 2

  2. Topotal Inc. (Topotaru)
    • https://topotal.com
    • A startup built around SRE
    • Operates two businesses
    • SRE as a Service
    • SaaS for SRE (Waroom)
    • Platinum sponsor of this event
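  3. SRE as a Service
    • topotal.com/services/sre-as-a-service
    • A technical support service specialized in SRE
    • Examples of support
    • Introducing SLIs/SLOs and improving their operation
    • Building and improving CI/CD
    • Improving incident management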
  4. 6

  5. 8

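  6. What is MTTR (mean time to recovery)?
    • The average time from when a failure occurs until it is repaired or recovered
    • Short for Mean Time To Recovery (Repair, Resolve, Restore)
    • Calculation method [1]
    • MTTR = total repair time / total number of failures
    • Also one of the Four Keys metrics
    [1] "What is MTTR (mean time to recovery)? Calculation method and its relationship to MTBF in failure rate and availability"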
  7. 12

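  8. Verifying the validity of MTTR
    1. Randomly split the incident dataset [2] into two
    2. Reduce the time to repair (TTR) in one of the datasets by 10%
    3. Compute the MTTR (mean time to repair) of each dataset
    4. Take the difference in MTTR between the datasets
    • diff = MTTR(unmodified) - MTTR(modified)
    • diff > 0 => MTTR improved
    • diff < 0 => MTTR worsened
    5. Repeat steps 1 to 4 100,000 times
    [2] The dataset was taken from the incident status dashboards of three well-known internet companies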
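The five verification steps above can be sketched in Python. This is a minimal illustration, not the code used in the talk or in "Incident Metrics in SRE": the function names, the seed, and the log-normal synthetic TTR data are all assumptions made here, since the real status-dashboard dataset is not part of this transcript.

```python
import random
import statistics


def simulate_mttr_diffs(ttrs, reduction=0.10, trials=100_000, seed=0):
    """Return diff = MTTR(unmodified) - MTTR(modified) for every trial."""
    rng = random.Random(seed)
    data = list(ttrs)
    half = len(data) // 2
    diffs = []
    for _ in range(trials):
        # Step 1: randomly split the incidents into two halves.
        rng.shuffle(data)
        unmodified = data[:half]
        # Step 2: shorten every TTR in the other half by `reduction`.
        modified = [t * (1 - reduction) for t in data[half:]]
        # Steps 3-4: compute both MTTRs and take their difference.
        diffs.append(statistics.fmean(unmodified) - statistics.fmean(modified))
    return diffs


def share_worsened(diffs):
    """Fraction of trials in which MTTR looks worse despite the improvement."""
    return sum(d < 0 for d in diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical synthetic TTRs in minutes (assumed heavy-tailed).
    rng = random.Random(42)
    ttrs = [rng.lognormvariate(3.0, 1.2) for _ in range(200)]
    diffs = simulate_mttr_diffs(ttrs, trials=2_000)
    print(f"trials where MTTR got worse: {share_worsened(diffs):.1%}")
```

With widely spread TTRs, a noticeable share of trials can end with diff < 0 even though every incident in the modified half was shortened, which is the effect the slides describe. The exact share depends entirely on the assumed data.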
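  9. Claims of "Incident Metrics in SRE"
    • Findings from the simulation
    • Incident durations vary so widely that the results of an improvement are hard to see in MTTR
    • There are a fair number of cases where MTTR worsens even after an improvement
    • Conclusion
    • MTTR is not useful as a metric for evaluating improvement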
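  10. Supplement: uses for TTR
    The mean (MTTR) is too coarse → comparing distributions can be a starting point for finding problems
    • e.g. fewer incidents that recover immediately
    • → a result of automatic recovery for minor incidents?
    • → a defect in the failure detection mechanism?
    • e.g. more black swan events
    • → declining quality of code or infrastructure?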
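One way to compare TTR distributions instead of a single mean is to look at percentiles and at the share of quickly recovered incidents. The sketch below is an illustration under assumptions: the helper names and the choice of p50/p90 are made up here and are not taken from the slides.

```python
import statistics


def ttr_summary(ttrs_minutes):
    """Summarise a TTR sample by percentiles rather than by its mean alone."""
    data = sorted(ttrs_minutes)
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 89 is p90.
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.fmean(data),
        "p50": cuts[49],
        "p90": cuts[89],
        "max": data[-1],
    }


def share_at_most(ttrs_minutes, threshold_minutes):
    """Fraction of incidents recovered within the given threshold."""
    return sum(t <= threshold_minutes for t in ttrs_minutes) / len(ttrs_minutes)


if __name__ == "__main__":
    # Hypothetical TTRs (minutes) for two periods; not real data.
    before = [2, 3, 3, 5, 8, 15, 40, 45, 60, 300]
    after = [12, 14, 15, 20, 25, 30, 40, 45, 60, 900]
    for label, sample in (("before", before), ("after", after)):
        print(label, ttr_summary(sample), share_at_most(sample, 5))
```

A drop in the share of incidents recovered within a few minutes, or growth in the maximum, points to the kinds of questions listed above, which a single mean would hide.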
  11. 32

  12. 37

  13. 38

  14. 40

  15. 45

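  16. Examples of metrics and improvement measures (TTX: issue → improvement measure)
    • TTDetect (detection): it takes a long time from occurrence to detection → improve monitoring
    • TTEngage (team formation): assembling the response team takes a long time → clarify shifts and roles, introduce an on-call system
    • TTInvestigate (investigation): isolating the failure takes a long time → prepare Runbook dashboards
    • TTFix (repair): repairing the failure takes a long time → speed up rollbacks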
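The TTX breakdown can be illustrated by splitting one incident's timeline into phases. This is a sketch under assumptions: the transcript does not define the exact phase boundaries or any data schema, so the field names and the rule that each TTX is the gap between consecutive timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    # Hypothetical field names; the slides do not specify a schema.
    occurred_at: datetime
    detected_at: datetime
    engaged_at: datetime    # response team assembled
    diagnosed_at: datetime  # failure isolated
    fixed_at: datetime


def ttx_breakdown(t: IncidentTimeline) -> dict:
    """Split one incident into TTX phases (assumed: gaps between timestamps)."""
    return {
        "TTDetect": t.detected_at - t.occurred_at,
        "TTEngage": t.engaged_at - t.detected_at,
        "TTInvestigate": t.diagnosed_at - t.engaged_at,
        "TTFix": t.fixed_at - t.diagnosed_at,
    }


def slowest_phase(t: IncidentTimeline) -> str:
    """Name of the phase that took the longest for this incident."""
    breakdown = ttx_breakdown(t)
    return max(breakdown, key=breakdown.get)


if __name__ == "__main__":
    start = datetime(2025, 1, 26, 10, 0)
    incident = IncidentTimeline(
        occurred_at=start,
        detected_at=start + timedelta(minutes=5),
        engaged_at=start + timedelta(minutes=8),
        diagnosed_at=start + timedelta(minutes=30),
        fixed_at=start + timedelta(minutes=40),
    )
    for name, duration in ttx_breakdown(incident).items():
        print(name, duration)
    print("slowest:", slowest_phase(incident))
```

Aggregating these per-phase durations over many incidents shows which phase dominates, which in turn suggests which of the improvement measures above to try first.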
  17. 47

  18. 49

  19. 50

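  20. Examples of advanced metrics
    Focus on customer response and root-cause countermeasures, involve a variety of roles, and accelerate organization-wide incident response
    (Metric name: target roles → purpose)
    • Incident Response Metrics: Engineer → identifying issues in, and tracking improvement of, the recovery response itself
    • Customer Reliability Metrics: Sales, CRE → identifying issues in, and tracking improvement of, customer response
    • Learning Metrics: Manager, Engineer → tracking activities up to the point where the organization gains learnings
    • Improvement Metrics: Manager, Engineer → analyzing the implementation status of root-cause countermeasures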
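  21. Summary
    The following five points were covered. If anything is unclear, please come to Ask the Speaker!
    1. MTTR is not useful as an improvement metric
    • Reason: incident data is highly variable
    2. When using metrics, consistency from the purpose through to the data analysis is important
    3. To keep variability down, it is important to make the question concrete and to break metrics down further
    4. How TTX metrics were defined and are used in Waroom
    5. Metrics that matter beyond service recovery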