Who owns the Service Level?

Takeshi Kondo

May 15, 2022

Transcript

  1. Who am I chaspy chaspy_ Engineering Manager, Site Reliability at

    Recruit Co., Ltd. Takeshi Kondo https://chaspy.me
  2. What is Site Reliability Engineering in the first place: Not like this

    • The service maintains "high reliability (roughly 100%)"
    • The SLI/SLO are being met
    • The development team runs the on-call rotation
    https://github.com/twitter/twemoji
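  3. What is Site Reliability Engineering in the first place: Like this!

    • The service maintains "the reliability that users expect"
    • SLI/SLO are set and used as an indicator for deciding priorities between non-functional and functional requirements
    • The team has agreed on a monitoring method and a policy that let it respond appropriately when an SLO violation occurs
    • All of the above are reviewed regularly
    https://github.com/twitter/twemoji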
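  4. Development teams acquire the Capability to control reliability: Like this!

    SRE: supports development teams in acquiring reliability-related Capability
    Development team: is able to control the reliability of its own service by itself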
  5. Team Topologies

    • Four team patterns
      • Stream Aligned
      • Platform
      • Enabling
      • Complicated Subsystem
    • Three communication patterns
      • Collaboration
      • X as a Service
      • Facilitation
    https://pub.jmam.co.jp/book/b593881.html
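  6. Development teams acquire the Capability to control reliability: Like this!

    SRE (Platform Team / Enabling Team): builds the platform and the culture that support development teams in becoming self-contained
    Development team (Stream Aligned Team): can prepare what it needs by itself = self-contained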
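  7. Why being self-contained matters

    SRE (Platform Team / Enabling Team): builds the platform and the culture that support development teams in becoming self-contained
    Development team (Stream Aligned Team): can prepare what it needs by itself = self-contained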
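  8. Why being self-contained matters: Not "VS", but "And"

    • Dev and Ops, not Dev vs Ops
      • Get feedback from users quickly (DevOps)
    • Dev and Infrastructure, not Dev vs Infrastructure
      • Shorten lead time by building through self-service
    • Productivity and Reliability, not Productivity vs Reliability
      • Productivity and reliability depend on each other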
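  9. Summary: what it means to realize SRE

    • Development teams have acquired the Capability to control reliability
      • That is, development teams are "self-contained"
      • The SRE team supports this through the platform and by fostering culture
    • Realizing this requires diverse perspectives that are not confined to product development
      • Knowing what users expect / Product Management
      • High development productivity / Development Skills
      • How much cost to spend on non-functional requirements / Business Development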
  10. Looking back at the history of Study Sapuri SRE: since @chaspy joined

    • 2018: @chaspy joins the company
    • 2019: Application Platform migrated to Kubernetes
    • 2020: Microservices Readiness put in place
      • Service ownership defined
      • Design Doc / Production Readiness Checklist
      • Self-services Infrastructure (terraform monorepo)
      • SLI/SLO
    • 2021: SLI/SLO operation fully handed over to development teams
    As a Platform Team: building the Platform
    As an Enabling Team: fostering a culture of SLI/SLO and similar practices in the development organization
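  11. Looking back at the history of Study Sapuri SRE: 2021 onward

    • COVID-19 spreads and traffic grows
    • Evolution of the Platform
      • Terraform monorepo
      • Loadtest Platform
      • monorepo CI split out using GitHub Actions
    • Organizational changes
      • Technology Strategy Group launched
      • Employees transferred to Recruit as the business was handed over; Quipper's Japan branch was liquidated
      • chaspy appointed EM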
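  12. Organization size over time

    Developers: 35, 53, 54, 73, 114
    SRE: 4, 5, 7, 7, 7
    "Developers" is the number of Web Engineers (frontend & backend) across both Study Sapuri and Quipper. Native engineers are excluded. From 2022 onward, contractors are also counted. We worked with contractors in 2021 and earlier as well.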
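  13. Looking back at the history of Study Sapuri SRE

    • In every period we have acted as both a Platform Team and an Enabling Team
    • Since 2019 in particular, with "becoming self-contained" as the theme, we have built a Platform that minimizes how often teams need to ask us for things
    • At the same time we stepped into "building the culture" of development teams, and fostered a culture of watching SLI/SLO across the organization
    • → "SLO Review" at SRE NEXT 2020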
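  14. "SLO Review" at SRE NEXT 2020

    • An effort to build a culture of reviewing SLOs in the development organization
    • Introduced bottom-up to 2 Products and 15 Teams
    • The four steps of the climb
      • Decide ownership of systems and of the organization
      • Run the process alone and establish how to do it
      • Define SLI/SLO together with Developers and run the process
      • Define an Error Budget Policy and act on it (not yet achieved)
    • Lessons learned
      • Provide standardized SLIs
      • Put the configuration into code at an early stage
      • Make the learning curve steep
    https://blog.studysapuri.jp/entry/2020/01/30/slo-review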
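  15. What went well

    • An effort to build a culture of reviewing SLOs in the development organization
    • Introduced bottom-up to 2 Products and 15 Teams
    • The steps of the climb
      • Decide ownership of systems and of the organization
      • Run the process alone and establish how to do it
      • Define SLI/SLO together with Developers and run the process
      • Define an Error Budget Policy and act on it (not yet achieved)
    • Lessons learned
      • Provide standardized SLIs
      • Put the configuration into code at an early stage
      • Make the learning curve steep
    https://blog.studysapuri.jp/entry/2020/01/30/slo-review
    We were insistent on thoroughly reducing the cognitive load on development teams
    We ran feedback cycles to reduce uncertainty about the purpose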
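  16. What did not go well?

    • An effort to build a culture of reviewing SLOs in the development organization
    • Introduced bottom-up to 2 Products and 15 Teams
    • The steps of the climb
      • Decide ownership of systems and of the organization
      • Run the process alone and establish how to do it
      • Define SLI/SLO together with Developers and run the process
      • Define an Error Budget Policy and act on it (not yet achieved)
    • Lessons learned
      • Provide standardized SLIs
      • Put the configuration into code at an early stage
      • Make the learning curve steep
    https://blog.studysapuri.jp/entry/2020/01/30/slo-review
    Why did it not work out?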
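  17. Why we never got as far as "acting"

    • At the time, the action to take on an SLO violation was delegated to the Product Manager / Team
    • It is not that nothing could be done at all
      • Teams could only do what fit into the improvement time they already had budgeted (one day every other week)
        • This is called QB Day
        • Direct fixes for errors, minor Performance improvements, and so on
      • Long-term, fundamental measures such as architecture changes or infrastructure separation were difficult
    • We never reached the point where "priorities between functional and non-functional requirements can be decided based on the indicator"
      • If the indicator does not help with prioritization, one could say it only gave development teams more to do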
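  18. "SLO Review" at SRE NEXT 2020: summary of what followed

    • Building a culture of "defining and observing reliability indicators" was valuable
    • The indicators did not grow into ones that help with business-strategy decisions
    • Reason 1. Product development teams had neither the decision-making authority nor the budget to change the balance between non-functional and functional requirements
      • The incentive to develop new features was large
      • There was no mechanism for making that kind of technology strategy / technology investment across all products
    • Reason 2. The reliability indicators were not easy for everyone in Biz/Dev/SRE to understand
      • The SLIs of the backend API do not directly represent user experience, and measures concerning Latency are hard to explain even to TPMs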
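  19. Current org chart: Elementary, Junior High, and High School Product Development Department, with the 17 groups below

    Products: Study Sapuri elementary, junior high, high school, and university entrance exam courses; Study Sapuri For TEACHERS; Study Sapuri For SCHOOL
    TPM: TPM BtoB, TPM BtoC, TPM ForSCHOOL, TPM Cross-functional
    BtoC, BtoB
    QA, Development Support, SRE, Technology Strategy, Coaching, New Development 1, Enhancement, Learning Support
    Native: iOS, Android
    New Development 2, Career Path and Autonomy, Communication Support, ForSCHOOL Mobile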
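  20. Disclaimer

    • The Technology Strategy Group was launched by the previous manager
    • As of last year, @chaspy was Lead of the DevOps WG -> EM/Lead
    • Following the previous manager's departure, the department head doubles as EM of the Technology Strategy Group and runs it together with several other EMs
    • The group was not launched because SLO violations could not be dealt with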
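  21. Why a technology strategy "group" is needed

    • How technology strategy is decided differs from organization to organization
      • A single CTO may decide it top-down
      • Everyone may decide it bottom-up by consensus
      • It may be somewhere in between
    • The Study Sapuri elementary, junior high, and high school development organization is taking on the challenge of building a mechanism in which technology strategy does not depend on one person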
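  22. Activity bodies

    • Purpose
      • Make the product development organization and its systems more resilient to change
    • Goals
      • Define the technical vision and policy
      • Bring technical issues and technical debt under control
      • Establish an improvement cycle and acquire the ability to self-diagnose
    DevOps WG, Cross-functional WG, Backend WG, Frontend WG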
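  23. Activity bodies

    • Purpose
      • Make the product development organization and its systems more resilient to change
    • Goals
      • Define the technical vision and policy
      • Bring technical issues and technical debt under control
      • Establish an improvement cycle and acquire the ability to self-diagnose
    DevOps WG, Cross-functional WG, Backend WG, Frontend WG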
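  24. Of course, it is not perfect

    • Technical issues are not something that can be measured by frequency and intensity
      • They are qualitative
      • Participating members may be biased
      • Issues are placed at the centroid of several members' vote results, so the precision is questionable
    • Some issues cannot be tackled right away because of development resources, technical difficulty, or risk
    • Prioritizing issues takes time
    • etc…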
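  25. Activity bodies

    • Purpose
      • Make the product development organization and its systems more resilient to change
    • Goals
      • Define the technical vision and policy
      • Bring technical issues and technical debt under control
      • Establish an improvement cycle and acquire the ability to self-diagnose
    DevOps WG, Cross-functional WG, Backend WG, Frontend WG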
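  26. Purpose and activities of the DevOps WG

    • Purpose: set up so that development teams acquire the ability to self-diagnose
    • Members are WebDev / QA / SRE from each area
    • Activities
      • Carrying out value stream mapping
      • Identifying and measuring candidate Metrics / Indicators
      • Carrying out DX Criteria
      • Resolving factors that obstruct the value stream (e.g. E2E Automation)
      • Publicizing the work outside the Product Development Department
    • As a first step, we validated metrics and assessments that looked promising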
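  27. Publicizing the work outside the Product Development Department: presentation at the BtoC All Hands

    https://blog.studysapuri.jp/entry/2020/08/17/dx-criteria-system
    • A place to learn about the state of the business beyond one's own group
      • Market news
      • State of the business
      • Product KPIs
      • SLI / developer productivity
      • Various other topics
        • What is SRE?
        • What are microservices? What is good about them?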
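  28. SRE and technology strategy

    • The DevOps WG's activities extend "realizing SRE" on the cultural side
      • The indicators we should watch are not only system reliability indicators
      • Look at indicators for everything and make decisions based on them
    • From here on, we run an improvement cycle on this culture-building itself
      • 1. The usefulness of candidate metrics becomes clear, and they are quantified
      • 2. Development teams can look at them and think of actions
      • 3. Development teams run the cycle of action -> improvement
      • 4. Evaluate whether steps 1-3 themselves are working well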
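  29. SRE and technology strategy: summary

    • Realizing SRE requires the budget and the authority to act when an SLO is violated
    • On top of that, it requires a technology strategy that can prioritize which technical issues to solve
    • The Study Sapuri Elementary, Junior High, and High School Product Development Department is taking on the challenge of realizing this technology strategy as a group, without depending on one person
    • A culture of looking at everything through indicators is important for reliability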
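  30. SRE "NEXT" in Study Sapuri

    • As far as "reliability" is concerned, the SRE Team is coming to fulfill its role as an Enabling Team
    • What is next for the SRE Team
      • To make the organization more Sustainable / Scalable, expand onboarding for the Platform and provide an assessment that lets development teams judge autonomously whether they have acquired reliability-related Capability
      • Focus on developing a Platform that delivers not only reliability but also development productivity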
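  31. Who owns the Service Level?

    • The Service Level belongs to everyone involved in the product
    • Let's evolve reliability indicators into ones that everyone can care about
      • Track SLI/SLO on the Client-side (WebFrontend/Native), which represent user experience directly
    • Let's run an improvement cycle on "looking at indicators and acting" itself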