
Building Site Reliability Engineering Together with Developers / SREing with Developers

Takeshi Kondo
September 29, 2023


Transcript

  1. Who am I

    • chaspy (chaspy_) / Takeshi Kondo / https://chaspy.me
    • Senior Engineering Manager (10/1~), StudySapuri K12 at Recruit Co., Ltd.
    • Today is my last day as manager of the SRE Team 👋
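  2. Summary

    • Premise: achieving SREing = developers being able to control reliability
    • SREs support developers on both the Enabling and Platform sides
    • To do so, SREs need to understand developers' requirements correctly
    • Case studies: approaches for SREs to understand developers' requirements correctly
      • 1. Get feedback
      • 2. Collaborate
      • 3. Experience it firsthand
    • The organizational culture that underpins these
      • Retrospectives / feedback / taking on challenges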
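  3. SRE NEXT 2020/2022

    • 2020 / A case study of evangelizing SLI/SLO on both the technical and cultural sides, starting from a state where the terms were not even in use
    • 2022 / An observation that operating an Error Budget requires a technology strategy that budgets and plans for it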
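  4. SRE Team at StudySapuri K12

    • Seven members
    • Covers "StudySapuri" for elementary, junior high, and high school, plus Quipper overseas
    • Supports more than 100 developers, including TPMs and Native Devs
    • Does both kinds of work: Enabling developers for reliability, and building a Platform for developer productivity
      • These are Team Topology terms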
  5. Looking back at the history of "StudySapuri" SRE ~ after @chaspy joined

    • 2018: @chaspy joins
    • 2019: Application Platform migrated to Kubernetes
    • 2020: Microservices Readiness put in place
      • Service ownership defined
      • Design Doc / Production Readiness Checklist
      • Self-services Infrastructure (terraform monorepo)
      • SLI/SLO
    • 2021: SLI/SLO operation fully handed over to the development teams
    • As a Platform Team: building the Platform
    • As an Enabling Team: fostering a culture of SLI/SLO and similar practices in the development organization
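  6. Looking back at the history of "StudySapuri" SRE ~ 2021

    • COVID-19 spreads, traffic increases sharply
    • Evolution of the Platform
      • Terraform monorepo
      • Loadtest Platform
      • monorepo CI split out using GitHub Actions
    • Organizational changes
      • Technology Strategy Group launched
      • Staff transferred to Recruit following the business transfer; Quipper's Japan branch liquidated
      • chaspy appointed EM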
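  7. What the SRE team was aiming for

    • Problem: the SRE team decided for itself what the SRE team would do
      • We may not have been setting the right problems
      • The work lacked reproducibility / depended on individuals
    • Focused on getting feedback from developers
    • Tried several approaches
    • As the work progressed, we noticed that there were three patterns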
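  8. Conducting the SRE survey

    • Run with Google Forms / once every six months / response rate of about 60%
    • Questions
      • Self-sufficiency of each team / proficiency with tools and SaaS
      • What would have to be true for you to say you are operating your service yourselves?
      • What is the bottleneck to making the product better?
      • If you have recently run into an issue or problem that seems SRE-related, please tell us
      • Which step or procedure in your everyday development causes you the most stress?
      • Is there any part of the development process you would be happy to see get faster?
      • Is there anything in your service that you consider debt / would like to get rid of?
      • When something about the current platform (developed and operated by SRE) is unclear, what do you look at?
      • Any requests or comments about how SRE responds and what it works on
      • Any requests or comments about how SRE handles inquiries
      • When the SRE team has a request for you, how would you like to be contacted or asked?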
  9. Conducting the SRE survey (annotated)

    • Run with Google Forms / once every six months
    • Questions: the same list as on the previous slide, annotated with the purpose of each group
      • Proficiency is tallied as scores per team
      • Asking about the factors that hinder developer productivity
      • Getting feedback on the SRE team's day-to-day behavior
  10. Self-sufficiency of each team / proficiency with tools and SaaS

    • 0 - No idea at all
    • 1 - Have heard of it
    • 2 - Can just about use it
    • 3 - Can use it
    • 4 - Can change it
    • 5 - Understand it completely
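
Editor's sketch: the deck says proficiency is tallied as scores per team but does not say how. A minimal way to do it, assuming the form export can be reduced to (team, tool, score) rows and that a per-team mean is the wanted summary. The team names, the sample rows, and the function name `proficiency_by_team` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey rows: (team, tool, self-rated score on the 0-5 scale).
responses = [
    ("team-a", "Kubernetes", 3),
    ("team-a", "Kubernetes", 1),
    ("team-a", "Terraform", 4),
    ("team-b", "Kubernetes", 5),
    ("team-b", "Terraform", 2),
]


def proficiency_by_team(rows):
    """Return {team: {tool: mean score}} for scores on the 0-5 scale."""
    buckets = defaultdict(lambda: defaultdict(list))
    for team, tool, score in rows:
        # Reject anything outside the 0-5 scale shown on the slide.
        if not 0 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        buckets[team][tool].append(score)
    return {
        team: {tool: mean(scores) for tool, scores in tools.items()}
        for team, tools in buckets.items()
    }


summary = proficiency_by_team(responses)
```

Comparing `summary` between two survey rounds would show which teams moved up the scale for a given tool; a median or a full histogram may be a better fit when teams are small.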
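  12. Building a deploy pipeline ahead of a new feature release

    • 2022: full renewal of the "StudySapuri" junior high school course
      • The learning experience was rebuilt from scratch
      • A new repository was created, and CI/CD was built from zero
    • SRE and developers did the requirements definition and the implementation together
      • Branch strategy
      • Wanted to use fast-forward merges
    • At the time, CD for the main product was not yet done with GitOps (ArgoCD), so this was also a good experimental product for SRE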
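
Editor's sketch: the slide only says the team wanted fast-forward merges, not how that was enforced. The underlying condition is that a branch can be fast-forwarded into a target only when the target's tip is an ancestor of the branch's tip. The commit graph and the function names below are hypothetical and stand in for a real repository.

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` by following parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def can_fast_forward(parents, target_tip, source_tip):
    """A fast-forward is possible only if the target tip is an ancestor of the source tip."""
    return is_ancestor(parents, target_tip, source_tip)


# Hypothetical history, as {commit: [parent commits]}.
# main is at "b"; one branch adds "c" on top of "b";
# another branch ("d") forked from "a", so main has moved on without it.
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}

up_to_date_branch = can_fast_forward(history, "b", "c")
stale_branch = can_fast_forward(history, "b", "d")
```

In this history the branch ending at "c" can be fast-forwarded, while the branch ending at "d" would first have to be rebased onto "b".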
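  13. 2. Collaborate

    • 1. Building a deploy pipeline ahead of a new feature release
    • 2. Discussions in the Technology Strategy Group
    • 3. Working on shared problems through pair programming and mob programming
      • See the CNDF2023 slides 👉 https://speakerdeck.com/chaspy/toward-the-acquisition-of-self-diagnostic-skills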
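  14. 3. Experience it firsthand

    • 1. Short-term exchange from a development team to the SRE team
    • 2. Short-term exchange from the SRE team to a development team
    • 3. The SRE manager also manages a development team
    [Diagram: SRE, Development Team A, Development Team B, Development Team C]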
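  15. 1. Short-term exchange from a development team to the SRE team

    • The developer also did SRE's everyday work, such as alert handling
    • Completed a Kubernetes Cluster Upgrade (a first for a Developer)
    • The relationship has continued after the exchange ended, for example by running shared projects as mob programming
      • e.g. the Envoy removal introduced earlier was also done by this member together with SRE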
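  16. 3. The SRE manager also manages a development team

    • This one is about me
      • Ref: 2022-10-02 "I Am Now Also Serving as EM for Web Application Development"
      • Whether holding both roles is a good idea is another story... (I would not recommend it)
    • By looking after both the development team and the SRE team, I came to understand where the development team struggles
      • I felt the problems myself by actually deploying
      • I learned how developers normally notice errors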
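  17. What I did as a manager

    • What I was aiming for / keeping in mind
      • Focusing strategically on getting feedback from developers
      • Deliberately setting up collaboration between developers and SRE
      • Coordinating so that short-term exchanges between teams could happen
      • Designing a place where problems can be discussed across teams (technology strategy)
    • That said, my honest feeling is that I did not do anything special...
      • Maybe that is just what management is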
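  18. A culture of looking back

    • A culture of retrospectives
      • Ref: "The Retrospective Culture That Supports the SRE Team"
      • Almost every team holds a retrospective once every two weeks
      • A retrospective is always held when a project ends and after an incident
    • It is because we look back that we can find problems
      • Acting on a problem we have found leads to bringing it to various forums
      • Requests for infrastructure support probably came out of conversations like that too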
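  19. A culture of giving feedback - give feedback properly, to the right people

    • There is a culture of routinely giving feedback across teams, groups, and departments
      • Not everything can be solved, but at the very least it gets taken seriously
      • Some groups have a suggestion-box-style mechanism, and some let you submit an issue with a Slack emoji reaction (the reacji is caught and handled)
    • Feedback is given Blamelessly
    • We also often send improvement requests to Recruit-wide cross-functional organizations (HR, labor affairs, ICT, security promotion, etc.)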
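
Editor's sketch: the slide mentions catching a reacji and handling it but gives no implementation. One possible shape, assuming the handler receives a Slack Events API `reaction_added` event as a dict. The emoji name `issue`, the function name `to_issue_request`, the returned fields, and the sample IDs are all hypothetical, and the sketch calls no Slack API.

```python
ISSUE_REACTION = "issue"  # hypothetical emoji name; the deck does not name one


def to_issue_request(event):
    """Turn a Slack `reaction_added` event into an issue request, or None if it is not one."""
    if event.get("type") != "reaction_added":
        return None
    if event.get("reaction") != ISSUE_REACTION:
        return None
    item = event.get("item", {})
    # Only reactions on messages are treated as submissions in this sketch.
    if item.get("type") != "message":
        return None
    return {
        "reported_by": event.get("user"),
        "channel": item.get("channel"),
        "message_ts": item.get("ts"),
    }


# Made-up sample payload in the shape of a `reaction_added` event.
sample_event = {
    "type": "reaction_added",
    "user": "U0000000001",
    "reaction": "issue",
    "item": {"type": "message", "channel": "C0000000001", "ts": "1695945600.000100"},
}

request = to_issue_request(sample_event)
ignored = to_issue_request({**sample_event, "reaction": "thumbsup"})
```

Keeping the filter as a pure function makes it easy to test; whatever receives the result (a ticket tracker, a triage channel) is left out because the deck does not describe it.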
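  20. Summary

    • Premise: achieving SREing = developers being able to control reliability
    • SREs support developers on both the Enabling and Platform sides
    • To do so, SREs need to understand developers' requirements correctly
    • Case studies: approaches for SREs to understand developers' requirements correctly
      • 1. Get feedback
      • 2. Collaborate
      • 3. Experience it firsthand
    • The organizational culture that underpins these
      • Retrospectives / feedback / taking on challenges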
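  21. Summary

    • What I talked about today
      • I introduced concrete examples of approaches for SRE and developers to achieve SREing "together"
      • I reflected on the organizational culture behind them
    • Reflections
      • Achieving SREing takes more than applying particular techniques; fostering the culture that underpins them has to be considered as part of the same package
      • I think we were also helped a great deal by the cultural groundwork that already existed in the "StudySapuri" development organization
      • Why not think about what kind of culture your own organization has, and how it relates to achieving SREing?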
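  22. In closing

    • Over my five years as an SRE, my career has gone hand in hand with SRE Lounge / SRE NEXT
    • I am glad I could give a talk at the end of it
    • Thank you to all the organizers!
    • I hope I have been able to give something back to the community
    • Let's talk a lot at the after-party!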
  23. Thank you!

    • chaspy (chaspy_) / Takeshi Kondo / https://chaspy.me
    • Senior Engineering Manager (10/1~), StudySapuri K12 at Recruit Co., Ltd.
    • With this, I am retiring from SRE! See you at the after-party 🍻