Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How We Foster Reliability in Diversity
Search
Narimichi Takamura
May 14, 2022
Technology
14
13k
How We Foster Reliability in Diversity
SRE NEXT 2022 の基調講演の資料です。
https://sre-next.dev/2022/schedule#kc01
Narimichi Takamura
May 14, 2022
Tweet
Share
More Decks by Narimichi Takamura
See All by Narimichi Takamura
インシデントキーメトリクスによるインシデント対応の改善 / Improving Incident Response using Incident Key Metrics
nari_ex
1
11k
組織的なインシデント対応を目指して〜成熟度評価と改善のステップ〜 / Towards an Organized Incident Response - Maturity Assessment and Improvement Steps -
nari_ex
7
8.9k
Waroomの開発モチベーションと今後のロードマップ / Waroom development motivation and roadmap
nari_ex
1
1.6k
Engineering with Business Impact
nari_ex
2
310
SRE Practices in Organizations
nari_ex
16
10k
Hardening におけるトラブルシューティング / Troubleshooting in Hardening
nari_ex
1
360
私が Engineering Manager になるまでに経験してきたこと、大切にしてきたこと / Lecture materials for Introduction to Venture Business at UEC
nari_ex
0
250
運用技術者組織の設計と運用 / Design and operation of operational engineer organization
nari_ex
11
10k
エンジニアリング組織の基礎知識 / Basic knowledge of engineering organization
nari_ex
10
4.7k
Other Decks in Technology
See All in Technology
SwiftUIのGeometryReaderとScrollViewを基礎から応用まで学び直す:設計と活用事例
fumiyasac0921
0
140
Azure SynapseからAzure Databricksへ 移行してわかった新時代のコスト問題!?
databricksjapan
0
140
SREとソフトウェア開発者の合同チームはどのようにS3のコストを削減したか?
muziyoshiz
1
100
o11yで育てる、強い内製開発組織
_awache
3
120
pprof vs runtime/trace (FlightRecorder)
task4233
0
160
動画データのポテンシャルを引き出す! Databricks と AI活用への奮闘記(現在進行形)
databricksjapan
0
140
コンテキストエンジニアリングとは? 考え方と応用方法
findy_eventslides
4
890
BirdCLEF+2025 Noir 5位解法紹介
myso
0
190
ACA でMAGI システムを社内で展開しようとした話
mappie_kochi
1
250
Access-what? why and how, A11Y for All - Nordic.js 2025
gdomiciano
1
110
Green Tea Garbage Collector の今
zchee
PRO
2
390
VCC 2025 Write-up
bata_24
0
180
Featured
See All Featured
Into the Great Unknown - MozCon
thekraken
40
2.1k
For a Future-Friendly Web
brad_frost
180
9.9k
Building Flexible Design Systems
yeseniaperezcruz
329
39k
Git: the NoSQL Database
bkeepers
PRO
431
66k
How to train your dragon (web standard)
notwaldorf
96
6.3k
Scaling GitHub
holman
463
140k
Mobile First: as difficult as doing things right
swwweet
224
10k
Making Projects Easy
brettharned
119
6.4k
Embracing the Ebb and Flow
colly
88
4.8k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Why You Should Never Use an ORM
jnunemaker
PRO
59
9.6k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
Transcript
None
2
3
4
5
6
Motivation • اۀͷઓུαʔϏεɺจԽͳͲͷݻ༗ͷίϯςΩετ͕͋Δ தͰɺͲͷΑ͏ʹ৫ʹ SRE Λ࣮ફ͢ΔͱΑ͍͔ • αʔϏεΛऔΓר͘ڥͷมԽʹରͯ͠ SRE ͳʹ͕Ͱ͖Δͷ
͔ => ৫ʹదͨ͠ SRE Λఆٛ͠ɺҭ͍ͯͯͨ͘ΊͷऔΓΈΛհ ͠ͳ͕Βɺଟ༷ੑͱͱʹ SRE ͕ͲͷΑ͏ʹาΜͰ͍͔͘ ʹ͍ͭ ͓ͯ͠·͢ɻ 7
8
9
ٻΊΒΕΔ SRE ͷଟ༷Խ 10
SRE ͕ଟ༷Խ͢Δཧ༝ • ৫͝ͱʹ৴པੑͷఆ͕ٛҟͳΔͨΊ • ৴པੑࣄۀઓུγεςϜಛੑͳͲ༷ʑͳཁૉ͕Өڹ͢Δ • αʔϏεΛऔΓרͯ͘͢ͷཁૉ͕ಉ͡ʹͳΔ͜ͱͳ͍ • ৴པੑͷܭଌํ๏ඪɺͦͷୡํ๏ʹ͍ͨΔશ͕ͯಉ
͡ʹͳΔ͜ͱͳ͍ • → SRE ΛҰ༷ʹ࣮ફ͢Δ͚ͩͰ͏·͍͔͘ͳ͍ 11
ྫ: ϓϩμΫτͷϑΣʔζʹΑΔ৴པੑͷҧ͍ 12
ྫ: γεςϜಛੑʹΑΔ৴པੑͷҧ͍ 13
it's a marathon, not a sprint1 1 With SRE, failing
to plan is planning to fail 14
Why is SRE a marathon? • ٕज़తͳ໘ͱਓؒతͳ໘ʢจԽϓϩηεͳͲʣͷ՝͕͍͘ ͭંΓॏͳ͍ͬͯΔ • νʔϜϦʔμʔܦӦΛר͖ࠐΉඞཁ͕͋Δ
• ex. Ϧιʔεʢਓ͓ۚʣΛ৴པੑ࣮ݱͷͨΊʹׂΓͯΔ • ex. ࣗ෦த৺తͳߟ͑ํΛ͠ɺSRE ͷจԽతݪଇΛҭΉ → SRE ৫ͷมֵ͏ͷͰ͋ΓɺҰேҰ༦Ͱ͠ಘͳ͍ 15
16
৫ʹదͨ͠ SRE ͷ࣮ફ ͲͷΑ͏ʹߦ͏ͱΑ͍͔ 17
SRE ͷ࣮ફʹ͓͚Δॏཁͳ5ͭͷεςοϓ23 1. ࣮ફʹඞཁͳใΛूΊΔ 2. খ࢝͘͞Ίͯɺ܁Γฦ͢ 3. νʔϜΛࢧԉ͢Δ 4. ֶΜͩ͜ͱΛεέʔϧ͢Δ
5. σʔλυϦϒϯܕͷϚΠϯυηοτΛ۩ମԽ͢Δ => ͍͖ͳΓϓϥΫςΟε࣮ફΛ͢ΔͷͰͳ͘ɺঢ়گѲ͔Β͡ΊΔ => ࠷ऴతʹɺ৫͕ࣗతʹ SRE Λ࣮ફͰ͖Δঢ়ଶΛࢦ͢ 3 Four steps to jumpstarting your SRE practice 2 The Professional Scrum Product Owner Book 18
SLI/SLO ʹ͓͚Δεςοϓͷ࣮ફྫ εςοϓ ۩ମྫ ࣮ફʹඞཁͳใΛूΊΔ ৴པੑʹؔ͢Δ՝ΛѲɺࣄۀऀΤϯδχΞ Ϧϯά৫ͷϦʔμʔΛಛఆ͢Δ খ࢝͘͞Ίͯɺ܁Γฦ͢ ݫີͳ SLI/SLO
Λఆٛͤͣɺখ͞ͳαʔϏεɾνʔϜ Λରʹ͡ΊͯΈΔ νʔϜΛࢧԉ͢Δ Embbedded SRE Λ༻͍֤ͯνʔϜͰ SLI/SLO Λ࣮ફ͢ Δ ֶΜͩ͜ͱΛεέʔϧ͢Δ ษڧձͷ։࠵ɺSRE Ξϯόαμʔͷ໋ͳͲʹΑͬͯ ద༻ൣғΛ֦େ͢Δ σʔλυϦϒϯܕͷϚΠϯυηοτΛ۩ମԽ͢Δ SLI/SLO ͷվળΛ௨ͯ͠ଌఆͷΧϧνϟʔΛৢ͢Δ 19
5ͭͷεςοϓʹΑΔ৫ͷมԽ • ʮঢ়گѲʯˠʮখ࣮͘͞ફʯˠʮࢧԉʯˠʮՁೝࣝΛΞο ϓσʔτʯˠʮ৫શମ͕ೳಈతʹ࣮ફʯͷྲྀΕ • → ৫શମͷߦಈ͚ͩͰͳ͘ɺՁ؍มԽ͍ͯ͠Δ • ࣮ફ͢ΔϓϥΫςΟεʹΑͬͯɺՁೝࣝͷΞοϓσʔτҎ ߱ͷఔ͕ԕ͘ײ͡Δ......
• → ߦಈݪཧͷཧΛͯΊΔ͜ͱͰ൫໘ཧ͕Ͱ͖ͦ͏ 20
৫ͷණࢁϞσϧ • ߦಈݪཧʹؔ͢ΔණࢁͷϝλϑΝʔ • ʹݟ͍͑ͯΔණΑΓԿഒେ͖͍ණ ͕ਫ໘Լʹଘࡏ͢Δ • Լ্ͷཁҼʹͳ͍ͬͯΔ • ex.
ߦಈΛܾΊΔͷϙϦγʔ • ex. ϙϦγʔΛܾΊΔͷՁ؍ • ্ม͑͘͢ɺԼม͑ʹ͍͘ • ৫มֵɺ্͔ΒͪΖΜɺԼ͔Βͷ Ξϓϩʔνॏཁ 21
SRE ͷϓϥΫςΟεͱණࢁϞσϧ 22
SRE ࣮ફʹ͓͚ΔΈ • SLI/SLO ͳͲͷϓϥΫςΟεͷ࣮ફɺLevel 1,2 Λ܁Γฦ͠มߋ͠ͳ͕ Βɺ࠷ऴతʹՁ؍ΛΞοϓσʔτ͍ͯ͘͠औΓΈͱ͍͑Δ • Level
1,2 ͷࣄྫ૿͖͍࣮͑ͯͯͯફʹऔΓֻ͔Γ͍͢ • ͔͠͠ɺ্هͷऔΓΈ͚ͩͰ Level 3 ͷมԽʹ౸ୡ͠ͳ͍͜ͱ͕͋Δ • ৗͷۀมΘΔ͕ɺࣗతͳ SRE ࣮ફʹڑ͕͋Δ • => Level 3 ʹରͯ͠ͳʹ͔Ξϓϩʔν͕Ͱ͖ͳ͍͔ 23
Level 3 ͷΞϓϩʔν => SRE ͷํੑɾՁ؍Λ໌֬ʹ͢Δ 24
ͲͷΑ͏ʹํੑɾՁ؍Λ໌֬ʹ͢Δ͔ • ϑϨʔϜϫʔΫϛογϣϯɾϏδϣϯɾόϦϡʔΛར༻ • খ͘͞͡ΊΔͨΊʹ SREs ͷΈΛର • CTO
EM ͳͲɺ৫ͷϦʔμʔΛר͖ࠐΜͰ࣮ࢪ 25
SRE ͷϏδϣϯɾϛογϣϯɾόϦϡʔΛఆٛ͢Δ εςοϓ ۩ମྫ ࣮ફʹඞཁͳใΛूΊΔ اۀɾ৫ͷίϯςΩετΛѲ͢Δ খ࢝͘͞Ίͯɺ܁Γฦ͢ MVV Λ࡞͠ɺSREs ʹߜͬͯӡ༻͠ͳ͕ΒվળΛੵ
ΈॏͶΔ νʔϜΛࢧԉ͢Δ Embbedded SRE Λ༻͍ͯɺ֤νʔϜͰ SRE ͷ࣮ફΛ͢ Δ ֶΜͩ͜ͱΛεέʔϧ͢Δ MVV Λϕʔεʹͨ͠Ձ؍ͷڞ༗ํੑͷ͢Γ߹Θ ͤΛߦ͏ σʔλυϦϒϯܕͷϚΠϯυηοτΛ۩ମԽ͢Δ SRE ͷ࣮ફʹؔ͢Δ KPI ΛఆΊɺଌఆͷΧϧνϟʔΛ ৢ͢Δ 26
اۀɾ৫ίϯςΩετΛѲ͢Δ 27
લఏ: ձࣾͱ SREs ͷ MVV Λ߹ͤ͞Δ 28
ͲͷΑ͏ʹاۀɾ৫ίϯςΩετͷѲ͢Δ͔ • اۀ৫Λද͢มΛͯ͢ચ͍ग़ͨ͠Γɺܭଌ͢Δ͜ͱ ࠔ • → اۀ׆ಈΛ၆ᛌతʹଊ͑ͳ͕ΒɺSRE ʹӨڹͷ͋Δओཁͳม Λཧ͢Δ 29
اۀ׆ಈͷશମ૾ 30
اۀ׆ಈͷશମ૾ 31
اۀ׆ಈͷநԽ 32
৫ͷίϯςΩετʹ͓͚Δओཁͳม 1. اۀํʹؔ͢ΔίϯςΩετ 2. αʔϏεʹؔ͢ΔίϯςΩετ 3. ৫ʹؔ͢ΔίϯςΩετ 33
اۀํͷίϯςΩετͷѲ ֬ೝର ۩ମతͳΠϯϓοτྫ ؍ ձࣾͷํੑ ϛογϣϯɺϏδϣ ϯɺόϦϡʔʹؔ͢Δ ࢿྉ Ͳ͜ʹ͔͍ͬͯ͘ͷ ͔ɺͳʹΛେʹͯ͠
͍Δͷ͔ ࣄۀઓུ IRɺ͚ࣾͷΩοΫ Φϑࢿྉ ݱࡏͷձࣾͷϑΣʔ ζɺࣄۀͷઓུΛ ͑Δ 34
αʔϏεͷίϯςΩετͷѲ ֬ೝର ۩ମతͳΠϯϓοτྫ ؍ αʔϏε༰ αʔϏεઆ໌ࢿྉɺ࣮ࡍʹ ͬͯΈΔ ͲͷΑ͏ͳαʔϏεͳͷ͔ αʔϏεߏ γεςϜߏਤɺ։ൃϑϩ
ʔɺϦϦʔεϓϩηε ͲͷΑ͏ͳΈͳͷ͔ɺ ͲͷΑ͏ʹ։ൃ͞Ε͍ͯΔ ͷ͔ αʔϏε՝ Πϯγσϯτͷཤྺɺγε ςϜʹ͓͚Δॏཁͳ՝ ͳʹ͕ʹͳͷ͔ 35
৫ͷίϯςΩετͷѲ ֬ೝର ۩ମతͳΠϯϓοτྫ ؍ ৫ߏ ৫ਤɺνʔϜͷׂ આ໌ࢿྉ ͲͷΑ͏ͳνʔϜ͕͋ Δͷ͔ɺΩʔϚϯ୭ ͳͷ͔
৴པੑͷҙࣝ ϚΠϯυηοτΛௐࠪ ͢Δ ৴པੑ͕Ͳͷ͘Β͍׆ ಈʹӨڹ͕͋Δͷ͔ 36
ิ: ৫ͷ৴པੑͷϚΠϯυηοτ ৴པੑʹؔ͢Δ৫ͷϚΠϯυηοτ5ͭʹྨͰ͖Δ4 ϑΣʔζ આ໌ Absent ৫ʹͱͬͯ৴པੑޙճ͠ʹͳ͍ͬͯΔঢ়ଶ Reactive ۙͰੜͨ͡৴པੑͷͷϑΥϩʔ͕ߦΘΕΔ͕ɺγεςϜͷظత ͳࢿ΄΅ͳ͍
Proactive ఆظతͳ৫ϓϩηεΛ௨ͯ͡જࡏతͳ৴པੑϦεΫ͕ಛఆ͞Εରॲ͞Ε Δ Strategic ΞʔΩςΫνϟɺϓϩμΫτɺϓϩηεΛମܥతʹมߋ͢Δ͜ͱͰϦεΫ ͷΫϥεཧ͢Δ Visionary ৴པੑͷ࠷ߴҐʹ౸ୡ͓ͯ͠Γɺ৴པੑͷ෯͍औΓΈΛϕετϓϥ ΫςΟεΑͼܦݧʹج͍ͮͯࣾ֎ͰਪਐͰ͖Δ 4 What’s your org’s reliability mindset? Insights from Google SREs 37
SREs ͷ MVV Λ࡞͢Δ 38
MVV ࡞ͷεςοϓ 1. CTOɺEMɺSREs ͷώΞϦϯά • ແҙࣝతʹ༏ઌ͍ͯ͠Δ͜ͱ • ॏཁʹ͍ͯ͠ΔՁ؍ •
γεςϜϫʔΫϑϩʔͷϕετέʔεͷ͢Γ߹Θͤ 2. ͖ͨͨΛ࡞ • લஈͰ४උͨ͠ίϯςΩετͱώΞϦϯάΛͱʹ࡞ 3. ϨϏϡʔ & मਖ਼ • Google Docs Λར༻͠ɺSREνʔϜϨϏϡʔΛඇಉظͰߦ͏ • େ͕Ͱ͖ͨΒɺ͏ݴ༿ͷબఆχϡΞϯεͷ֬ೝΛಉظతʹߦ͏ 39
MVV ͷྫ 40
Χϧνϟʔͷৢʹؔ͢Δ KPI ΛఆΊΔ Ωʔϫʔυ KPI ͷྫ ૉૣ͘σϓϩΠ͢Δ CI/CD ࣌ؒͷܭଌ ҆શʹσϓϩΠ͢Δ
σϓϩΠޙʹमਖ਼͕ඞཁʹͳΔׂ߹ͷܭଌ Ϣʔβʔ͕҆৺ͯ͑͠Δ γεςϜʹ͓͚ΔηΩϡϦςΟج४ͷୡ ͷܭଌ • ։ൃपΓ Agile KPIs ͰௐΔͱݟ͕๛ʹग़ͯ͘Δ • ͦͷଞͷ׆ಈ KPI ͱΈ߹Θͤͯɺࣗࣾͷ "SRE KPIs" ΛఆٛͰ͖ΔͱΑ͍ 41
͜͜·Ͱͷ·ͱΊ • SRE ͷ࣮ફ 5 ͭͷεςοϓʹ͚Δ͜ͱ͕Ͱ͖Δ • ঢ়گѲ͔Β͡·Γɺ৫ͷϚΠϯυηοτͷมߋ·Ͱߦ͏ • 5
ͭͷεςοϓ৫มֵͷϓϩηεͳͷͰɺණࢁϞσϧΛద༻ͯ͠ཧͰ͖Δ • ණࢁϞσϧͷԼ͔Β্Ξϓϩʔν͢Δྫͱͯ͠ SRE ͷ MVV ߏஙΛհ • ํੑɾՁ؍ͷߏங 5 ͭͷεςοϓ༗ޮ • ၆ᛌతʹاۀ׆ಈΛଊ͑Δ͜ͱͰɺঢ়گѲͷΓޱΛಛఆ • ํੑɾՁ؍ͷৢʹ͓͍ͯɺσʔλυϦϒϯͷϚΠϯυηοτ༗ޮ 42
43
ڥͷมԽʹΑͬͯ దͳ৴པੑมΘ͍ͬͯ͘ • αʔϏεΛऔΓר͘ڥɺ࣌ؒܦաͱ ͱʹมԽ͠ଓ͚Δ • ಋೖͨ͠ͷɺڥͷมԽͱͱʹ࠷ దͰͳ͘ͳΔ • ڥͷมԽʹରͯ͠ɺͲͷΑ͏ʹཱͪ
͔͑Α͍ͷ͔ → มԽͷରԠ͢ΔͨΊͷΑ͍Ξϓϩʔν ͳ͍ͩΖ͏͔ 44
μΠφϛοΫέΠύϏϦςΟ • ͳʹ͕ى͜Δ͔Θ͔Βͳ͍࣌ͳͷͰԿ͕ى͖ͯରԠͰ͖ΔΑ͏ʹ͠Α͏ͱ͍͏ߟ͑ํ • ڥঢ়گ͕ܹ͘͠มԽ͢ΔதͰɺاۀ͕ͦͷมԽʹରԠͯࣗ͠ݾΛมֵ͢Δೳྗ5 • ҎԼͷ3ͭͷೳྗʹྨͰ͖Δ • ײʢSensingʣ: ڻҟةػΛײ͢Δೳྗ
• ัଊʢSeizingʣɿػձΛଊ͑ɺطଘͷࢿ࢈ɾࣝɾٕज़Λ࠶ߏͯ͠ڝ૪ྗΛ֫ಘ͢Δೳྗ • ม༰ʢTransformingʣɿڝ૪ྗΛ࣋ଓతͳͷʹ͢ΔͨΊʹɺ৫શମΛ৽͠ม༰ͤ͞Δ ೳྗ 5 ৽࣌ͷܦӦઓུʮμΠφϛοΫɾέΠύϏϦςΟʯͱԿ͔ʁ 45
3ͭͷೳྗSREͷεςοϓʹඥ͚Δ͜ͱ͕Ͱ͖Δ 1. ࣮ફʹඞཁͳใΛूΊΔ => Sensing 2. খ࢝͘͞Ίͯɺ܁Γฦ͢ => Seizing 3.
νʔϜΛࢧԉ͢Δ => Transforming 4. ֶΜͩ͜ͱΛεέʔϧ͢Δ => Transforming 5. σʔλυϦϒϯܕͷϚΠϯυηοτΛ۩ମԽ͢Δ => Transforming 46
มԽʹΑͬͯੜͨ͡৽͍͠ ͚ࣗͩͰΧόʔ͖͠Εͳ͍ͷͰͳ͍͔ 47
18.3.5 νʔϜͷྗֶ ࢲͨͪɺSRE ͷιϑτΣΞ։ൃϓϩδΣΫτʹ ܞΘΔΤϯδχΞΛબ͢Δࡍʹɺ৽͍͠τϐοΫ ʹૉૣ͘ରԠ͍͚ͯ͠ΔθωϥϦετɺ๛ͳࣝ ͱܦݧΛ͍࣋ͬͯΔΤϯδχΞΛΈ߹Θͤͯ࠷ॳ ͷνʔϜΛ্ཱͪ͛Δ͜ͱͰɺେ͖ͳϝϦοτ͕ಘ ΒΕΔ͜ͱʹؾ͖ͮ·ͨ͠ɻ ܦݧͷଟ༷ੑʹΑͬͯΛͳ͘͢ͱڞʹɺͯ͢
ͷνʔϜͷϢʔεέʔε͕ࣗͷνʔϜͷϢʔε έʔεͱಉͩ͡ͱࢥ͍ࠐΜͰ͠·͏མͱ݀͠Λආ͚ Δ͜ͱ͕Ͱ͖·͢ɻ νʔϜʹͱͬͯɺඞཁͱ͞ΕΔεϖγϟϦετͱ ͷؔੑΛཱ֬͠ɺΤϯδχΞ͕৽͍͠ྖҬʹ ৺Α͘औΓΊΔΑ͏ʹ͢Δ͜ͱ͕ॏཁͰ͢ɻ 48
มԽͱଟ༷ੑ • 5ͭͷεςοϓΛ܁Γฦ͢͜ͱͰมԽʹରԠͰ͖Δ͕มֵͷ෯ʹݶք͕͋Δ • ৫Ͱଟ༷ੑΛ͛Δ͜ͱͰɺมԽͷੑΛڧԽͰ͖Δ • SRE ຊͰɺҟͳΔλΠϓͷΤϯδχΞΛΈ߹Θ͍ͤͯͨ • SRE
ίϛϡχςΟΛνʔϜͱଊ͑Δͱଟ༷ੑΛ͛ΔઈͷػձʹͳΔ • มԽʹର͢Δଟ༷ੑʹΑΔରࡦɺΞϯνϑϥδϟΠϧͷߟ͑ํͷҰ෦Ͱ͋Δ • ͞ΒʹରࡦΛߟ͑Δ߹ɺγεςϜੑͷΞϓϩʔν͕ࢀߟʹͳΓͦ͏ 49
·ͱΊ • SRE ࣮ફΛ͢Δ্Ͱॏཁͳ5ͭͷεςοϓΛհ • ৫Λมֵɺණࢁͷ্ͱԼͷ྆໘͔Βߦ͑ΔͱΑ͍ • SRE ͷํੑɾՁ؍Λ໌֬ʹ͢ΔऔΓΈΛհ •
ڥͷมԽͷੑΛอͭʹɺ৫ͷଟ༷ੑཱ͕ͭ 50