Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
『スタディサプリ』における SLI/SLO の継続的改善 / Continuous impro...
Search
Takeshi Kondo
May 16, 2023
Technology
1
3.3k
『スタディサプリ』における SLI/SLO の継続的改善 / Continuous improvement of SLI/SLO at StudySapuri
https://connpass.com/event/282120/
Takeshi Kondo
May 16, 2023
Tweet
Share
More Decks by Takeshi Kondo
See All by Takeshi Kondo
Slack Platform(Deno) での RAG 実装 - LangChain(js) を使ってみた / rag-implementation-on-slack-platform-deno-experimenting-with-langchain-js
chaspy
0
170
SRE の考えをマネジメントに活かす / applying SRE ideas to management
chaspy
7
6.6k
RAGの簡易評価によるフィードバックサイクル実践 / Feedback cycle practice through simplified assessment of RAGs
chaspy
2
5.1k
定量データと定性評価を用いた技術戦略の組織的実践 / Systematic implementation of technology strategies using quantitative data and qualitative evaluation
chaspy
9
1.7k
エンジニアブランディングチームの KPI / KPI's of engineer branding team
chaspy
2
2k
「SLO Review」今やるならこうする / If I had to do the "SLO Review" again
chaspy
3
1.8k
開発者とともに作る Site Reliability Engineering / SREing with Developers
chaspy
10
8k
自己診断能力の獲得を目指して / Toward the acquisition of self-diagnostic skills
chaspy
1
4.8k
『スタディサプリ 中学講座』における E2E Test の運用と計測による改善 / Improved E2E testing through measurement
chaspy
0
4.5k
Other Decks in Technology
See All in Technology
コード品質向上で得られる効果と実践的取り組み
ham0215
2
200
AI・LLM事業部のSREとタスクの自動運転
shinyorke
PRO
0
300
サーバシステムを無理なくコンテナ移行する際に伝えたい4つのポイント/Container_Happy_Migration_Method
ozawa
1
100
グループポリシー再確認
murachiakira
0
170
職種に名前が付く、ということ/The fact that a job title has a name
bitkey
1
240
頻繁リリース × 高品質 = 無理ゲー? いや、できます!/20250306 Shoki Hyo
shift_evolve
0
150
Go製のマイグレーションツールの git-schemalex の紹介と運用方法
shinnosuke_kishida
1
410
LINEギフトのLINEミニアプリアクセシビリティ改善事例
lycorptech_jp
PRO
0
240
LINE Notify互換のボットを作った話
kenichirokimura
0
180
ルートユーザーの活用と管理を徹底的に深掘る
yuobayashi
6
720
Amazon EKS Auto ModeでKubernetesの運用をシンプルにする
sshota0809
0
110
問題解決に役立つ数理工学
recruitengineers
PRO
7
2.1k
Featured
See All Featured
The Invisible Side of Design
smashingmag
299
50k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
44
7.1k
Statistics for Hackers
jakevdp
798
220k
The Cult of Friendly URLs
andyhume
78
6.3k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
51
2.4k
Designing Experiences People Love
moore
141
23k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
45
9.5k
Site-Speed That Sticks
csswizardry
4
450
Building a Modern Day E-commerce SEO Strategy
aleyda
39
7.2k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
28
9.4k
Unsuck your backbone
ammeep
670
57k
StorybookのUI Testing Handbookを読んだ
zakiyama
28
5.6k
Transcript
ʰελσΟαϓϦʱʹ͓͚Δ SLI/SLO ͷܧଓతվળ Takeshi Kondo / @chaspy 2023/05/13 SLOconf Tokyo
2023
Who am I chaspy chaspy_ Engineering Manager Site Reliability and
Web Application Development at Recruit Co., Ltd. Takeshi Kondo https://chaspy.me
SRE NEXT 2020 & 2022 • 2020 • SLI/SLO ͱ͍͏ݴ༿͕ͳ͍ঢ়ଶͰ৫
ಋೖΛࢼΈͨࣄྫ • 2022 • SLI/SLO Λಋೖͨ͠ޙͷ • ৫શମͰ Site Reliability Engineering ΛਐΊΔͨΊʹඞཁͳ͜ͱΛߟ͑ͨ
SRE & Web Application Development 2018 2020 2021 2023 2019
2022 2VJQQFS ೖࣾ 43&/&95 4-0Λ৫ʹಋೖ ͠Α͏ͱؤுΔ &OHJOFFSJOH.BOBHFSͱͯ͠ 8FC։ൃνʔϜʹࢀՃ 43&/&95 &OHJOFFSJOH .BOBHFSʹͳΔ 4-0DPOG 5PLZP✨
SRE & Web Application Development 2018 2020 2021 2023 2019
2022 2VJQQFS ೖࣾ 43&/&95 4-0Λ৫ʹಋೖ ͠Α͏ͱؤுΔ 43&/&95 &OHJOFFSJOH .BOBHFSʹͳΔ &OHJOFFSJOH.BOBHFSͱͯ͠ 8FC։ൃνʔϜʹࢀՃ ࠓ։ൃऀઢͰ͠·͢ʂ 4-0DPOG 5PLZP✨
ࠓ͍͑ͨ͜ͱ ҰܾΊͨ SLI/SLO ܧଓతʹݟ͠·͠ΐ͏
Ұઃఆͯ͠ݟ͞ͳ͔ͬͨΒͲ͏ͳ͔ͬͨͷΛ͠·͢😅
ʰελσΟαϓϦʱʹ͓͚Δ SLI/SLO ͷܧଓతվળ Λ͜Ε͔Β͍ͬͯͧ͘ͱ͍͏ Takeshi Kondo / @chaspy 2023/05/13 SLOconf
Tokyo 2023
Outline • ࣗݾհ • ʰελσΟαϓϦ தֶߨ࠲ʱʹ͍ͭͯ • SLI/SLO ͳΜͷͨΊʹ͋Δͷ͔ •
αʔϏεӡ༻ͷݱঢ়ͱ՝ • ՝ʹ࣮͋ͨͬͯࡍʹऔΓΜͩ͜ͱ • ·ͱΊ
Outline • ࣗݾհ • ʰελσΟαϓϦ தֶߨ࠲ʱʹ͍ͭͯ • SLI/SLO ͳΜͷͨΊʹ͋Δͷ͔ •
αʔϏεӡ༻ͷݱঢ়ͱ՝ • ՝ʹ࣮͋ͨͬͯࡍʹऔΓΜͩ͜ͱ • ·ͱΊ
લఏɿϓϩμΫτհ - ελσΟαϓϦ
None
20222݄ʹϦχϡʔΞϧ • Ϣʔβج൫Ҏ֎ͷ෦Λ৽نϚΠΫϩ αʔϏεͱͯ͠2ʹΓ։ൃ • ϦϦʔε͔Β1ܦաɻݱࡏܧଓత ʹΤϯϋϯε͍ͯ͠·͢ https://www.recruit.co.jp/newsroom/pressrelease/2022/0131_9881.html ϦχϡʔΞϧͷϙΠϯτʂ
ࠓिͷϛογϣϯͱ෮ԋशػೳʹΑΔݸผֶशࢧԉ ԋशྔɾқΛେ෯֦ॆ ʮఆظςετରࡦߨ࠲ʯΛؚΉ৽ߨ࠲͕ଓʑొ ֶशը໘ͷσβΠϯΛҰ৽
උߟ: tara ͱ͍͏ͷ͜ͷϦχϡʔΞϧϓϩδΣΫτͷίʔυωʔϜͰɺ࠷ۙΠϯλϏϡʔͰύϒϦοΫʹͳͬͨ https://brand.studysapuri.jp/career/interview/article/Saori_Suzuki/ ݩʑ͋ͬͨ Ϣʔβج൫Λ ؚΉαʔϏε
Outline • ࣗݾհ • ʰελσΟαϓϦ தֶߨ࠲ʱʹ͍ͭͯ • SLI/SLO ͳΜͷͨΊʹ͋Δͷ͔ •
αʔϏεӡ༻ͷݱঢ়ͱ՝ • ՝ʹ࣮͋ͨͬͯࡍʹऔΓΜͩ͜ͱ • ·ͱΊ
Why SLI/SLO? • ػೳ։ൃorඇػೳ։ൃɺͲͪΒʹ࣌ؒΛ͏ͷ͔Λ Fact-BasedͰܾఆ͢ΔͨΊ • Error Budget ͕͋ Δ͏ͪ1ͭ1ͭͷ
Τϥʔʹରॲ͠ͳ͍ • Burn Out Λආ͚ΒΕΔ • Budget ͕͋Δ͏ͪϦε Ϋ͕ͱΕΔ
Outline • ࣗݾհ • ʰελσΟαϓϦ தֶߨ࠲ʱʹ͍ͭͯ • SLI/SLO ͳΜͷͨΊʹ͋Δͷ͔ •
αʔϏεӡ༻ͷݱঢ়ͱ՝ • ՝ʹ࣮͋ͨͬͯࡍʹऔΓΜͩ͜ͱ • ·ͱΊ
උߟ: tara ͱ͍͏ͷ͜ͷϦχϡʔΞϧϓϩδΣΫτͷίʔυωʔϜͰɺ࠷ۙΠϯλϏϡʔͰύϒϦοΫʹͳͬͨ https://brand.studysapuri.jp/career/interview/article/Saori_Suzuki/ ݩʑ͋ͬͨ Ϣʔβج൫Λ ؚΉαʔϏε 3FWFSTF1SPYZ /HJOY •
SLI/SLO શ෦Ͱ8ͭ • (a)Availability ͱ (b)Latency • http ͷ metrics Λ͏ • ҎԼͷ4Օॴʹ(a/b)2छྨͣͭ • ᶃ api-gateway • ᶄ api-gateway -> main • ᶅ api-gawatey -> content • ᶆ main -> content • SLO • Availability: 99.9% • Latency: 95 percentile < 1000msec ᶃ ᶄ ᶅ ᶆ
Why Envoy? • ࣌ϚΠΫϩαʔϏεؒͷ metrics Λऔಘ͢Δํ๏͕ ͳ͔ͬͨ • Control Plane
ΛؚΜͩ Service Mesh Ͱͳ͘ɺSide- car container ͱͯ͠୯ʹૉͷ Envoy ΛࡌͤΔͷΈ
DevSupport: ସΘΓ൪Ͱఆৗӡ༻ۀΛߦ͏ • Slack ͷ௨Λ֬ೝͯ͠ݪҼௐࠪ • Sentry Exception, SLO Alert,
GCP Pub/Sub Dead Letter • खಈରԠ͕ඞཁͳͷ֤νʔϜʹΤεΧϨʔγϣϯ • CS(Customer Support)͍߹ΘͤͷҰ࣍ड͚ • શମ͚ϝϯγϣϯͷ1࣍ड͚
ى͖͍ͯͨ՝: No SLO Alert • ϦϦʔε͔Βࠓ·ͰҰ SLO Alert ͕໐ͬͨ͜ͱͳ͍ •
Sentry ͷ Exception ྔ͕ SLI ʹө͞Ε͍ͯͳ͍ؾ͕͢Δ • Կ͕ى͖͍ͯΔͷͩΖ͏͔ʁ • গͳ͘ͱ Sentry Exception Λ1݅ͣͭݟ͍ͯΔ࣌Ͱ Error Budget ͱ͍͏֓೦ ར༻Ͱ͖ͯͳ͍ • SLO ͕ࣗͨͪͷظΑΓ؇͗͢Δʁ • SLI ͷઃఆ͕ޡ͍ͬͯΔʁ • ௐࠪͨ͠
Outline • ࣗݾհ • ʰελσΟαϓϦ தֶߨ࠲ʱʹ͍ͭͯ • SLI/SLO ͳΜͷͨΊʹ͋Δͷ͔ •
αʔϏεӡ༻ͷݱঢ়ͱ՝ • ՝ʹ࣮͋ͨͬͯࡍʹऔΓΜͩ͜ͱ • ·ͱΊ
Ծઆ: Envoy ͷ metrics (SLIᶄᶅᶆ) ͕͓͔͍͠ͷͰʁ • Yes • Exception
ͷҰ෦ DNS ໊લղܾͰࣦഊ͍ͯͨ͠ • ͭ·Γɺhttp request ʹࢸ͍ͬͯͳ͍ • envoy.cluster.upstream_rq_2xx ʹܭ্͞Εͳ͍ͷͦΕͦ͏ • ᶄͷ௨৴࣌ɺ໊લղܾʹࣦഊύλʔϯ • ᶃ ͷ SLI Ͱܭଌ͞Ε͍ͯΕྑ͍͕…? ௨ৗͷ௨৴ UBSBBQJHBUFXBZDPOUBJOFS͕IUUQ UBSBNBJOΛ໊લղܾ͢Δ͕͜͜ ࣦഊͨ͠ IUUQUBSBNBJOͰ௨৴͢Δ
Ծઆ2: Reverse Proxy ͷ metrics (SLIᶃ) ͕͓͔͍͠ͷͰʁ • Yes •
GraphQL ϦΫΤετ్͕தͰࣦഊͨ͠߹ɺhttp Ͱ 200 Λฦ͍ͯͨ͠😱 • ϦϦʔε࣌ɺ෦ࣦഊ 500 Ͱฦ͢͜ͱΛܾΊ͕ͨɺͦ͏͞Ε͍ͯͳ͔ͬͨ ௨ৗͷ௨৴ $MJFOU͔ΒIUUQTKVOJPSMFBSOTUVEZTBQVSJKQʹΞΫηε͢Δͱ 3FWFSTF1SPYZʹ౸ୡ 3FWFSTF1SPYZ͔ΒțBSBBQJHBUFXBZQSPYZᶄ UBSBBQJHBUFXBZ͔ΒțBSBNBJO௨৴ᶄ͜͜ͰΤϥʔ͕ൃੜ
ରॲ1ɿGraphQL Error ͷ߹ http 500 Λฦ͢ • ݩʑ GraphQL
http ͷ͜ͱΛؾʹ͍ͯ͠ͳ͍ • ڍಈ GraphQL server library ͷڍಈʹґଘ͢Δ • Response status 200 ʹ౷Ұ͢ΔϓϥΫςΟε͋Δ • Client Error Response ͷ errors ΛݟΔͷͰͳ͍ ಉ྅͕γϡοͱͯ͘͠Ε·ͨ͠🙏 4QFDJBM5IBOLT!2VSBNZ
ରॲ2ɿ Envoy ΛΊͯ Datadog APM metrics Λར༻ • ෳࡶੑʹΑΔτϥϒϧγϡʔτͷ͠͞ΛݮΒͨ͢Ί •
Envoy ͷ metrics ʹ͕͋ͬͨΘ͚Ͱͳ͍ • ӡ༻ͷ՝ଟ͘ metrics औಘҎ֎ͷϝϦοτಘΒΕ͍ͯͳ͔ͬͨ • Curcuit Breaker ೖΕ͍ͯͨͷͷൃಈͨ͠έʔε΄ͱΜͲͳ͍ • Envoy ͷ version up ରԠʢग़དྷ͍ͯͳ͍ʣ • Pod side-car container ͷىಈɾऴྃॱ੍ޚʢenvoy Λͨͳ͍ͱΤϥʔʹͳΔʣ • Rollouts Λ͍ͬͯΔ߹ͷ Patch ํ๏ʢResource ٯసͯ͠োʹͳͬͨ͜ͱʣ
খωλ: Datadog APM ݁ߏบ͕͋Δ(1) • http client ͷ APM Plugin
ͷ resource tag default Ͱ http method Ͱ͋Δ • Ѽઌ͝ͱͷ SLI ͱͯ͠࠾༻͢Δʹ hostname ͕ඞཁ • Node, Ruby ͰͦΕͧΕରԠ • ৫Ͱ http-client ͷ resource tag ͷ໋໊نΛ߹ҙ
খωλ: Datadog APM ݁ߏบ͕͋Δ(2) • trace.http.request.errors Ͱ http 5xx ֘͠ͳ͍
• ٯʹ 4xx ֘͢Δ • trace.http.request.hits.by_http_status Λར༻͢Δඞཁ͕͋Δ
උߟ: tara ͱ͍͏ͷ͜ͷϦχϡʔΞϧϓϩδΣΫτͷίʔυωʔϜͰɺ࠷ۙΠϯλϏϡʔͰύϒϦοΫʹͳͬͨ https://brand.studysapuri.jp/career/interview/article/Saori_Suzuki/ ݩʑ͋ͬͨ Ϣʔβج൫Λ ؚΉαʔϏε 3FWFSTF1SPYZ /HJOY •
SLO Λݟͨ͠ • (a)Availability ͱ (b)Latency • http ͷ metrics Λ͏ • ҎԼͷ4Օॴʹ(a/b)2छྨͣͭ • ᶃ api-gateway • ᶄ api-gateway -> main • ᶅ api-gawatey -> content • ᶆ main -> content • 🆕ᶇ api-gateway -> Ϣʔβج൫ͷ request • SLO • Availability: 99.9% • Latency: 95 percentile < 1000msec • -> αʔϏε͝ͱʹݱঢ়ΛՃຯ͠ɺ 100~500msec ᶃ ᶄ ᶅ ᶆ ᶇ ϚΠΫϩαʔϏε͝ͱͷ 4-*4-0Λഇࢭ 4-*Λ͚ΔϝϦοτ͕ෳ 4-*4-0Λཧ͢Δίετʹ ݟ߹͍ͬͯͳ͍ͨΊ Ϣʔβج൫͚4-*4-0Ճ Ϣʔβج൫͚ͷڞ௨4-*͜Ε·ͰFOWPZNFUSJDT Λར༻͍ͯͨ͠ɻFOWPZΛ֎ͨͨ͠Ί%BUBEPH "1.NFUSJDTΛར༻ͨ͠4-*4-0ΛՃ
DevSupport ݟ͠ • Sentry Exception ͰΞϓϦέʔγϣϯίʔυىҼͷͷҎ ֎શͯ Ignore ͢Δ •
SLO Alert ͕དྷͨ࣌ͷجຊతͳରॲํΛυΩϡϝϯτԽ • ରԠͰ͖ͳ͔ͬͨͷΛ2िؒʹ1ճνʔϜͰରԠ
Outline • ࣗݾհ • ʰελσΟαϓϦ தֶߨ࠲ʱʹ͍ͭͯ • SLI/SLO ͳΜͷͨΊʹ͋Δͷ͔ •
αʔϏεӡ༻ͷݱঢ়ͱ՝ • ՝ʹ࣮͋ͨͬͯࡍʹऔΓΜͩ͜ͱ • ·ͱΊ
Կ͕ى͖͍ͯͨͷ͔ • ϦϦʔε࣌ʹҰઃఆ͞Εͨ SLI/SLO 1ݟ͞Εͯͳ ͔ͬͨ • SLO ͕ԿͷՁൃش͍ͯ͠ͳ͔ͬͨ •
SLI/SLO ྆ํΛݟ͠ɺࠓޙܧଓతʹݟ͢͜ͱʹͨ͠
Ͳ͏͖ͩͬͨ͢ͷ͔ • ։ൃऀࢹ • SLO ͕ຊʹՁΛͨΒ͍ͯ͠Δͷ͔Λఆظతʹݕࠪ͢Δ • ൪ྑ͍͕ɺͨ·ʹશһͰݟΔ࣌ؒΛऔΔ͜ͱॏཁ • 1ަͩͱਂ͘ௐΔΠϯηϯςΟϒ͕ಇ͔ͳ͍
• SRE ࢹ • ։ൃνʔϜ͕ SLI/SLO Λఆظతʹݟ͢ΈΛ࡞Δ • ϫʔΫϩʔυ͝ͱʹ SLI/SLO Λࣗಈੜ͢ΔΈΛ࡞Δ
ࠓ͍͑ͨ͜ͱ ҰܾΊͨ SLI/SLO ܧଓతʹݟ͠·͠ΐ͏
Thank you! chaspy chaspy_ Engineering Manager Site Reliability and Web
Application Development at Recruit Co., Ltd. Takeshi Kondo https://chaspy.me