Cloud Run Reliability/Observability at ソウゾウ
Ryuzo Yamamoto
April 19, 2023
Transcript
1 Cloud Run Reliability/Observability at ソウゾウ
Ryuzo Yamamoto
Cloud Run Casual Talk! #2
2 About me
Ryuzo Yamamoto (山本 竜三), @dragon3
Software Engineer, Lead Architect / SRE at Souzoh in Fukuoka
3 Souzoh (ソウゾウ) / Mercari Shops (メルカリShops)
4 Agenda
• Architecture, Tech Stack
• Observability
  ◦ Logs, Metrics, Traces
• Reliability
  ◦ SLOs & Monitors as Code
5 Architecture
[Diagram: Cloud Load Balancing; Next.js, GraphQL, imgproxy, and microservices, each on Cloud Run (70+ services); Cloud Storage, Cloud SQL, Memorystore]
6 Tech Stack
• Monorepo
  ◦ Go, TypeScript, Python, Java
  ◦ 70+ microservices
• Bazel, Turborepo
• GraphQL / gRPC
• Serverless (Cloud Run)
• PostgreSQL, Redis
• Cloud PubSub, Tasks, Workflows, Scheduler, VertexAI
7 Agenda
• Architecture, Tech Stack
• Observability
  ◦ Logs, Metrics, Traces
• Reliability
  ◦ SLOs & Monitors as Code
8 Observability
• Logs
  ◦ JSON structured logging
  ◦ Cloud Logging -> BigQuery
• Metrics
  ◦ Log-based Metrics
  ◦ Cloud Monitoring -> Datadog
• Traces
  ◦ OpenTelemetry -> Datadog
9 Observability - Logs
[Diagram: microservice (Cloud Run) -> container log (STDOUT / STDERR) -> Logging -> Sink -> BigQuery]
Example log entry:
    {
      "message": "failed to say hello",
      "something_id": "xxxxxxxx",
      "serviceContext": {
        "version": "1.0.1",
        "service": "echo"
      },
      "metadata": {
        "user-agent": "graphql/1.0.0 grpc-node-js/1.7.3"
      }
    }
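
The logging path on this slide can be sketched in a few lines. This is a minimal illustration, not the team's actual logger: it writes one JSON object per line to STDOUT or STDERR, which Cloud Run forwards to Cloud Logging. The `severity` field and the helper names `format_log` and `log` are assumptions added for the example; the other field names mirror the slide.

```python
import json
import sys

# Assumed values, mirroring the "echo" example on the slide.
SERVICE_CONTEXT = {"service": "echo", "version": "1.0.1"}


def format_log(message, severity="ERROR", **fields):
    """Build one JSON log line; extra keyword fields become top-level keys."""
    entry = {
        "severity": severity,  # assumption: not shown on the slide
        "message": message,
        "serviceContext": SERVICE_CONTEXT,
    }
    entry.update(fields)
    return json.dumps(entry, ensure_ascii=False)


def log(message, severity="ERROR", **fields):
    """Write errors to STDERR and everything else to STDOUT, one line per entry."""
    stream = sys.stderr if severity in ("ERROR", "CRITICAL") else sys.stdout
    print(format_log(message, severity, **fields), file=stream, flush=True)


if __name__ == "__main__":
    log("failed to say hello", something_id="xxxxxxxx")
```

Keeping each entry on a single line matters here, because the container log stream is split into entries line by line.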
10 Observability - Metrics
[Diagram: microservice (Cloud Run) with gRPC interceptor -> container log (STDOUT / STDERR) -> Logging -> Log-based Metrics -> Monitoring (Log-based Metrics + Other GCP Metrics)]
Example log entry written by the gRPC interceptor:
    {
      "message": "grpc: finished server unary /echo.EchoService/Hello",
      "grpc": {
        "type": "unary",
        "kind": "server",
        "latency": 0.002360152,
        "code": "OK",
        "method": "Hello",
        "service": "echo.EchoService"
      },
      "serviceContext": {
        "version": "1.0.1",
        "service": "echo"
      },
      "metadata": {
        "user-agent": "graphql/1.0.0 grpc-node-js/1.7.3"
      }
    }
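
The interceptor idea on this slide can be illustrated without a gRPC dependency. The sketch below is a stand-in, not the deck's interceptor: `logged_unary` is a hypothetical wrapper around any callable handler, it assumes `latency` is measured in seconds, and it collapses every failure to the code `INTERNAL`, where a real interceptor would report the actual gRPC status.

```python
import json
import time


def logged_unary(service, method, handler, emit=print):
    """Wrap a unary handler so every call emits one JSON log line.

    `handler` is any callable taking a request and returning a response.
    `emit` receives the serialized line (print writes it to STDOUT).
    """
    def wrapper(request):
        start = time.perf_counter()
        code = "OK"
        try:
            return handler(request)
        except Exception:
            code = "INTERNAL"  # simplification: no mapping to real status codes
            raise
        finally:
            emit(json.dumps({
                "message": f"grpc: finished server unary /{service}/{method}",
                "grpc": {
                    "type": "unary",
                    "kind": "server",
                    "latency": time.perf_counter() - start,  # assumed seconds
                    "code": code,
                    "method": method,
                    "service": service,
                },
            }))
    return wrapper
```

Because `grpc.latency` and `grpc.code` are plain fields in the log entry, a log-based metric can extract a latency distribution and a request count labeled by code, service, and method without any metrics client in the service itself.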
11 Observability - Traces
[Diagram: Next.js, GraphQL, and microservices (all on Cloud Run) send traces over OTLP (gRPC) to datadog-agent (Cloud Run)]
12 Agenda
• Architecture, Tech Stack
• Observability
  ◦ Logs, Metrics, Traces
• Reliability
  ◦ SLOs & Monitors as Code
13 Reliability - SLOs & Monitors as Code
• SLOs are set for every gRPC method
  ◦ Availability (e.g. 99.9% / 30 days)
  ◦ Latency (e.g. p95 100ms)
• Automated configuration
  ◦ protobuf plugin + Terraform module
• Multiwindow, Multi-Burn-Rate Alerts
  ◦ https://docs.datadoghq.com/monitors/service_level_objectives/burn_rate/
  ◦ https://sre.google/workbook/alerting-on-slos/#6-multiwindow-multi-burn-rate-alerts
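
To make the multiwindow, multi-burn-rate idea concrete, here is a small sketch of the arithmetic. It is not the deck's Datadog configuration: the window pairs and thresholds are a subset of the starting values suggested in the Google SRE Workbook for a 30-day SLO, and the names `BurnRateRule`, `burn_rate`, `firing_rules`, and `error_budget_minutes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    long_window: str
    short_window: str
    threshold: float  # burn rate that must be exceeded in both windows


# Subset of the SRE Workbook's suggested values for a 30-day SLO.
RULES = [
    BurnRateRule("1h", "5m", 14.4),
    BurnRateRule("6h", "30m", 6.0),
    BurnRateRule("3d", "6h", 1.0),
]


def burn_rate(error_ratio, slo_target):
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_ratio / (1.0 - slo_target)


def firing_rules(error_ratios, slo_target):
    """Return rules whose long AND short windows both exceed the threshold.

    error_ratios maps a window name (e.g. "1h") to the observed error ratio.
    """
    return [
        r for r in RULES
        if burn_rate(error_ratios[r.long_window], slo_target) > r.threshold
        and burn_rate(error_ratios[r.short_window], slo_target) > r.threshold
    ]


def error_budget_minutes(slo_target, days=30):
    """Total allowed 'bad' minutes in the SLO period."""
    return days * 24 * 60 * (1.0 - slo_target)
```

For the 99.9% / 30 days example, the budget is about 43.2 minutes. Requiring the short window to exceed the threshold as well means an alert stops firing soon after the error rate recovers, rather than staying active until the long window drains.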
14 Reliability - SLOs & Monitors as Code
    ...
    rpc Hello(HelloRequest) returns (HelloResponse) {
      option (extension.v2.method_monitoring) = {
        availability: { goal: 99.5 }
        latency: { threshold_ms: 100 percentile: 95 }
      };
    }
    ...
[Diagram: protoc -> Terraform configuration -> apply by CI (GitHub Actions) -> SLO monitors]
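
The generation step can be pictured with a short stand-in. The deck uses a protobuf plugin and a Terraform module whose details are not shown, so everything below is an assumption for illustration: the function `slo_resource`, the metric name `grpc.server.requests`, its tags, and the choice to emit Terraform JSON syntax for the Datadog provider's `datadog_service_level_objective` resource.

```python
import json


def slo_resource(service, method, availability_goal):
    """Render one metric-based availability SLO as Terraform JSON syntax.

    The metric name and tags are hypothetical placeholders, not the deck's.
    """
    scope = f"service:{service},method:{method}"
    name = f"{service}_{method}_availability".replace(".", "_").lower()
    return {
        "resource": {
            "datadog_service_level_objective": {
                name: {
                    "name": f"{service}/{method} availability",
                    "type": "metric",
                    "query": {
                        "numerator": f"sum:grpc.server.requests{{{scope},code:ok}}.as_count()",
                        "denominator": f"sum:grpc.server.requests{{{scope}}}.as_count()",
                    },
                    "thresholds": [
                        {"timeframe": "30d", "target": availability_goal}
                    ],
                }
            }
        }
    }


if __name__ == "__main__":
    # Values taken from the Hello example on the slide.
    print(json.dumps(slo_resource("echo.EchoService", "Hello", 99.5), indent=2))
```

Deriving the configuration from the `.proto` file keeps the SLO definition next to the method it describes, so adding a method and adding its monitors happen in the same change.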
15 Wrap Up
• Architecture, Tech Stack
• Observability
  ◦ Logs, Metrics, Traces
• Reliability
  ◦ SLOs & Monitors as Code