Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
RPC Metrics at Google
Search
JBD
August 09, 2018
Programming
2
520
RPC Metrics at Google
JBD
August 09, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.8k
Debugging Code Generation in Go
rakyll
5
1.5k
Are you ready for production?
rakyll
8
2.7k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
240
Critical Path Analysis
rakyll
0
520
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
CSC509 Lecture 09
javiergs
PRO
0
140
PHP でアセンブリ言語のように書く技術
memory1994
PRO
1
170
cmp.Or に感動した
otakakot
3
210
Duckdb-Wasmでローカルダッシュボードを作ってみた
nkforwork
0
130
as(型アサーション)を書く前にできること
marokanatani
10
2.7k
OnlineTestConf: Test Automation Friend or Foe
maaretp
0
120
3rd party scriptでもReactを使いたい! Preact + Reactのハイブリッド開発
righttouch
PRO
1
610
C++でシェーダを書く
fadis
6
4.1k
受け取る人から提供する人になるということ
little_rubyist
0
250
ヤプリ新卒SREの オンボーディング
masaki12
0
130
AI時代におけるSRE、 あるいはエンジニアの生存戦略
pyama86
6
1.2k
Streams APIとTCPフロー制御 / Web Streams API and TCP flow control
tasshi
2
350
Featured
See All Featured
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
169
50k
Site-Speed That Sticks
csswizardry
0
31
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
42
9.2k
Rebuilding a faster, lazier Slack
samanthasiow
79
8.7k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
44
2.2k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
25
1.8k
Code Review Best Practice
trishagee
64
17k
Fireside Chat
paigeccino
34
3k
Intergalactic Javascript Robots from Outer Space
tanoku
269
27k
Making Projects Easy
brettharned
115
5.9k
Docker and Python
trallard
40
3.1k
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that
there was an outage."
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store
@rakyll Questions infra teams want to ask: • Are we
meeting the SLO for the other team? • What’s the impact of a product on infra? • How much do we need to scale up if product grows 10%?
@rakyll High-Cardinality Breaking down the metrics data...
@rakyll Query the collected data in various ways: • Latency
distribution for RPCs originated at Google Analytics. • Requests take took more than 100ms for the customer #123. • Compare the request latency initiated at web vs mobile frontend.
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store originator=analytics; ...
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
@rakyll Exemplars
@rakyll /rpz and /statz
@rakyll http://server:7777/debug/rpcz
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import “cloud.google.com/go/pubsub”
@rakyll +
Thank you! JBD, Google
[email protected]
@rakyll