Berlin 2013 - Session - Jeff Weinstein

How monitoring can improve the rest of the company
Monitorama EU 2013 @jeff_weinstein

I real-time and batch data analytics

Monitoring can wildly improve the whole company by
sharing data and sharing techniques.

Monitoring Folks Developers Business Analysts
Executives & Product Data Scientists Data

Apps & Services & Systems Users
Data Code & Config Monitoring

Some problems…

Data Processing Apps Systems Logs /
Events Metrics Graphs & Alerts Apps 3rd Party Reports & Queries ETL Analytic Systems
Monitoring: Streaming
BI: Batch

Data Needs Logs Metrics Logs Metrics
Streaming Batch Data Monitoring BI

Data Tools Stack

Monitoring
•  Ad hoc
–  sed, grep, awk
–  ES, LogStash, Splunk, …
•  Storage
–  Hosts, Ganglia, OTSDB
–  Central syslog server
•  Visualization/Reporting
–  Graphite, RRDTool, 3rd party
–  Homegrown
•  Alerting/Escalation
–  Nagios, Sensu, PagerDuty, …

Rest of company
•  Ad hoc
–  Excel, SQL, Hive
–  MapReduce, …
•  Storage
–  Lots o’ databases, Excel
–  Hadoop, RDBMS…
•  Visualization/Reporting
–  Excel, R, Tableau ...
–  Dinosaur apps, …
•  Alerting/Escalation
–  nada

Metrics

Views
•  Unintelligible generated views
•  Too granular for long term trends
•  Lack of historical
•  Intolerant to anomalies

Team and incentives
•  What team?
•  Change vs. reliability
•  Planning
•  Budget
•  Churn

Good or bad?
•  Specific Tools
•  Decentralized
•  Focus
•  Ownership
•  Lost context
•  Siloed work
•  Data dark
•  Misunderstanding

Some fixes

End to End Data Pipeline
✓ Structured logs
✓ (Config)
✓ Measure once
✓ Automatic metrics
✓ API
✓ Graph tools
✓ Glossary
✓ Annotations and tags
✓ Pipeline

Structured events
•  JSON (or whatever)
•  (optional) config
•  Tags per key
–  Type
–  Tag: latency, funnel, …
–  Description
–  Storage
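A per-key config of this shape might look like the following sketch. The four attributes (type, tag, description, storage) are the ones listed on the slide; the concrete keys, the values such as `long_term`, and the `annotate` helper are assumptions made for illustration only.

```python
import json

# Hypothetical per-key config: type, tag, description, and storage for each key.
KEY_CONFIG = {
    "latency_ms": {
        "type": float,
        "tag": "latency",
        "description": "Request latency in milliseconds",
        "storage": "long_term",
    },
    "step": {
        "type": str,
        "tag": "funnel",
        "description": "Funnel step the user reached",
        "storage": "short_term",
    },
}


def annotate(event_json, config=KEY_CONFIG):
    """Attach tag and storage metadata to each key of a JSON event."""
    event = json.loads(event_json)
    annotated = {}
    for key, value in event.items():
        meta = config.get(key)
        if meta is None:
            # The config is optional: unconfigured keys pass through untagged.
            annotated[key] = {"value": value, "tag": None, "storage": None}
        else:
            annotated[key] = {
                "value": meta["type"](value),  # coerce to the declared type
                "tag": meta["tag"],
                "storage": meta["storage"],
            }
    return annotated


result = annotate('{"latency_ms": 120, "step": "cart", "host": "web1"}')
```

Keeping the config optional means an event without any declared keys is still accepted; it simply carries no tags for downstream tools to act on.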

Auto: Graphs, Glossary, & Storage
•  Graphs and dashboards
•  * templates
•  Views and stats
•  Glossary
•  Batch analytics
•  Long term storage
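As a rough illustration of the "auto" idea, a glossary could be generated directly from per-key descriptions instead of being written by hand. This is only a sketch: the `build_glossary` function and the sample config entries are hypothetical, and the talk does not specify an output format.

```python
def build_glossary(key_config):
    """Render one glossary line per configured key, sorted alphabetically."""
    lines = []
    for key in sorted(key_config):
        meta = key_config[key]
        lines.append(f"{key} [{meta['tag']}]: {meta['description']}")
    return "\n".join(lines)


# Hypothetical config entries; only the tag and description are used here.
sample_config = {
    "step": {"tag": "funnel", "description": "Funnel step the user reached"},
    "latency_ms": {"tag": "latency", "description": "Request latency in milliseconds"},
}
glossary = build_glossary(sample_config)
print(glossary)
```

The same config could in principle drive graph templates and storage decisions, so a definition is written once and reused by every team that reads the data.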

build learn communicate inspire

Developers
•  Logging toolkit
•  Data pipeline
•  Pain points
•  Outage causes
•  Deployment practices
•  Escalation playbook
•  Measurement as TDD
•  Monitor staging env
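"Measurement as TDD" can be interpreted as writing a test that fails until the code emits the expected metric. The sketch below shows that interpretation, which is an assumption rather than something the slides spell out; `RecordingMetrics`, `handle_checkout`, and the metric name are hypothetical.

```python
class RecordingMetrics:
    """Test double that records timings instead of sending them anywhere."""

    def __init__(self):
        self.timings = []

    def timing(self, name, milliseconds):
        self.timings.append((name, milliseconds))


def handle_checkout(metrics, clock):
    """Hypothetical handler that must report its own latency."""
    start = clock()
    # ... business logic would run here ...
    metrics.timing("checkout.latency_ms", (clock() - start) * 1000)
    return "ok"


def test_checkout_emits_latency():
    # A fake clock makes the measured duration deterministic: 0.25 s.
    ticks = iter([10.0, 10.25])
    recorder = RecordingMetrics()
    assert handle_checkout(recorder, lambda: next(ticks)) == "ok"
    assert recorder.timings == [("checkout.latency_ms", 250.0)]


test_checkout_emits_latency()
```

Injecting the clock and the metrics client keeps the test free of real timers and network calls, so it can run alongside ordinary unit tests.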

Business Analysts
•  Structured logs
•  Config for ETL
•  Metrics definitions
•  Slices and visualizations
•  Data size and cardinality
•  Outages and delays
•  Flexibility
•  Visualization and tools

Data Scientists
•  Access to (meta)data
•  Query monitoring
•  Statistics and models
•  New data streams
•  Context of data issues
•  What’s in the logs
•  Validate algorithms
•  Teach stats and models!

Product & Executives
•  Curated dashboards
•  Graph/alert tools
•  Learn the business
•  Prioritize alerts by $
•  Incident post mortems
•  Metrics granularity
•  Data driven decisions
•  Recognize and celebrate
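"Prioritize alerts by $" could be sketched as ranking open alerts by an estimated revenue impact. Everything in this example is assumed for illustration: the `Alert` fields, the per-minute cost estimates, and the alert names are hypothetical, and how such estimates would be obtained is outside what the slides cover.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    minutes_open: int
    dollars_per_minute: float  # hypothetical estimate supplied by the business

    @property
    def estimated_cost(self):
        return self.minutes_open * self.dollars_per_minute


def prioritize(alerts):
    """Order alerts so the most expensive one is handled first."""
    return sorted(alerts, key=lambda alert: alert.estimated_cost, reverse=True)


alerts = [
    Alert("disk_usage_high", minutes_open=30, dollars_per_minute=0.0),
    Alert("checkout_errors", minutes_open=5, dollars_per_minute=200.0),
    Alert("search_slow", minutes_open=20, dollars_per_minute=10.0),
]
ordered = [alert.name for alert in prioritize(alerts)]
```

A ranking like this gives product and executive teams a shared vocabulary with the on-call engineers, since both sides are looking at the same number.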

Monitoring can become the data platform and improve all
teams with its techniques.

Icons from The Noun Project: Dmitry Baranovskiy, Benjamin Orlovski, Luis
Prado, MikaDo Nguyen, Yarden Gilboa, Javier Cabezas, Icons Pusher, Jeremy Bristol, Blake Thomas, Ritika Khasgiwale, Mayene de Leon, Yorlmar Campos, Sergey Shmid

@jeff_weinstein
Thanks! hiring ;)
