usage
  ◦ Disk I/O
• Service metrics
  ◦ Number of SQL queries
  ◦ Cache hits and misses
• Application metrics
  ◦ Number of registrations
  ◦ Page generation duration
Sum
  ◦ Average
  ◦ Max
  ◦ Min
  ◦ 90th percentile
• Can be used to reduce storage size
  ◦ Every minute from now to 1 week ago
  ◦ Every 15 minutes from 1 week ago to 1 month ago
  ◦ Every hour from 1 month ago to 6 months ago
  ◦ Every day from 6 months ago to …
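The consolidation idea above can be sketched in a few lines of Python. This is only an illustration, not the behaviour of any particular storage backend: the names `consolidate`, `percentile` and `AGGREGATES` are made up for the example, and the 90th percentile uses the simple nearest-rank definition.

```python
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer, 1..100)."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, so no floating-point rounding is involved.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical registry of consolidation functions, keyed by name.
AGGREGATES = {
    "sum": sum,
    "average": mean,
    "max": max,
    "min": min,
    "p90": lambda values: percentile(values, 90),
}


def consolidate(points, bucket_seconds, how="average"):
    """Group (timestamp, value) points into fixed-size buckets and aggregate each one."""
    buckets = {}
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets.setdefault(bucket_start, []).append(value)
    aggregate = AGGREGATES[how]
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]
```

Calling `consolidate(points, 900, "max")` on per-minute points keeps one value per 15 minutes, which is how older data can be stored at a coarser resolution.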
12:34:32 boston-01: User “alice” connected to the application from IP 12.34.56.78

Get a data table:

Date        Time      Server     Event type  Username  IP
2016-01-14  12:34:32  boston-01  login       alice     12.34.56.78
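Field extraction of this kind is often done with a regular expression. The sketch below is one possible approach in Python, not the method of any specific log tool. It assumes the full log line starts with the date shown in the table (`2016-01-14`), and the names `LOGIN_PATTERN` and `parse_login` are hypothetical.

```python
import re

# Assumed line layout: "<date> <time> <server>: User “<name>” connected ... from IP <ip>".
# Both curly and straight double quotes are accepted around the username.
LOGIN_PATTERN = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<server>[\w.-]+): '
    r'User [“"](?P<username>[^”"]+)[”"] connected to the application '
    r'from IP (?P<ip>\d{1,3}(?:\.\d{1,3}){3})'
)


def parse_login(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = LOGIN_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["event_type"] = "login"  # implied by which pattern matched
    return fields
```

Each key of the returned dict corresponds to one column of the table, so the events can then be indexed and queried by server, username or IP.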
Time series data
• Consolidation over time
• Mathematics
• Storage size
Numbers

Events are good at:
• Storing any message in any format
• Extracting fields from messages for indexing and queries
Text
• Plugins for system and service metrics: CPU usage, RAM, load average, network, MySQL, Apache, AMQP, Carbon, CPU Temperature, Filesystem, Disk, IRQ, NFS, PostgreSQL, Syslog, MongoDB, Redis, File count, …
• You can add custom metrics by using the Exec plugin
Sends all metrics to your storage
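As an illustration of a custom metric, here is a minimal Python sketch of a script that an Exec plugin could run. The slide does not name the collector, so this assumes collectd, whose Exec plugin reads `PUTVAL` lines from the script's standard output. The plugin instance `exec-spool`, the metric name `files` and the helper names are invented for the example.

```python
import os


def putval_line(host, plugin, type_instance, value, interval):
    """Format one gauge value in collectd's plain-text PUTVAL syntax (assumed collector)."""
    identifier = f"{host}/{plugin}/gauge-{type_instance}"
    # "N" asks the collector to timestamp the value with the current time.
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'


def count_files(directory):
    """Custom metric for the example: number of regular files directly in a directory."""
    return sum(1 for entry in os.scandir(directory) if entry.is_file())


def emit_once(directory):
    """Print a single sample; a real script would repeat this every `interval` seconds."""
    # collectd exports these two variables to scripts started by the Exec plugin.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = os.environ.get("COLLECTD_INTERVAL", "10")
    line = putval_line(host, "exec-spool", "files", count_files(directory), interval)
    print(line, flush=True)
```

A deployed script would typically call `emit_once` in a loop with a sleep of one interval between samples.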
• Agents to collect metrics
• Web UI to get realtime alerting
• Alert by Mail/SMS/Anything
• Complete metrics extraction
  ◦ System metrics
  ◦ Service metrics
  ◦ Remote calls
your metrics flow
• Lots of available backends
• Manage different metric types
  ◦ Counters (+1, +3, +2)
  ◦ Sampling (
  ◦ Gauges (200, +3, -2)
• A very simple UDP protocol
• Flush metrics every X seconds
• Optimize performance
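To show how simple such a UDP protocol can be, here is a short Python sketch. It assumes the slide describes StatsD and uses the StatsD plain-text line format (`name:value|type`, with an optional `|@rate` for sampling). The function names are made up, and the default host and port (`127.0.0.1:8125`) are assumptions about a local daemon.

```python
import socket


def counter(name, delta=1, sample_rate=None):
    """Build a counter datagram, e.g. 'mysite.forum.read:1|c'."""
    datagram = f"{name}:{delta}|c"
    if sample_rate is not None:
        # Tells the server that only a fraction of the events is being sent.
        datagram += f"|@{sample_rate}"
    return datagram


def gauge(name, value, relative=False):
    """Build a gauge datagram; relative updates carry an explicit sign (+3, -2)."""
    text = f"{value:+d}" if relative else str(value)
    return f"{name}:{text}|g"


def send(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget: one UDP packet, with no reply expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("ascii"), (host, port))
```

For example, `send(gauge("mysite.users.online", 200))` sets a gauge and `send(gauge("mysite.users.online", 3, relative=True))` raises it by 3. Because UDP does not wait for an answer, the application is not slowed down when the metrics server is unreachable.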
everything
• 3 levels:
  ◦ System: the Debian/Arch Linux/whatever system you are using
  ◦ Services: Apache, MySQL, Docker, Nginx, Redis, …
  ◦ Applications: your Symfony application
How to Measure Anything: Finding the Value of Intangibles in Business, by Douglas W. Hubbard
application code to your monitoring
• M6Web/StatsdBundle provides a smart way to achieve this:

m6_statsd:
    clients:
        default:
            events:
                forum.read:
                    increment: mysite.forum.read
it’s better to be alerted
• Define rules and get notified when a rule is violated
• Don’t put thresholds at 95%: if your filesystem is 95% full, your system is probably already suffering
  ◦ Prefer 60%
• Handling the problem before it happens avoids having to recover from a crash
• The alerting rules can be complex
  ◦ During work hours, send a mail to the team
  ◦ Otherwise, send an SMS to the IT manager’s phone
  ◦ If the IT manager is on holiday, send it to their backup
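The rules above can be sketched as plain Python. This is a toy illustration, not the rule syntax of any alerting tool: the function names, the work-hours definition (Monday to Friday, 9:00 to 18:00) and the recipient labels are all assumptions.

```python
from datetime import datetime

DISK_THRESHOLD = 0.60  # alert early, at 60% rather than 95%


def disk_alert(used_fraction, threshold=DISK_THRESHOLD):
    """True when the rule is violated, i.e. the filesystem is at or above the threshold."""
    return used_fraction >= threshold


def is_work_hours(now):
    """Assumed definition: Monday to Friday, from 9:00 up to (not including) 18:00."""
    return now.weekday() < 5 and 9 <= now.hour < 18


def route_alert(now, manager_on_holiday):
    """Return (channel, recipient) following the escalation rules from the slide."""
    if is_work_hours(now):
        return ("mail", "team")
    if manager_on_holiday:
        return ("sms", "backup")
    return ("sms", "it-manager")
```

A caller would check `disk_alert(usage)` first and, only if it returns true, use `route_alert(datetime.now(), manager_on_holiday)` to decide who gets notified and how.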
whole company can use it
• On-the-fly field extraction
  ◦ Beautiful interface to configure them
• Powerful expression language:
  ◦ index=apache sourcetype=frontend | timechart count BY host
  ◦ index=apache sourcetype=frontend host=auchan.fr | stats avg(response_time) BY path
• Powerful graph constructor
• Data models → Pivot tables for business
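To make the second query concrete, here is a rough Python equivalent of what `host=auchan.fr | stats avg(response_time) BY path` computes. It is only a sketch of the semantics, not how the search engine works internally. It assumes the events have already been narrowed to the right index and sourcetype and are available as dicts with `host`, `path` and `response_time` keys; the function name is hypothetical.

```python
from collections import defaultdict


def avg_response_time_by_path(events, host):
    """Filter events on one host, then average response_time per path."""
    totals = defaultdict(lambda: [0.0, 0])  # path -> [sum of response times, event count]
    for event in events:
        if event["host"] != host:
            continue
        accumulator = totals[event["path"]]
        accumulator[0] += event["response_time"]
        accumulator[1] += 1
    return {path: total / count for path, (total, count) in totals.items()}
```

The expression language lets a non-developer write the same aggregation in one line, which is part of why the whole company can use it.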