• Follows TCP streams, decodes HTTP, MySQL, PgSQL, REDIS, Thrift-RPC
• Looks for requests, waits for the matching response
• Records response time, URLs, response codes, etc.
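To make the request/response matching concrete, here is a minimal Go sketch of the idea: a request is held per connection until its response arrives, and the pair becomes one transaction record. The field names and the connection key are illustrative assumptions, not Packetbeat's actual schema or internals.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative shape of a decoded transaction; field names are assumptions,
// not the real output schema.
type Transaction struct {
	Protocol     string        // "http", "mysql", "redis", ...
	Method       string        // e.g. "GET", "SELECT"
	Path         string        // URL or query text, where applicable
	StatusCode   int           // code from the matching response
	ResponseTime time.Duration // request timestamp -> response timestamp
}

// Requests are held per connection until the matching response arrives.
var pending = map[string]Transaction{}
var started = map[string]time.Time{}

func onRequest(conn string, tx Transaction) {
	pending[conn] = tx
	started[conn] = time.Now()
}

func onResponse(conn string, status int) {
	tx, ok := pending[conn]
	if !ok {
		return // response with no recorded request; ignore
	}
	tx.StatusCode = status
	tx.ResponseTime = time.Since(started[conn])
	delete(pending, conn)
	delete(started, conn)
	fmt.Printf("%+v\n", tx) // in practice: ship to storage
}

func main() {
	conn := "10.0.0.1:51234->10.0.0.2:80"
	onRequest(conn, Transaction{Protocol: "http", Method: "GET", Path: "/users/42"})
	time.Sleep(5 * time.Millisecond)
	onResponse(conn, 200)
}
```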
• Pre-define the metrics you need (e.g. requests per second for each server, response time percentiles, etc.)
• Write code to extract these metrics, store them in a DB
• Store the transactions in a DB
• Drilling down is difficult
• Features like “Top 10 methods with errors” are difficult to implement
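A short sketch of why the pre-aggregated route is limiting: once transactions are reduced to counters, questions you did not anticipate up front cannot be answered later. The metric and its fields here are hypothetical.

```go
package main

import "fmt"

// Hypothetical pre-aggregated metric: one error counter per server.
// Once the raw transactions are discarded, "which 10 methods produced
// the most errors?" cannot be answered from these numbers alone.
var errorsPerServer = map[string]int{}

func record(server string, status int) {
	if status >= 500 {
		errorsPerServer[server]++
	}
	// URL, method and response time are dropped here, so any question
	// that was not anticipated when the metric was designed is lost.
}

func main() {
	record("web-1", 502)
	record("web-1", 200)
	record("web-2", 500)
	fmt.Println(errorsPerServer) // map[web-1:1 web-2:1]
}
```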
• Treat the transactions like logs
• Clear and simple flow for the data
• You don’t have to pre-create the metrics
• Ad-hoc troubleshooting and analytics by using Kibana
• Drilling down to the problematic transactions is trivial
• Top N features are trivial (see the sketch below)
• Slicing by different dimensions is easy
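As an example of how simple Top N becomes once the raw transactions are stored, the sketch below asks Elasticsearch for the top 10 methods among error responses with a single terms aggregation. The index pattern and field names (packetbeat-*, http.code, method) are assumptions about the mapping; adjust them to your documents.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// "Top 10 methods with errors" as one terms aggregation over the
	// stored transactions. Field names are assumptions, not a fixed schema.
	query := `{
	  "size": 0,
	  "query": { "range": { "http.code": { "gte": 400 } } },
	  "aggs":  { "top_error_methods": { "terms": { "field": "method", "size": 10 } } }
	}`

	resp, err := http.Post("http://localhost:9200/packetbeat-*/_search",
		"application/json", strings.NewReader(query))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // aggregation buckets: one per method, with doc counts
}
```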
• Detects problems that other approaches miss
• No changes to the code or to the monitored application
• Minimal knowledge about the monitored app is required
• No latency overhead
• When using tap points, zero CPU/memory overhead on the app servers
• Large amounts of data
• Compared to log processing, larger CPU requirements
• Privacy concerns
• Doesn’t work for encrypted protocols
• Doesn’t work for “in-house” protocols
• Trade-off: data completeness vs. hardware requirements
• Sample by:
  • protocol (e.g. store all MySQL requests, sample REDIS 1/10)
  • method (e.g. store all PUT requests, sample GETs 1/10)
  • status code (e.g. store all errors, sample successes)
  • response time (e.g. store all slow transactions)
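A sampling policy like the one above can be expressed as a single predicate evaluated per transaction before storing it. The sketch below combines the four criteria from the list; the thresholds and rates are placeholders, not recommendations.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// shouldStore sketches the sampling rules listed above; the exact
// thresholds and rates are illustrative placeholders.
func shouldStore(protocol, method string, status int, responseTime time.Duration) bool {
	switch {
	case status >= 400:
		return true // store all errors
	case responseTime > 500*time.Millisecond:
		return true // store all slow transactions
	case protocol == "mysql":
		return true // store all MySQL requests
	case protocol == "redis":
		return rand.Intn(10) == 0 // sample REDIS 1/10
	case method == "PUT" || method == "POST":
		return true // store all writes
	case method == "GET":
		return rand.Intn(10) == 0 // sample GETs 1/10
	default:
		return true
	}
}

func main() {
	fmt.Println(shouldStore("http", "GET", 200, 20*time.Millisecond)) // kept roughly 1 time in 10
	fmt.Println(shouldStore("http", "GET", 503, 20*time.Millisecond)) // always true: it is an error
	fmt.Println(shouldStore("redis", "GET", 0, time.Millisecond))     // redis branch: kept 1 time in 10
}
```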