Get Instrumented: How Prometheus Can Unify Your Metrics

Hynek Schlawack

May 31, 2016

Transcript

  1. Metrics

                    12:00  12:01  12:02  12:03  12:04
      avg latency     0.3    0.5    0.8    1.1    2.6
      server load     0.3    1.0    2.3    3.5    5.2
  2. Averages
     ❖ avg(request time) ≠ avg(UX)
     ❖ avg({1, 1, 1, 1, 10}) = 2.8
     ❖ median({1, 1, 1, 1, 10}) = 1
  4. Averages
     ❖ avg(request time) ≠ avg(UX)
     ❖ avg({1, 1, 1, 1, 10}) = 2.8
     ❖ median({1, 1, 1, 1, 10}) = 1
     ❖ median({1, 1, 100_000}) = 1
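
The arithmetic on the Averages slides can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for illustration and do not come from the talk.

```python
from statistics import mean, median

# Request times in seconds: four fast requests and one slow outlier.
request_times = [1, 1, 1, 1, 10]

average = mean(request_times)    # pulled upwards by the single outlier
typical = median(request_times)  # what most requests actually saw

print(average)  # 2.8
print(typical)  # 1

# Even an extreme outlier leaves the median untouched.
outlier_median = median([1, 1, 100_000])
print(outlier_median)  # 1
```

Neither number tells the whole story alone: the average hides that most requests were fast, and the median hides that one of them was very slow.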
  5. Pull: Advantages
     ❖ multiple Prometheis easy
     ❖ outage detection
     ❖ predictable, no self-DoS
     ❖ easy to instrument 3rd parties
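
The "outage detection" point can be illustrated with a rough sketch of a pull loop: when a scrape fails, the failure itself becomes a data point. Prometheus records a synthetic `up` series per scrape for this purpose; everything else below (the `scrape` function, the fake `healthy` and `dead` targets, the simplified line parsing) is a hypothetical stand-in that avoids real HTTP so the idea stays visible.

```python
from typing import Callable, Dict


def scrape(name: str, fetch: Callable[[], str]) -> Dict[str, float]:
    """Pull one target; record up=1 on success and up=0 on failure."""
    up_key = f'up{{job="{name}"}}'
    try:
        body = fetch()
    except OSError:
        # The target did not answer, so only the outage is recorded.
        return {up_key: 0.0}
    samples = {up_key: 1.0}
    for line in body.splitlines():
        if line and not line.startswith("#"):
            key, value = line.rsplit(" ", 1)
            samples[key] = float(value)
    return samples


def healthy() -> str:
    # Hypothetical target that answers with one sample.
    return "req_seconds_count 390.0\n"


def dead() -> str:
    # Hypothetical target that refuses the connection.
    raise ConnectionRefusedError("target is down")


print(scrape("web", healthy))
print(scrape("db", dead))
```

Because the collector initiates every request, it also controls the scrape rate, which is where the "predictable, no self-DoS" point comes from.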
  6. Metrics Format

      # HELP req_seconds Time spent processing a request in seconds.
      # TYPE req_seconds histogram
      req_seconds_count 390.0
      req_seconds_sum 177.0319407
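
To show how little is needed to consume this format, here is a minimal sketch that reads the sample lines from the Metrics Format slide and derives the average request time from `_sum` and `_count`. The `parse_samples` helper is hypothetical and deliberately naive: it assumes one `name value` pair per line and ignores labels containing spaces as well as optional timestamps.

```python
EXPOSITION = """\
# HELP req_seconds Time spent processing a request in seconds.
# TYPE req_seconds histogram
req_seconds_count 390.0
req_seconds_sum 177.0319407
"""


def parse_samples(text):
    """Return {sample_name: value}, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(None, 1)
        samples[name] = float(value)
    return samples


samples = parse_samples(EXPOSITION)

# 390 requests took about 177 seconds in total.
avg_seconds = samples["req_seconds_sum"] / samples["req_seconds_count"]
print(round(avg_seconds, 3))  # 0.454
```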
  11. Apache, nginx, Django, PostgreSQL, MySQL, MongoDB, CouchDB, redis, Varnish, etcd,
      Kubernetes, Consul, collectd, HAProxy, statsd, graphite, InfluxDB, SNMP
  13. from flask import Flask, g, request
      from cat_or_not import is_cat

      app = Flask(__name__)

      @app.route("/analyze", methods=["POST"])
      def analyze():
          g.auth.check(request)
          return ("meow!" if is_cat(request.files["pic"])
                  else "nope!")

      if __name__ == "__main__":
          app.run()
  16. from prometheus_client import start_http_server

      # …

      if __name__ == "__main__":
          start_http_server(8000)
          app.run()
  17. from prometheus_client import Histogram, Gauge

      REQUEST_TIME = Histogram(
          "cat_or_not_request_seconds",
          "Time spent in HTTP requests.")
      ANALYZE_TIME = Histogram(
          "cat_or_not_analyze_seconds",
          "Time spent analyzing pictures.")
  18. from prometheus_client import Histogram, Gauge

      REQUEST_TIME = Histogram(
          "cat_or_not_request_seconds",
          "Time spent in HTTP requests.")
      ANALYZE_TIME = Histogram(
          "cat_or_not_analyze_seconds",
          "Time spent analyzing pictures.")
      IN_PROGRESS = Gauge(
          "cat_or_not_in_progress_requests",
          "Number of requests in progress.")
  19. from prometheus_client import Counter, Histogram

      AUTH_TIME = Histogram("auth_seconds", "Time spent authenticating.")
      AUTH_ERRS = Counter("auth_errors_total", "Errors while authing.")
      AUTH_WRONG_CREDS = Counter("auth_wrong_creds_total", "Wrong credentials.")

      class Auth:
          # ...

          @AUTH_TIME.time()
          def auth(self, request):
              while True:
                  try:
                      return self._auth(request)
                  except WrongCredsError:
                      # Wrong credentials are counted and passed on.
                      AUTH_WRONG_CREDS.inc()
                      raise
                  except Exception:
                      # Any other error is counted, then the loop retries.
                      AUTH_ERRS.inc()
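
The slides declare several `Histogram` metrics without showing what a histogram stores. As a rough sketch: a Prometheus histogram keeps cumulative bucket counters labelled with an upper bound `le`, plus the `_sum` and `_count` samples seen on the Metrics Format slide. The `TinyHistogram` class below is a hypothetical stand-in written for illustration, not the implementation in `prometheus_client`, and its bucket bounds are arbitrary.

```python
import bisect


class TinyHistogram:
    """Hypothetical stand-in, not prometheus_client's Histogram."""

    def __init__(self, name, bounds=(0.1, 0.5, 1.0)):
        self.name = name
        self.bounds = list(bounds)                  # upper bounds, ascending
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is +Inf
        self.total = 0.0

    def observe(self, seconds):
        # bisect_left finds the first bound with seconds <= bound.
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.total += seconds

    def render(self):
        lines, running = [], 0
        for bound, count in zip(self.bounds + ["+Inf"], self.counts):
            running += count  # exposed buckets are cumulative
            lines.append(f'{self.name}_bucket{{le="{bound}"}} {running}')
        lines.append(f"{self.name}_count {running}")
        lines.append(f"{self.name}_sum {self.total}")
        return "\n".join(lines)


hist = TinyHistogram("cat_or_not_request_seconds")
for seconds in (0.05, 0.3, 0.3, 2.0):
    hist.observe(seconds)
print(hist.render())
```

Keeping counts per bucket is what allows quantiles to be estimated later at query time, which a bare average cannot provide.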
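
The behaviour of the instrumented `auth` method can be tried out without Prometheus or a real backend. The sketch below uses only the standard library: a list and a `collections.Counter` stand in for the histogram and the two counters, and `timed`, `make_backend`, and the `WrongCredsError` class are hypothetical helpers (the slides elide the definition of the exception). It assumes, as the decorator placement on the slide suggests, that the timer covers the whole call including retries and also records calls that end in an exception.

```python
import collections
import functools
import time

durations = []                  # stands in for the histogram's observations
events = collections.Counter()  # stands in for the two Prometheus counters


class WrongCredsError(Exception):
    """Hypothetical error type; the slides elide its definition."""


def timed(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on success and on exceptions alike.
            durations.append(time.perf_counter() - start)
    return wrapper


def make_backend(outcomes):
    """Return a fake _auth that plays back the given outcomes in order."""
    playback = iter(outcomes)

    def _auth(request):
        outcome = next(playback)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome
    return _auth


@timed
def auth(_auth, request):
    while True:
        try:
            return _auth(request)
        except WrongCredsError:
            events["wrong_creds"] += 1
            raise
        except Exception:
            events["errors"] += 1  # counted, then retried


# Two transient failures followed by a success.
result = auth(make_backend([TimeoutError(), TimeoutError(), "ok"]), "req")
print(result, dict(events), len(durations))  # ok {'errors': 2} 1
```

Note that the loop retries without a limit or delay, so a backend that keeps failing would keep the call spinning while the error counter climbs.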