Improving Observability with Prometheus (Darkmira Tour PHP 2020)

Talk presented online on December 13th at Darkmira Tour PHP 2020 https://php.darkmiratour.rocks/2020/schedule.html Demo available at https://github.com/wsilva/darkmira-prometheus-php-demo .

Wellington F. Silva

December 13, 2020

Transcript

  1. Wellington F. Silva · contact: @_wsilva · nicknames: wsilva, boina, tom, fisi*

    Roles: father, husband, telecom technician, programmer, sysadmin, Docker community leader, instructor, writer, Zend Certified Engineer, Docker Certified Associate, Certified Kubernetes Administrator (* being phased out)
  2. Observability: 3 pillars

    • Metrics • Logging • Tracing • plus Events (relatively new), which together make up MELT; see also the 4 golden signals
  5. Observability

    • Better deployments • Improve time to market • Less toil • Avoid premature optimisation • Improve resource utilisation • Lower costs
  6. Observability

    • Demands effort in coding and configuration • Could extend time to delivery • Constantly neglected
  7. Monitoring

    • Subset of observability • Shows the points where we start to dig • Makes it easier and faster to find bottlenecks
  10. Monitoring: SLI (Service Level Indicator)

    Depends on the team: • ops: CPU, memory, disk I/O, networking, nodes available, pods running, messages on queue • devs: response time, requests per second • data engineers: time to run an ETL job, how much data is being processed, data freshness
  13. Monitoring: SLO (Service Level Objectives)

    • We should involve the customer in defining them • Breaches must alert the team • Use realistic objectives • Reevaluate the values periodically
  15. Monitoring: SLA (Service Level Agreement)

    • Should be looser than the SLO, so that an SLO breach alerts the team before the SLA is breached • Pay attention to the agreement and honor it
  16. Prometheus

    From the Greek Promēthéus, "forethought". He is a second-generation Titan, son of Iapetus (a son of Uranus and Gaia, born of an incestuous union) and brother of Atlas, Epimetheus and Menoetius. A defender of humanity, he stole Hestia's fire and gave it to mortals.
  19. Prometheus

    • Metrics platform • Started in 2012 at SoundCloud • Open-sourced and published in 2015 • Second project hosted by the CNCF (Cloud Native Computing Foundation) • Can also fire and manage alerts • Stores metrics in a time series database (TSDB)
  21. Prometheus

    • Pull-based model (scale the exporters) • Good for telemetry and statistical metrics • Known alternatives: Graphite / collectd / Carbon, Zabbix (mostly push-based)
  22. Prometheus

    Disadvantages: • Not easy to scale horizontally • No query cache • PromQL instead of regular SQL
  25. Prometheus

    Advantages: • Written in Go • HTTP-based communication • Service discovery integration (Kubernetes, Swarm, Consul, AWS, GCP, etc.) • Dashboard for alert management • Dashboard for query debugging
  26. Prometheus

    Advantages: • Multidimensional data model • Easy to set up with Grafana • PromQL (functional in style, powerful for calculations)
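  27. Tips: to set up a counter

    $registry = \Prometheus\CollectorRegistry::getDefault();
    // Registers (or reuses) the counter demo_visitor_counter with one label, "type"
    $counter = $registry->getOrRegisterCounter('demo', 'visitor_counter', 'it increases', ['type']);
    // Adds 3 to the series labelled type="blue"
    $counter->incBy(3, ['blue']);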
  28. Tips: to set up a gauge

    $registry = \Prometheus\CollectorRegistry::getDefault();
    // A gauge can go up and down; set() overwrites the current value
    $gauge = $registry->getOrRegisterGauge('demo', 'score', 'it sets', ['type']);
    $gauge->set(2.5, ['blue']);
  29. Tips: to set up a histogram

    $registry = \Prometheus\CollectorRegistry::getDefault();
    // The last argument lists the bucket upper bounds, in ascending order
    $histogram = $registry->getOrRegisterHistogram('demo', 'secs_bucket', 'it observes', ['type'], [0.1, 1, 2, 3.5, 4, 5, 6, 7, 8, 9]);
    $histogram->observe(3.5, ['blue']);
  30. Tips: to expose the metrics to be scraped

    use Prometheus\CollectorRegistry;
    use Prometheus\RenderTextFormat;
    $registry = CollectorRegistry::getDefault();
    $renderer = new RenderTextFormat();
    // Serializes every registered metric family in the Prometheus text format
    $result = $renderer->render($registry->getMetricFamilySamples());
    header('Content-type: ' . RenderTextFormat::MIME_TYPE);
    echo $result;
  31. Tips: start with the RED method (Rate, Errors, Duration)

    Set up the following query, where each "bool" comparison returns 1 or 0: (ud:itentity:rate_10m < bool 1000) * 100 + (ud:error:percent_10m > bool 1.5) * 10 + (ud:read:duration_p99_10m < bool 25) * 1
  32. Tips: define a dashboard in Grafana that maps the results as follows (x marks a flagged signal):

    111 = x Rate, x Errors, x Duration
    110 = x Rate, x Errors
    101 = x Rate, x Duration
    100 = x Rate
    011 = x Errors, x Duration
    010 = x Errors
    001 = x Duration
    000 = Ok
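Slide 10 lists SLIs per team. Many SLIs are written as a ratio of good events to total events; the minimal PHP sketch below assumes a request-based availability SLI. The function name `availabilitySli` and the sample numbers are hypothetical illustrations, not part of the talk or its demo.

```php
<?php
// Hypothetical helper: request-based availability SLI = good requests / total requests.
// Returns 1.0 when there was no traffic, so an idle window does not count as a failure.
function availabilitySli(int $goodRequests, int $totalRequests): float
{
    if ($totalRequests === 0) {
        return 1.0;
    }
    return $goodRequests / $totalRequests;
}

// Example: 9,985 successful responses out of 10,000 requests.
printf("%.4f\n", availabilitySli(9985, 10000)); // prints 0.9985
```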
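Slide 13 asks for realistic objectives. One way to judge whether an SLO is realistic is to translate it into an error budget, the amount of failure it still allows. This is a sketch assuming a time-based SLO over a 30-day window; the 99.9% target and the helper name `errorBudgetMinutes` are illustrative assumptions.

```php
<?php
// Hypothetical helper: minutes of allowed unavailability ("error budget")
// for a time-based SLO measured over a window of $days days.
function errorBudgetMinutes(float $sloTarget, int $days): float
{
    $windowMinutes = $days * 24 * 60;
    return (1.0 - $sloTarget) * $windowMinutes;
}

// A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
printf("%.1f minutes\n", errorBudgetMinutes(0.999, 30)); // prints 43.2 minutes
```

Tightening the target to 99.99% shrinks the same 30-day budget to about 4.3 minutes, which is one reason to reevaluate the values periodically rather than aim for perfection.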
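Slide 15 says an SLO breach must alert the team before the SLA is breached. That ordering only works if the SLA target is looser than the SLO. The hypothetical classifier below sketches it for availability targets; the 99.9% SLO and 99.5% SLA values are assumptions chosen for the example.

```php
<?php
// Hypothetical classifier: compares a measured availability SLI against an
// internal SLO and a looser, contractual SLA target (e.g. SLO 99.9%, SLA 99.5%).
function serviceLevelStatus(float $sli, float $slo, float $sla): string
{
    if ($sla > $slo) {
        throw new InvalidArgumentException('The SLA target should be looser than the SLO.');
    }
    if ($sli < $sla) {
        return 'sla_breached'; // the contractual agreement is violated
    }
    if ($sli < $slo) {
        return 'slo_breached'; // alert the team while the SLA still holds
    }
    return 'ok';
}

echo serviceLevelStatus(0.997, 0.999, 0.995), "\n"; // prints slo_breached
```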
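Slide 26 credits PromQL with powerful calculations. A typical example is `rate()`, which turns a monotonically increasing counter into a per-second rate and copes with counter resets. The sketch below is a simplified approximation in plain PHP with a hypothetical `simpleRate` helper: it treats any decrease as a reset but, unlike PromQL, does not extrapolate to the edges of the range, so its results will not exactly match `rate()`.

```php
<?php
// Simplified per-second rate over [timestampSeconds => counterValue] samples.
// Timestamps must be integer seconds (PHP array keys). Any decrease is treated
// as a counter reset, e.g. after a process restart. No edge extrapolation.
function simpleRate(array $samples): float
{
    ksort($samples);
    $timestamps = array_keys($samples);
    if (count($timestamps) < 2) {
        return 0.0;
    }
    $increase = 0.0;
    $previous = null;
    foreach ($samples as $value) {
        if ($previous !== null) {
            // After a reset the counter restarts near zero, so the new value is the increase.
            $increase += $value >= $previous ? $value - $previous : $value;
        }
        $previous = $value;
    }
    $elapsed = end($timestamps) - $timestamps[0];
    return $increase / $elapsed;
}

// Counter scraped every 15 s; it resets between t=30 and t=45.
printf("%.2f per second\n", simpleRate([0 => 100, 15 => 130, 30 => 160, 45 => 30, 60 => 60])); // prints 2.00 per second
```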
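Slides 27 and 28 register a counter and a gauge. The semantic difference between the two types can be sketched with plain in-memory PHP classes. `InMemoryCounter` and `InMemoryGauge` are hypothetical: they borrow the `incBy()` and `set()` method names from the slides but are not the client library's classes, and they ignore storage backends, label names and concurrency.

```php
<?php
// Hypothetical counter: values are keyed by label values and can only increase.
final class InMemoryCounter
{
    private array $values = [];

    public function incBy(float $amount, array $labels): void
    {
        if ($amount < 0) {
            throw new InvalidArgumentException('Counters can only increase.');
        }
        $key = implode(',', $labels);
        $this->values[$key] = ($this->values[$key] ?? 0) + $amount;
    }

    public function get(array $labels): float
    {
        return $this->values[implode(',', $labels)] ?? 0.0;
    }
}

// Hypothetical gauge: set() simply overwrites the current value.
final class InMemoryGauge
{
    private array $values = [];

    public function set(float $value, array $labels): void
    {
        $this->values[implode(',', $labels)] = $value;
    }

    public function get(array $labels): float
    {
        return $this->values[implode(',', $labels)] ?? 0.0;
    }
}

$visitors = new InMemoryCounter();
$visitors->incBy(3, ['blue']);
$visitors->incBy(2, ['blue']);
echo $visitors->get(['blue']), "\n"; // prints 5

$score = new InMemoryGauge();
$score->set(2.5, ['blue']);
$score->set(1.5, ['blue']);
echo $score->get(['blue']), "\n"; // prints 1.5
```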
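Slide 29's `observe(3.5, ['blue'])` does not store 3.5 itself: Prometheus histograms count observations into cumulative buckets, where each bucket counts values less than or equal to its upper bound (`le`), and `+Inf` counts every observation. The plain-PHP function `cumulativeBuckets` below is a hypothetical illustration of that rule, not the library's implementation.

```php
<?php
// Hypothetical illustration of histogram bucket semantics: each bucket counts
// observations <= its upper bound, so buckets are cumulative, and "+Inf"
// always equals the total number of observations.
function cumulativeBuckets(array $upperBounds, array $observations): array
{
    sort($upperBounds);
    $counts = [];
    foreach ($upperBounds as $le) {
        $counts[(string) $le] = count(array_filter($observations, fn ($v) => $v <= $le));
    }
    $counts['+Inf'] = count($observations);
    return $counts;
}

$buckets = cumulativeBuckets([0.1, 1, 2, 3.5, 4, 5, 6, 7, 8, 9], [3.5]);
echo json_encode($buckets), "\n";
```

With the bucket bounds from slide 29, a single observation of 3.5 leaves the buckets 0.1, 1 and 2 at 0 and sets 3.5 through 9, plus `+Inf`, to 1.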
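Slide 30 serves whatever `RenderTextFormat` produces for Prometheus to pull. The shape of that text format can be sketched for a single counter family in plain PHP; `renderCounter` is a hypothetical helper that only handles counters and escapes label values (HELP-text escaping is omitted), whereas the library's renderer covers all metric types.

```php
<?php
// Hypothetical renderer for one counter family in the Prometheus text format.
// $samples is a list of [labelsArray, value] pairs.
function renderCounter(string $name, string $help, array $samples): string
{
    $lines = ["# HELP $name $help", "# TYPE $name counter"];
    foreach ($samples as [$labels, $value]) {
        $pairs = [];
        foreach ($labels as $key => $labelValue) {
            // Escape backslash, double quote and newline in label values.
            $escaped = str_replace(['\\', '"', "\n"], ['\\\\', '\\"', '\\n'], $labelValue);
            $pairs[] = sprintf('%s="%s"', $key, $escaped);
        }
        $labelPart = $pairs ? '{' . implode(',', $pairs) . '}' : '';
        $lines[] = $name . $labelPart . ' ' . $value;
    }
    return implode("\n", $lines) . "\n";
}

echo renderCounter('demo_visitor_counter', 'it increases', [[['type' => 'blue'], 3]]);
```

The usage line prints three lines: `# HELP demo_visitor_counter it increases`, `# TYPE demo_visitor_counter counter` and `demo_visitor_counter{type="blue"} 3`.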
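Slide 31's query packs three checks into one number: each `bool` comparison yields 1 or 0, and the weights 100, 10 and 1 place the Rate, Errors and Duration flags in separate decimal digits. The hypothetical `redStatus` function below mirrors that arithmetic in PHP, with the thresholds and comparison directions copied exactly from the slide; the parameter names are assumptions about what the recording rules contain.

```php
<?php
// Mirrors the slide's PromQL arithmetic: each comparison becomes 1 or 0 and is
// weighted into the hundreds (Rate), tens (Errors) or units (Duration) digit.
function redStatus(float $ratePerSec, float $errorPercent, float $p99Duration): int
{
    return ((int) ($ratePerSec < 1000)) * 100
        + ((int) ($errorPercent > 1.5)) * 10
        + ((int) ($p99Duration < 25)) * 1;
}

echo redStatus(800, 2.0, 30), "\n"; // prints 110
```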
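Slide 32 maps each code to a dashboard label. Because the query returns a number, 010 arrives as 10 and 001 as 1. A hypothetical decoder that zero-pads the value and lists the flagged signals could look like this:

```php
<?php
// Hypothetical decoder for the three-digit RED status:
// hundreds = Rate, tens = Errors, units = Duration; a 1 marks the signal as flagged.
function describeRedStatus(int $status): string
{
    $code = sprintf('%03d', $status); // e.g. 10 -> "010"
    $flagged = [];
    foreach (['Rate', 'Errors', 'Duration'] as $i => $signal) {
        if ($code[$i] === '1') {
            $flagged[] = 'x ' . $signal;
        }
    }
    return $flagged ? implode(', ', $flagged) : 'Ok';
}

echo describeRedStatus(101), "\n"; // prints x Rate, x Duration
echo describeRedStatus(10), "\n";  // prints x Errors
```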