
[2016.06 Meetup #3][TALK #2] Pedro Pessoa - Data That Matters

People run ops. With so much data out there, what really matters, so that we don't feel overwhelmed?

Nowadays you can collect countless metrics from devices. The challenge is to store them and make good use of them. We'll look at practical use cases for alert monitoring.

DevOps Lisbon

June 20, 2016

Transcript

  1. Agenda • Metrics by order of importance: ◦ Cats ◦ Meta monitoring ◦ Redis • HumanOps
  2. Example #0 - Cats! • Cats are independent creatures. • They eat and drink as they please. • But how often?
  3. Example #0 - Cats! • Cats are independent creatures. • They eat and drink as they please. • But how often? • And while we’re at it, when do they use the litterbox? Wouldn’t it be cool to get a notification when it’s time to clean the litter?
  4. Example #0 - Cats! SAFETY NOTICE: If you run this project (or any exercise involving pets), please do not use strings or laces to attach anything to your pet. To avoid potential injury, make sure you use safe, purpose-built cat collars.
  5. Example #0 - Cats! • Bluetooth Low Energy with RSSI (Received Signal Strength Indication)
  6. Example #0 - Cats! • Raspberry Pi. Wi-Fi and Bluetooth 4.0 dongles (you need Bluetooth version 4.0 for Low Energy support).
  7. Example #0 - Cats! 0 - if we couldn’t find the device. The cat could be anywhere. 1 - if the device is reachable but “not close”. The cat is probably snoozing somewhere. 2 - if the device is “close”. The cat is probably drinking, eating, or using the litterbox. The reading doesn’t specify which facility the cat is using (if at all). A future iteration could employ weight or humidity sensors to do just that.
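
The 0/1/2 reading on slide 7 can be sketched as a small mapping from one RSSI sample to a presence level. This is only an illustration: the `-60` dBm threshold, the function name `presence_level`, and the convention that `None` means "device not found" are assumptions, not values from the talk. A real threshold would need calibrating per room and per dongle.

```python
from typing import Optional

# Hypothetical threshold (dBm): at or above this the tag counts as "close".
CLOSE_RSSI_DBM = -60


def presence_level(rssi_dbm: Optional[int],
                   close_threshold: int = CLOSE_RSSI_DBM) -> int:
    """Map one RSSI sample to the 0/1/2 reading described on the slide."""
    if rssi_dbm is None:
        return 0  # device not found: the cat could be anywhere
    if rssi_dbm >= close_threshold:
        return 2  # "close": probably eating, drinking or using the litterbox
    return 1      # reachable but "not close": probably snoozing somewhere


# Made-up samples: no signal, weak signal, strong signal.
samples = [None, -85, -48]
print([presence_level(s) for s in samples])  # prints [0, 1, 2]
```

RSSI values are negative and grow towards zero as the tag gets nearer, which is why "close" is the `>=` branch.
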
  8. Example #0 - Cats! • Most activity appears to happen around meal times (which is to be expected).
  9. Example #0 - Cats! • Most activity appears to happen around meal times (which is to be expected). • We noticed Charol made a few more visits to the food tower (no wonder he weighs 7kg).
  10. Example #0 - Cats! • The cats didn’t drink much water. • Charol visited the fountain a bit more than Oli. That’s probably because Charol eats more dry food than Oli.
  11. Example #0 - Cats! • The cats didn’t drink much water. • Charol visited the fountain a bit more than Oli. That’s probably because Charol eats more dry food than Oli. • The water fountain is further out in a corner, which explains the occasional 0 readings, when the device was unreachable.
  12. Example #0 - Cats! • The cats don’t use the toilet that often. • Given that the litterbox is in the toilet upstairs, we have 0 readings as well.
  13. Example #1 - Nagios • Meta monitoring! aka “Who will alert you when your monitoring service is down?”
  14. Example #1 - Nagios • Meta monitoring! aka “Who will alert you when your monitoring service is down?” • Monitor the monitoring!
  15. Example #1 - Nagios • Non-critical alerting vs. Critical alerting ◦ what should you look into in the morning ◦ what should get you out of bed
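
One way to picture slides 13 to 15 is a heartbeat check run from a machine other than the monitoring server, plus a severity-based routing rule. The sketch below is not Nagios configuration and is not taken from the talk: the five-minute silence limit, the function names, and the channel names `page-on-call` and `morning-report` are all hypothetical.

```python
import time
from typing import Optional

# Hypothetical limit: how long the watcher tolerates silence from the monitoring server.
MAX_SILENCE_SECONDS = 300


def monitoring_is_alive(last_heartbeat: float,
                        now: Optional[float] = None,
                        max_silence: float = MAX_SILENCE_SECONDS) -> bool:
    """Return True if the monitoring server checked in recently enough."""
    if now is None:
        now = time.time()
    return (now - last_heartbeat) <= max_silence


def route_alert(critical: bool) -> str:
    """Pick a (hypothetical) notification channel based on severity."""
    # Critical: gets someone out of bed. Non-critical: looked at in the morning.
    return "page-on-call" if critical else "morning-report"


now = 1_000_000.0  # fixed timestamp so the example is reproducible
print(monitoring_is_alive(now - 60, now))    # heartbeat one minute ago
print(monitoring_is_alive(now - 900, now))   # heartbeat fifteen minutes ago
print(route_alert(critical=not monitoring_is_alive(now - 900, now)))
```

The check only helps if it runs somewhere that does not fail together with the monitoring server itself.
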
  16. Example #3 - Redis • Redis is a key-value database, and one of the most popular NoSQL databases out there. Redis (REmote DIctionary Server) works in a similar fashion to memcached, albeit with a non-volatile dataset.
  17. Example #3 - Redis • The dataset is stored entirely in memory (one of the reasons Redis is so fast) and it is periodically flushed to disk so it remains persistent.
  18. Example #3 - Redis • Redis also provides native support for manipulating and querying data structures such as lists, sets and hashes.
  19. Example #3 - Redis • Usually used to improve the response time of retrieving and/or matching data
  20. Example #3 - Redis • Monitoring is not alerting. Our rule of thumb: “collect all metrics that help with troubleshooting, alert only on those that require an action.”
  21. Example #3 - Redis • Monitoring is not alerting. Our rule of thumb: “collect all metrics that help with troubleshooting, alert only on those that require an action.” • Use previous work (like this one!)
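
The rule of thumb on slides 20 and 21 can be illustrated by keeping every field of Redis `INFO` output for troubleshooting while alerting on only a couple of them. In this sketch the sample text is made up, and the two alert rules (a failed background save, rejected connections) are assumptions chosen for illustration, not the rules used in the talk. The field names themselves are standard `INFO` fields.

```python
from typing import Dict, List

# Made-up excerpt in the key:value format that the Redis INFO command returns.
SAMPLE_INFO = """\
# Clients
connected_clients:12
# Memory
used_memory:1048576
# Persistence
rdb_last_bgsave_status:err
# Stats
rejected_connections:0
"""


def parse_info(text: str) -> Dict[str, str]:
    """Parse the key:value lines of INFO output, skipping section headers."""
    metrics: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        metrics[key] = value
    return metrics


def actionable_alerts(metrics: Dict[str, str]) -> List[str]:
    """Return alerts only for conditions someone has to act on (illustrative rules)."""
    alerts: List[str] = []
    if metrics.get("rdb_last_bgsave_status", "ok") != "ok":
        alerts.append("last RDB background save failed")
    if int(metrics.get("rejected_connections", "0")) > 0:
        alerts.append("connections were rejected")
    return alerts


metrics = parse_info(SAMPLE_INFO)   # everything is collected for troubleshooting
print(sorted(metrics))
print(actionable_alerts(metrics))   # only the failed save raises an alert
```

Here `connected_clients` and `used_memory` are stored but never alerted on: they help explain an incident after the fact, but by themselves they do not call for an action.
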
  22. Takeaways • Learn from others who have walked the path • Issues pile up • Watch for trends • Avoid setting up your own monitoring system as much as you can - ‘DEVOPS’ coupon: https://www.serverdensity.com/conferences • We are only human