Monitoring OpenStack at Lithium (OpenStack Summit Austin 2016)

Ilan Rabinovitch

April 26, 2016

Transcript

  1. How is OpenStack used at Lithium?
     • Production public-facing communities to major brands (SaaS)
       ◦ Redis, Java, Node.js, MySQL, Elasticsearch, Cassandra
     • Infrastructure services
       ◦ Kubernetes + Docker
       ◦ Chef
       ◦ DNS
       ◦ Consul
     • Development environments
     • R&D services
     • Random VMs because it’s easy to use
  2. OpenStack @ Lithium
     • Kilo (EU) & Icehouse (US)
       ◦ Keystone, Nova, Cinder, Swift, Glance, Neutron, Heat, Horizon
     • ~60 hypervisors
     • ~1000 instances
     • ~10 TB RAM (~6 TB used)
     • Contrail for SDN
     • Ceph & SolidFire (provisioned IOPS) for Cinder
     • Managed by a team of 3 engineers
  3. Tools are available out of the box
     • Horizon
       ◦ Difficult to create cross-tenant and/or rollup reports
       ◦ Multi-region compounds the cross-tenant issues
       ◦ Not very user friendly for all teams involved (Ops, DevOps, Dev, Finance, Mgmt, etc.)
       ◦ Doesn’t have time-based metrics to show usage over time
     • Nova APIs
       ◦ Rolling your own? Who really has time for that?
       ◦ Need to run Graphite (or similar) to represent the data
       ◦ Push metrics into statsd or similar service using Python
     We used a combination of both before using Datadog’s integration. Ugh… :(
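The "roll your own" route on slide 3 (collect numbers from the Nova APIs, then push them into statsd with Python) might look roughly like the sketch below. This is an illustration, not Lithium's code: the `openstack.nova` prefix, the function names, and the sample values are hypothetical, and the `stats` dictionary stands in for whatever a real Nova API call would return. The only thing assumed from statsd itself is the plain-text gauge format `name:value|g` sent over UDP, with 8125 as the conventional port.

```python
import socket


def format_gauge(name, value, prefix="openstack.nova"):
    """Build one statsd gauge line: '<prefix>.<name>:<value>|g'."""
    return "{}.{}:{}|g".format(prefix, name, value)


def push_gauges(stats, host="127.0.0.1", port=8125):
    """Send each entry of `stats` to a statsd daemon as a gauge over UDP.

    Returns the list of lines that were sent. UDP is fire-and-forget,
    so a missing listener is not reported back to the caller.
    """
    lines = [format_gauge(name, value) for name, value in sorted(stats.items())]
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()
    return lines
```

Calling `push_gauges({"running_vms": 17, "free_ram_mb": 65536})` from a cron job or a small loop would cover the "push" half; the graphing half still needs Graphite or a similar backend, which is the overhead the slide complains about.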
  4. Why we went with Datadog
     • Our users want graphs & dashboards
     • Time series graphs tell the real story
     • Incredibly easy to implement
     • Easy to add/extend functionality
       ◦ Open Source code on GitHub
       ◦ Able to extend with our own custom enhancements
     • Open source is important to us
     • Able to see OpenStack metrics side-by-side with our application metrics
     Photo credits: Google Images - The Indian Government encourages adoption of OSS http://news.softpedia.com/news/Use-of-Open-Source-Software-Is-Now-Mandatory-In-Indian-Government-Offices-477052.shtml
  5. Datadog Overview
     • SaaS-based infrastructure and app monitoring
     • Open Source Agent
     • Time series data (metrics and events)
     • Processing nearly a trillion data points per day
     • Intelligent Alerting
     • We’re hiring! (www.datadoghq.com/careers/)
  6. How much to measure?
     1 node
     • 30 metrics from OpenStack
     1 operating system (e.g., Linux)
     • 100 metrics per instance
     Custom Applications
     • ~50 metrics
  7. How much to measure?
     1 VM
     • ~30 metrics from OpenStack
     1 operating system (e.g., Linux)
     • 100 metrics per instance
     Custom Applications
     • ~50 metrics
     Containers: 150*N
  8. How much to measure?
     1 VM
     • ~30 metrics from OpenStack
     1 operating system (e.g., Linux)
     • 100 metrics per instance
     Custom Applications
     • ~50 metrics
     Containers: 150*N
     Metrics Overload!
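To put rough numbers on "Metrics Overload!", the per-VM figures from these slides can be added up. The sketch below assumes the counts are simply additive and that every VM runs the same number of containers; neither assumption is stated on the slides, and the function names are made up for illustration.

```python
# Approximate per-VM counts taken from the slides.
OPENSTACK_METRICS_PER_VM = 30
OS_METRICS_PER_INSTANCE = 100
CUSTOM_APP_METRICS = 50
METRICS_PER_CONTAINER = 150  # the "150*N" term, N = number of containers


def metrics_per_vm(containers=0):
    """Estimated metric count for one VM running `containers` containers."""
    base = OPENSTACK_METRICS_PER_VM + OS_METRICS_PER_INSTANCE + CUSTOM_APP_METRICS
    return base + METRICS_PER_CONTAINER * containers


def fleet_metrics(instances, containers_per_vm=0):
    """Estimated metric count for a fleet of identical VMs."""
    return instances * metrics_per_vm(containers_per_vm)
```

Under these assumptions a single VM without containers yields about 180 metrics, and a fleet of roughly 1000 instances (the size quoted earlier in the talk) yields about 180,000 before any containers are counted.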
  9. Perspective Matters
     By Mysid - Self-made in Inkscape, contours from en:Image:Perspective-foreshortening.png., Public Domain, https://commons.wikimedia.org/w/index.php?curid=2562161
     By Katri - Flickr: On the road, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=15967705
  10. Important concepts to remember
      • You are no longer monitoring an infrastructure stack
        ◦ It’s a set of applications that provide your infrastructure
      • You need to start monitoring not just server stats (CPU, memory, disk) but also how the applications work together
      • Servers may look fine even if the services are not responding properly
      • You probably have more than one network providing connectivity to the running instances
      https://en.wikipedia.org/wiki/Wikipedia
      OpenStack service
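The point that a server can look healthy while its service is not responding can be illustrated with a check that probes the service directly instead of relying on host stats. The sketch below is a hypothetical minimal version: it only tests whether a TCP port accepts connections, which is a floor rather than proof that the service behaves correctly, and the function names and status strings are invented for this example.

```python
import socket


def service_responds(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def diagnose(server_stats_ok, service_ok):
    """Combine host-level and service-level results into one status string."""
    if server_stats_ok and service_ok:
        return "ok"
    if service_ok:
        return "service up, server degraded"
    if server_stats_ok:
        return "server looks fine, service not responding"
    return "server and service both failing"
```

A fuller check would issue a real API request and validate the response; the structure stays the same, with the service-level result reported alongside CPU, memory, and disk rather than inferred from them.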
  11. How do you know what to compare?
      By Sandy Austin - originally posted to Flickr as Snack time again, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=10003672
  12. Examples: Nova - Metrics
      Work Metrics:
      • hdd_read_req
      • running_vms
      Resource Metrics:
      • hypervisor_up
      • vcpus available
      • free_disk_gb
      • free_ram_mb
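As one way to act on the resource metrics listed above, the sketch below flags hypervisors that are down or running short of RAM or disk. The metric keys reuse the names from the slide; the thresholds, the hypervisor names, and the dictionary layout are assumptions made for illustration, not values from the talk or from any monitoring product.

```python
def low_capacity(hypervisors, min_free_ram_mb=8192, min_free_disk_gb=100):
    """Return the sorted names of hypervisors that are down or low on capacity.

    `hypervisors` maps a hypervisor name to a dict containing the keys
    'hypervisor_up', 'free_ram_mb' and 'free_disk_gb'.
    """
    flagged = []
    for name, metrics in sorted(hypervisors.items()):
        is_down = not metrics["hypervisor_up"]
        low_ram = metrics["free_ram_mb"] < min_free_ram_mb
        low_disk = metrics["free_disk_gb"] < min_free_disk_gb
        if is_down or low_ram or low_disk:
            flagged.append(name)
    return flagged


# Hypothetical sample data for three hypervisors.
sample = {
    "hv-01": {"hypervisor_up": True, "free_ram_mb": 65536, "free_disk_gb": 900},
    "hv-02": {"hypervisor_up": True, "free_ram_mb": 2048, "free_disk_gb": 900},
    "hv-03": {"hypervisor_up": False, "free_ram_mb": 65536, "free_disk_gb": 900},
}
flagged = low_capacity(sample)
```

With the sample above, `hv-02` is flagged for low free RAM and `hv-03` for being down, while `hv-01` passes both thresholds.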
  13. Examples: Nova - Events
      • Configuration Change
      • Code Deployment
      • Service Started / Stopped
      • Instance Migrations
      • Instance Creation
      • Adding / Removing Nodes