InfluxDB intro for Data Driven NYC

Paul Dix
March 17, 2015

Talk I gave on 3/17/2015 at Data Driven NYC. Introduces the motivation behind creating a time series database and some of the basic features in 0.9.0.

Transcript

  1. Example from metrics: 100 measurements per host * 10 hosts * 8640 per day (once every 10s) * 365 days = 3,153,600,000 records per year
  2. “Building an application with an analytics component today is like building a web application in 1998. You spend months building infrastructure before getting to the actual thing you want to build.” –Paul Dix
  3. Analytics and monitoring should be about analyzing and interpreting data, not the infrastructure to store and process it.
  4. Data model
    • Databases
    • Measurements: cpu_load, temperature, log_lines, click, etc.
    • Tags: region=uswest, host=serverA, building=23, service=redis, etc.
  5. Data model
    • Databases
    • Measurements: cpu_load, temperature, log, click, etc.
    • Tags: region=uswest, host=serverA, building=23, service=redis, etc.
    • Series: measurement + unique tagset
  6. Data model
    • Databases
    • Measurements: cpu_load, temperature, log, click, etc.
    • Tags: region=uswest, host=serverA, building=23, service=redis, etc.
    • Series: measurement + unique tagset
    • Points
    • Fields: bool, int64, float64, string, []byte
    • Timestamp: nano epoch
  7. Writing Data (slide labels: Measurement, Tags, Fields)
    { "database": "mydb", "retentionPolicy": "30d", "points": [ { "name": "cpu_load", "tags": { "host": "server01", "region": "us-west" }, "timestamp": "2009-11-10T23:00:00Z", "fields": { "value": 0.64 } } ] }
  8. QUERY: SELECT value FROM cpu WHERE host = 'serverA'
    RESULTS: { "results": [ { "query": "SELECT value FROM cpu WHERE host='serverA'", "series": [ { "name": "cpu", "tags": { "host": "serverA" }, "columns": ["time", "value"], "values": [ ["2009-11-10T23:00:00Z", 22.1], ["2009-11-10T23:00:10Z", 25.2] ] } ] } ] }
  9. QUERY: SELECT value FROM cpu WHERE host = 'serverA' OR host = 'serverB'
    SERIES IN RESULT: { "series": [ { "name": "cpu", "tags": { "host": "serverA" }, "columns": ["time", "value"], "values": [] }, { "name": "cpu", "tags": { "host": "serverB" }, "columns": ["time", "value"], "values": [] } ] }
  10. QUERY: SELECT percentile(90, value) FROM cpu WHERE time > now() - 4h GROUP BY time(10m), region
    SERIES IN RESULT: [ { "name": "cpu", "tags": { "region": "us-west" }, "columns": ["time", "percentile"], "values": [] }, { "name": "cpu", "tags": { "region": "us-east" }, "columns": ["time", "percentile"], "values": [] } ]
  11. Multiple aggregates: SELECT mean(value), percentile(90, value), min(value), max(value) FROM cpu WHERE host='serverA' AND time > now() - 48h GROUP BY time(1h)
  12. Return every series in CPU: SELECT mean(value) FROM cpu WHERE time > now() - 48h GROUP BY time(1h), *
  13. { "results": [ { "query": "SHOW MEASUREMENTS", "series": [ { "name": "measurements", "columns": ["name"], "values": [ ["cpu"], ["memory"], ["network"] ] } ] } ] }
  14. { "results": [ { "query": "SHOW SERIES", "series": [ { "name": "cpu", "columns": ["id", "region", "host"], "values": [ [1, "us-west", "serverA"], [2, "us-east", "serverB"] ] } ] } ] }
  15. { "query": "SHOW MEASUREMENTS WHERE service='redis'", "series": [ { "name": "measurements", "name": "series", "columns": ["measurement"], "values": [ ["key_count"], ["connections"] ] } ] }
  16. { "query": "SHOW TAG KEYS from cpu", "series": [ { "name": "keys", "columns": ["key"], "values": [ ["region"], ["host"] ] } ] }
  17. { "query": "SHOW TAG VALUES WITH KEY = service", "series": [ { "name": "series", "columns": ["service"], "values": [ ["redis"], ["apache"] ] } ] }
  18. { "query": "SHOW TAG VALUES FROM cpu WITH KEY = service", "series": [ { "name": "series", "columns": ["service"], "values": [ ["redis"], ["apache"] ] } ] }
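
The record count on slide 1 can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `records_per_year` and its parameters are made up for illustration and do not come from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_year(measurements_per_host, hosts, interval_seconds, days=365):
    """Count records written in a year at a fixed sampling interval."""
    # One sample every `interval_seconds` gives this many samples per day.
    samples_per_day = SECONDS_PER_DAY // interval_seconds
    return measurements_per_host * hosts * samples_per_day * days


# Slide 1: 100 measurements per host, 10 hosts, one sample every 10 seconds.
total = records_per_year(measurements_per_host=100, hosts=10, interval_seconds=10)
print(f"{total:,}")  # 3,153,600,000
```

Sampling every 10 seconds gives 8640 samples per day, which is where the factor on the slide comes from.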
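
The data model on slides 4 to 6 and the write body on slide 7 can be tied together in a small Python sketch. It assumes nothing about the InfluxDB client or HTTP endpoint, which the transcript does not show: it only builds the JSON document from slide 7 and illustrates "series = measurement + unique tagset". The names `Point`, `series_key`, and `build_write_body` are hypothetical helpers, not part of any InfluxDB library.

```python
import json
from dataclasses import dataclass


@dataclass
class Point:
    measurement: str   # e.g. "cpu_load"
    tags: dict         # e.g. {"host": "server01", "region": "us-west"}
    fields: dict       # e.g. {"value": 0.64}
    timestamp: str     # slide 7 uses an RFC 3339 string


def series_key(point):
    """A series is identified by its measurement plus its unique tag set."""
    # Sorting makes the key independent of the order the tags were given in.
    return (point.measurement, tuple(sorted(point.tags.items())))


def build_write_body(database, retention_policy, points):
    """Build a dict shaped like the write body shown on slide 7."""
    return {
        "database": database,
        "retentionPolicy": retention_policy,
        "points": [
            {
                "name": p.measurement,
                "tags": p.tags,
                "timestamp": p.timestamp,
                "fields": p.fields,
            }
            for p in points
        ],
    }


point = Point(
    measurement="cpu_load",
    tags={"host": "server01", "region": "us-west"},
    fields={"value": 0.64},
    timestamp="2009-11-10T23:00:00Z",
)
body = build_write_body("mydb", "30d", [point])
print(json.dumps(body, indent=2))
```

Two points with the same measurement and the same tags land in the same series regardless of tag order, while changing any tag value (for example `host`) yields a different series.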
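
The query on slide 10 groups points into 10-minute windows and returns one series per `region` tag value. The sketch below mimics that grouping in plain Python to show why the result contains a separate series for `us-west` and `us-east`. It is not how InfluxDB implements the query: timestamps are assumed to be integer epoch seconds rather than nanoseconds, the percentile uses the nearest-rank method as an assumption, and `group_by_time_and_tag` is a hypothetical helper.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile (an assumption, not necessarily InfluxDB's method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def group_by_time_and_tag(points, bucket_seconds, tag, p):
    """Group (timestamp, tags, value) tuples by tag value and time bucket.

    Returns {tag_value: [(bucket_start, percentile), ...]} with buckets
    in ascending time order, one list per tag value.
    """
    groups = defaultdict(list)
    for timestamp, tags, value in points:
        # Truncate the timestamp down to the start of its bucket.
        bucket_start = timestamp - timestamp % bucket_seconds
        groups[(tags[tag], bucket_start)].append(value)

    result = defaultdict(list)
    for (tag_value, bucket_start), values in sorted(groups.items()):
        result[tag_value].append((bucket_start, percentile(values, p)))
    return dict(result)


points = [
    (0, {"region": "us-west"}, 1.0),
    (300, {"region": "us-west"}, 3.0),
    (600, {"region": "us-west"}, 5.0),
    (10, {"region": "us-east"}, 2.0),
]
series = group_by_time_and_tag(points, bucket_seconds=600, tag="region", p=90)
for region, rows in series.items():
    print(region, rows)
```

Each key of the returned dict corresponds to one entry in the "SERIES IN RESULT" list on the slide, and each `(bucket_start, percentile)` pair corresponds to one row of that series' `values`.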