Paul Dix
October 15, 2015

The new InfluxDB storage engine and some query language ideas

Short talk I gave at GrafanaCon

Transcript

  1. The new InfluxDB storage engine and some query language ideas

    Paul Dix CEO at InfluxDB @pauldix paul@influxdb.com
  2. Shards

    Data is organized into shards of time (10/10/2015, 10/11/2015, 10/12/2015, 10/13/2015). Each shard is an underlying DB, which makes it efficient to drop old data.
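
To illustrate why time-based shards make dropping old data cheap, here is a minimal Go sketch. The one-day shard width, the `Store` type, and the `oct2015` helper are assumptions made for the example; the talk does not show the real shard duration or data structures.

```go
package main

import (
	"fmt"
	"time"
)

// shardDuration is an assumed width of one day per shard, matching the dated
// boxes on the slide. The talk does not state the real duration.
const shardDuration = 24 * time.Hour

// shardStart returns the Unix time (seconds) at which the shard owning t begins.
func shardStart(t time.Time) int64 {
	return t.UTC().Truncate(shardDuration).Unix()
}

// Store is a hypothetical stand-in that keeps a point count per shard.
// In the design described, each shard would be its own underlying database.
type Store struct {
	shards map[int64]int
}

func NewStore() *Store {
	return &Store{shards: make(map[int64]int)}
}

// Write routes a point to the shard covering its timestamp.
func (s *Store) Write(t time.Time) {
	s.shards[shardStart(t)]++
}

// DropBefore deletes every shard that ends at or before cutoff and reports
// how many shards were removed. No individual points are visited.
func (s *Store) DropBefore(cutoff time.Time) int {
	dropped := 0
	width := int64(shardDuration / time.Second)
	for start := range s.shards {
		if start+width <= cutoff.Unix() {
			delete(s.shards, start)
			dropped++
		}
	}
	return dropped
}

// oct2015 is a small helper for building example timestamps.
func oct2015(day, hour int) time.Time {
	return time.Date(2015, time.October, day, hour, 0, 0, 0, time.UTC)
}

func main() {
	s := NewStore()
	for day := 10; day <= 13; day++ {
		s.Write(oct2015(day, 12))
	}
	dropped := s.DropBefore(oct2015(12, 0))
	fmt.Println("dropped:", dropped, "remaining:", len(s.shards)) // dropped: 2 remaining: 2
}
```

Retention here is a matter of deleting whole shards rather than scanning for individual old points.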
  3. Components

    WAL, in-memory cache, index files. Similar to LSM trees: the WAL is the same, the in-memory cache is like MemTables, and the index files are like SSTables.
  4. In Memory Cache

    // cache and flush variables
    cacheLock sync.RWMutex
    cache map[string]Values
    flushCache map[string]Values

    Keys look like: temperature,device=dev1,building=b1#internal
  5. In Memory Cache

    // cache and flush variables
    cacheLock sync.RWMutex
    cache map[string]Values
    flushCache map[string]Values

    Writes can come in while the WAL flushes.
  6. // cache and flush variables

    cacheLock sync.RWMutex
    cache map[string]Values
    flushCache map[string]Values
    dirtySort map[string]bool

    Values can come in out of order. Mark if so, sort at query time.
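
The cache fields above can be put together roughly as follows. This is a sketch under assumptions: the `Value` struct, the `Cache` wrapper, and the `Write`, `BeginFlush`, `EndFlush`, and `Read` methods are hypothetical names for illustration. Only the four field names and their types come from the slides.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Value and Values are assumed shapes; the slides only show the type name Values.
type Value struct {
	Time  int64
	Float float64
}

type Values []Value

// Cache groups the fields shown on the slides.
type Cache struct {
	cacheLock  sync.RWMutex
	cache      map[string]Values
	flushCache map[string]Values
	dirtySort  map[string]bool
}

func NewCache() *Cache {
	return &Cache{cache: map[string]Values{}, dirtySort: map[string]bool{}}
}

// lastTime returns the timestamp of the most recently appended value for key.
// Callers must hold cacheLock.
func (c *Cache) lastTime(key string) (int64, bool) {
	for _, m := range []map[string]Values{c.cache, c.flushCache} {
		if vs := m[key]; len(vs) > 0 {
			return vs[len(vs)-1].Time, true
		}
	}
	return 0, false
}

// Write appends a value and marks the key dirty if it arrived out of order.
func (c *Cache) Write(key string, v Value) {
	c.cacheLock.Lock()
	defer c.cacheLock.Unlock()
	if last, ok := c.lastTime(key); ok && v.Time < last {
		c.dirtySort[key] = true
	}
	c.cache[key] = append(c.cache[key], v)
}

// BeginFlush sets the current cache aside for flushing and installs an empty
// one, so new writes keep flowing while the flush is in progress.
func (c *Cache) BeginFlush() {
	c.cacheLock.Lock()
	defer c.cacheLock.Unlock()
	c.flushCache = c.cache
	c.cache = map[string]Values{}
}

// EndFlush discards the flushed values. Dirty flags are left set, which is
// conservative: sorting already-sorted values is harmless.
func (c *Cache) EndFlush() {
	c.cacheLock.Lock()
	defer c.cacheLock.Unlock()
	c.flushCache = nil
}

// Read returns a copy of all buffered values for key, sorting only if the key
// was marked dirty.
func (c *Cache) Read(key string) Values {
	c.cacheLock.RLock()
	defer c.cacheLock.RUnlock()
	out := append(Values{}, c.flushCache[key]...)
	out = append(out, c.cache[key]...)
	if c.dirtySort[key] {
		sort.SliceStable(out, func(i, j int) bool { return out[i].Time < out[j].Time })
	}
	return out
}

func main() {
	c := NewCache()
	key := "temperature,device=dev1,building=b1#internal"
	c.Write(key, Value{Time: 10, Float: 1})
	c.Write(key, Value{Time: 30, Float: 3})
	c.BeginFlush()                          // a flush starts
	c.Write(key, Value{Time: 20, Float: 2}) // late, out-of-order write during the flush
	fmt.Println(c.Read(key))                // [{10 1} {20 2} {30 3}]
}
```

Deferring the sort to read time keeps the write path to a single append per value.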
  7. awesome time series data

    WAL (an append-only file) -> in-memory index -> on-disk index (periodic flushes)
  8. The Index

    Data File: Min Time 10000, Max Time 29999
    Data File: Min Time 30000, Max Time 39999
    Data File: Min Time 70000, Max Time 99999

    Contiguous blocks of time.
  9. The Index

    Data File: Min Time 10000, Max Time 29999
    Data File: Min Time 15000, Max Time 39999
    Data File: Min Time 70000, Max Time 99999

    Files can overlap.
  10. The Index

    cpu,host=A: Min Time 10000, Max Time 20000
    cpu,host=A: Min Time 21000, Max Time 39999
    Data File: Min Time 70000, Max Time 99999

    But a specific series must not overlap.
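
Because each data file carries a min and max time, a query only has to open the files whose range intersects the queried range. A minimal sketch of that check, assuming a hypothetical `DataFile` struct and `overlapping` helper and reusing the overlapping file bounds shown earlier:

```go
package main

import "fmt"

// DataFile holds only the time bounds shown on the slides. The struct and its
// field names are assumptions for this example.
type DataFile struct {
	MinTime, MaxTime int64
}

// overlapping returns the files whose [MinTime, MaxTime] range intersects the
// query range [qMin, qMax]. All bounds are inclusive.
func overlapping(files []DataFile, qMin, qMax int64) []DataFile {
	var out []DataFile
	for _, f := range files {
		if f.MinTime <= qMax && f.MaxTime >= qMin {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	files := []DataFile{{10000, 29999}, {15000, 39999}, {70000, 99999}}
	fmt.Println(overlapping(files, 20000, 35000)) // [{10000 29999} {15000 39999}]
	fmt.Println(overlapping(files, 40000, 69999)) // []
}
```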
  11. The Index

    Data files are laid out in time-ascending order. A file will never overlap with more than 2 others.
  12. The Index

    Data File: Min Time 10000, Max Time 29999
    Data File: Min Time 30000, Max Time 39999
    Data File: Min Time 70000, Max Time 99999
    Data File: Min Time 10000, Max Time 99999

    They periodically get compacted (like LSM).
  13. Compacting while appending new data

    func (w *WriteLock) LockRange(min, max int64) {
        // sweet code here
    }

    func (w *WriteLock) UnlockRange(min, max int64) {
        // sweet code here
    }
  14. Compacting while appending new data

    func (w *WriteLock) LockRange(min, max int64) {
        // sweet code here
    }

    func (w *WriteLock) UnlockRange(min, max int64) {
        // sweet code here
    }

    This should block until we get it.
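
The slides elide the bodies of `LockRange` and `UnlockRange`. One way such a range lock could be built is with a mutex and a condition variable, as in the sketch below. This is not the talk's implementation: the struct fields, the `ensureCond` and `overlaps` helpers, and the non-blocking `TryLockRange` are all hypothetical, added to make the behaviour observable.

```go
package main

import (
	"fmt"
	"sync"
)

type timeRange struct{ min, max int64 }

// WriteLock tracks the time ranges currently held. The fields are assumptions;
// the slides show only the two method signatures.
type WriteLock struct {
	mu   sync.Mutex
	cond *sync.Cond
	held []timeRange
}

// ensureCond lazily creates the condition variable. Callers must hold w.mu.
func (w *WriteLock) ensureCond() {
	if w.cond == nil {
		w.cond = sync.NewCond(&w.mu)
	}
}

// overlaps reports whether [min, max] intersects any held range.
func (w *WriteLock) overlaps(min, max int64) bool {
	for _, r := range w.held {
		if r.min <= max && r.max >= min {
			return true
		}
	}
	return false
}

// LockRange blocks until no held range overlaps [min, max], then records it.
func (w *WriteLock) LockRange(min, max int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.ensureCond()
	for w.overlaps(min, max) {
		w.cond.Wait()
	}
	w.held = append(w.held, timeRange{min, max})
}

// TryLockRange is a hypothetical non-blocking variant of LockRange.
func (w *WriteLock) TryLockRange(min, max int64) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.overlaps(min, max) {
		return false
	}
	w.held = append(w.held, timeRange{min, max})
	return true
}

// UnlockRange releases a range locked with the same bounds and wakes waiters.
func (w *WriteLock) UnlockRange(min, max int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.ensureCond()
	for i, r := range w.held {
		if r.min == min && r.max == max {
			w.held = append(w.held[:i], w.held[i+1:]...)
			break
		}
	}
	w.cond.Broadcast()
}

func main() {
	var w WriteLock
	w.LockRange(10000, 29999)                 // e.g. a compaction holds this range
	fmt.Println(w.TryLockRange(20000, 25000)) // false: overlaps the held range
	fmt.Println(w.TryLockRange(30000, 39999)) // true: disjoint range

	done := make(chan struct{})
	go func() {
		w.LockRange(20000, 25000) // blocks until the first range is released
		close(done)
	}()
	w.UnlockRange(10000, 29999)
	<-done
	fmt.Println("writer acquired range")
}
```

Locking by range rather than globally lets appends to recent time ranges proceed while a compaction holds an older range.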
  15. Back to the data files…

    Data File: Min Time 10000, Max Time 29999
    Data File: Min Time 30000, Max Time 39999
    Data File: Min Time 70000, Max Time 99999
  16. Timestamps (best case): Run length encoding

    Deltas are all the same for a block (only requires start time, delta, and count).
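
The best case described above can be sketched in a few lines of Go. The `Run` struct and the `encodeRun` and `decodeRun` functions are hypothetical names for this example. When the deltas differ, the sketch simply reports failure, since the talk does not say here what the fallback encoding is.

```go
package main

import "fmt"

// Run is the compact form of a regularly spaced block of timestamps:
// start time, delta, and count.
type Run struct {
	Start, Delta int64
	Count        int
}

// encodeRun returns a Run and true when every consecutive delta in ts is
// identical. Otherwise it returns false and the caller would need another
// encoding (not shown).
func encodeRun(ts []int64) (Run, bool) {
	if len(ts) == 0 {
		return Run{}, false
	}
	r := Run{Start: ts[0], Count: len(ts)}
	if len(ts) == 1 {
		return r, true
	}
	r.Delta = ts[1] - ts[0]
	for i := 2; i < len(ts); i++ {
		if ts[i]-ts[i-1] != r.Delta {
			return Run{}, false
		}
	}
	return r, true
}

// decodeRun expands a Run back into the original timestamps.
func decodeRun(r Run) []int64 {
	ts := make([]int64, r.Count)
	for i := range ts {
		ts[i] = r.Start + int64(i)*r.Delta
	}
	return ts
}

func main() {
	r, ok := encodeRun([]int64{10000, 10010, 10020, 10030})
	fmt.Println(r, ok)        // {10000 10 4} true
	fmt.Println(decodeRun(r)) // [10000 10010 10020 10030]

	_, ok = encodeRun([]int64{10000, 10010, 10025})
	fmt.Println(ok) // false
}
```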
  17. one test

    100,000 series
    100,000 points per series
    10,000,000,000 total points
    5,000 points per request
    c3.8xlarge, writes from 4 other systems
    ~390,000 points/sec
    ~3 bytes/point (random floats, could be better)
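
As a rough back-of-envelope reading of these figures, the numbers work out as below. This assumes the ~390,000 points/sec rate was sustained for the whole run and that ~3 bytes/point held across all points; the talk states neither the total duration nor the final on-disk size.

```latex
\begin{align*}
\text{total points} &= 100{,}000 \times 100{,}000 = 10^{10} \\
\text{write time} &\approx \frac{10^{10}\ \text{points}}{3.9 \times 10^{5}\ \text{points/s}} \approx 25{,}600\ \text{s} \approx 7.1\ \text{h} \\
\text{on-disk size} &\approx 10^{10} \times 3\ \text{bytes} = 3 \times 10^{10}\ \text{bytes} \approx 30\ \text{GB}
\end{align*}
```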
  18. Then there are fills

    select mean(value) from cpu where host = 'A' and time > now() - 4h group by time(5m) fill(0)
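
To make the effect of `fill(0)` concrete, the sketch below groups points into fixed windows, takes the mean of each, and substitutes a fill value for windows with no points. It is an illustration only, not InfluxDB's query engine: the `Point` type and `meanWithFill` function are hypothetical, and windows are assumed to start at the query's start time.

```go
package main

import "fmt"

// Point is an assumed shape for a single sample.
type Point struct {
	Time  int64 // seconds
	Value float64
}

// meanWithFill splits [start, end) into windows of width interval, averages the
// points in each window, and uses fill for windows that contain no points.
func meanWithFill(points []Point, start, end, interval int64, fill float64) []float64 {
	if interval <= 0 || end <= start {
		return nil
	}
	n := (end - start + interval - 1) / interval // number of windows, rounded up
	sums := make([]float64, n)
	counts := make([]int, n)
	for _, p := range points {
		if p.Time < start || p.Time >= end {
			continue
		}
		i := (p.Time - start) / interval
		sums[i] += p.Value
		counts[i]++
	}
	out := make([]float64, n)
	for i := range out {
		if counts[i] == 0 {
			out[i] = fill
		} else {
			out[i] = sums[i] / float64(counts[i])
		}
	}
	return out
}

func main() {
	points := []Point{{10, 2}, {20, 4}, {700, 5}}
	// Three 5-minute (300 s) windows; the middle one has no data.
	fmt.Println(meanWithFill(points, 0, 900, 300, 0)) // [3 0 5]
}
```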
  19. Moving the FROM?

    SELECT
        from('cpu').mean(value)
        from('memory').mean(value)
    WHERE time > now() - 4h
    GROUP BY time(1m)

    Consistent time and filtering applied to both.