in memory in a write-optimized data structure ‣ Periodically persist collected events to disk (converting to a read-optimized format) ‣ Query data as soon as it is ingested REAL-TIME NODES
Bieber -> 0, Ke$ha -> 1 ‣ Store • page -> [0 0 0 1 1 1] • language -> [0 0 0 0 0 0] timestamp page language city country ... added deleted! 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65! 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62! 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45! 2011-01-01T01:00:00Z Ke$ha en Calgary CA 17 87! 2011-01-01T02:00:00Z Ke$ha en Calgary CA 43 99! 2011-01-01T02:00:00Z Ke$ha en Calgary CA 12 53! ...
-> [111000] ‣ Ke$ha -> [3, 4, 5] -> [000111] ‣ Compressed with CONCISE timestamp page language city country ... added deleted! 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65! 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62! 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45! 2011-01-01T01:00:00Z Ke$ha en Calgary CA 17 87! 2011-01-01T02:00:00Z Ke$ha en Calgary CA 43 99! 2011-01-01T02:00:00Z Ke$ha en Calgary CA 12 53! ...
0 5 10 15 20 90%ile 95%ile 99%ile Feb 03 Feb 10 Feb 17 Feb 24 time query time (seconds) datasource a b c d e f g h Query latency percentiles QUERY LATENCY (500MS AVERAGE) 90% < 1S 95% < 5S 99% < 10S ! ! ! DRUID IN PRODUCTION
old data, short queries over long queries ‣ Make it Stop • Ability to time-out / cancel queries ‣ Small units of computation • Keep shards small ‣ Cost / performance trade-offs • Put redundant copies on cheaper hardware • Tweak ratio of data in memory / on-disk