rows / second / core 1 day of summarized aggregates 60M+ rows 1 query over 1 week, 16 cores ~5 seconds Page load with 20 queries over a week of data long time
a NoSQL store II. NOSQL - THE SETUP ts gender age revenue 1 M 18 $0.15 1 F 25 $1.03 1 F 18 $0.01 Key Value 1 revenue=$1.19 1,M revenue=$0.15 1,F revenue=$1.04 1,18 revenue=$0.16 1,25 revenue=$1.03 1,M,18 revenue=$0.15 1,F,18 revenue=$0.01 1,F,25 revenue=$1.03
key ‣ Inflexible • not aggregated, not available ‣ Not continuously updated • aggregate first, then display ‣ Processing scales exponentially II. NOSQL - THE RESULTS
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65! 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62! 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45! 2011-01-01T01:00:00Z Ke$ha en Calgary CA 17 87! 2011-01-01T02:00:00Z Ke$ha en Calgary CA 43 99! 2011-01-01T02:00:00Z Ke$ha en Calgary CA 12 53! ...
Bieber -> 0, Ke$ha -> 1 ‣ Store • page -> [0 0 0 1 1 1] • language -> [0 0 0 0 0 0] timestamp page language city country ... added deleted! 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65! 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62! 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45! 2011-01-01T01:00:00Z Ke$ha en Calgary CA 17 87! 2011-01-01T02:00:00Z Ke$ha en Calgary CA 43 99! 2011-01-01T02:00:00Z Ke$ha en Calgary CA 12 53! ...
-> [111000] ‣ Ke$ha -> [3, 4, 5] -> [000111] timestamp page language city country ... added deleted! 2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65! 2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62! 2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45! 2011-01-01T01:00:00Z Ke$ha en Calgary CA 17 87! 2011-01-01T02:00:00Z Ke$ha en Calgary CA 43 99! 2011-01-01T02:00:00Z Ke$ha en Calgary CA 12 53! ...
your own modules and extend Druid • Add your own complex metrics (cardinality estimation, approximate histograms and quantiles, approximate top-K algorithms, etc.) • Add your own data ingestion modules (Hadoop, Storm, etc.) • Add your own proprietary modules ‣ Build your own node types from existing modules ‣ Available in Druid 0.6
Growing Community • ~30 contributors (not all publicly listed) from many different companies • In production at multiple companies, we’re hoping for more! • Support through community forums and IRC • We love contributions!
large amounts of data • You need analytics (not key value store) • You want to do your analysis on data as it’s happening (realtime) • Your store needs to be always-on • You need extensibility and flexibility
you have can easily be handled by MYSQL • You’re querying for individual entries or doing lookups (not analytics) • You need to constantly do updates and deletes • Batch is good enough • Downtime is no big deal