‣ Hadoop can solve almost every distributed computing problem
‣ MapReduce over your raw data is flexible, but slow
‣ Hadoop is not optimized for query latency
‣ To optimize queries, we need a query layer
What do we want to optimize for?
• Revenue over time, broken down by demographic
• Top publishers by clicks over the last month
• Number of unique visitors, broken down by any dimension
• Not dumping the entire dataset
• Not examining individual events
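To make the target concrete, here is a minimal sketch of the "top publishers by clicks over the last month" aggregation. It is not Druid's query language, just the shape of the question; the event records and field names (publisher, clicks) are illustrative assumptions.

    from collections import Counter
    from datetime import datetime, timezone

    # Hypothetical event records; field names are illustrative, not from the slides.
    events = [
        {"timestamp": datetime(2011, 1, 1, tzinfo=timezone.utc), "publisher": "a.com", "clicks": 10},
        {"timestamp": datetime(2011, 1, 2, tzinfo=timezone.utc), "publisher": "b.com", "clicks": 4},
        {"timestamp": datetime(2011, 1, 3, tzinfo=timezone.utc), "publisher": "a.com", "clicks": 7},
    ]

    def top_publishers_by_clicks(events, since, limit=10):
        """Aggregate clicks per publisher over a time window and return the top N."""
        totals = Counter()
        for e in events:
            if e["timestamp"] >= since:
                totals[e["publisher"]] += e["clicks"]
        return totals.most_common(limit)

    print(top_publishers_by_clicks(events, since=datetime(2011, 1, 1, tzinfo=timezone.utc)))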
‣ Scan rate: … rows / second / core
‣ 1 day of summarized aggregates: 60M+ rows
‣ 1 query over 1 week, 16 cores: ~5 seconds
‣ Page load with 20 queries over a week of data: a long time
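A rough consistency check on these figures. The per-core scan rate is cut off above, so it is back-derived here from the other numbers and should be read as an assumption, not a quoted figure.

    # Back-of-the-envelope check of the slide's numbers.
    rows_per_day = 60_000_000          # "1 day of summarized aggregates: 60M+ rows"
    days = 7
    cores = 16
    query_seconds = 5                  # "1 query over 1 week, 16 cores: ~5 seconds"

    total_rows = rows_per_day * days                          # 420M rows scanned per query
    implied_scan_rate = total_rows / (cores * query_seconds)  # rows / second / core (derived, not quoted)
    print(f"{implied_scan_rate / 1e6:.1f}M rows/second/core")  # ~5.2M, if the figures above hold

    serial_page_load = 20 * query_seconds   # 20 such queries, run back to back
    print(f"page load: ~{serial_page_load} seconds if the queries run serially")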
II. NOSQL - THE SETUP
‣ Pre-compute aggregates for every dimension combination and store them in a NoSQL store

Input rows:

    ts  gender  age  revenue
    1   M       18   $0.15
    1   F       25   $1.03
    1   F       18   $0.01

Pre-aggregated key-value pairs:

    Key     Value
    1       revenue=$1.19
    1,M     revenue=$0.15
    1,F     revenue=$1.04
    1,18    revenue=$0.16
    1,25    revenue=$1.03
    1,M,18  revenue=$0.15
    1,F,18  revenue=$0.01
    1,F,25  revenue=$1.03
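A minimal sketch of the pre-aggregation above, assuming the three example rows and two dimensions (gender, age). It reproduces the key-value table and shows why the work per row grows as 2^n in the number of dimensions.

    from itertools import combinations
    from collections import defaultdict

    # The slide's example input (ts, gender, age, revenue).
    rows = [
        {"ts": 1, "gender": "M", "age": 18, "revenue": 0.15},
        {"ts": 1, "gender": "F", "age": 25, "revenue": 1.03},
        {"ts": 1, "gender": "F", "age": 18, "revenue": 0.01},
    ]
    dimensions = ["gender", "age"]

    # For every row, add its revenue to the key for every subset of its dimension
    # values (always including the timestamp). With n dimensions this is 2**n keys
    # per row, which is why processing scales exponentially as dimensions are added.
    aggregates = defaultdict(float)
    for row in rows:
        for r in range(len(dimensions) + 1):
            for subset in combinations(dimensions, r):
                key = (row["ts"],) + tuple(row[d] for d in subset)
                aggregates[key] += row["revenue"]

    for key, revenue in sorted(aggregates.items(), key=lambda kv: str(kv[0])):
        print(",".join(str(k) for k in key), f"revenue=${revenue:.2f}")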
II. NOSQL - THE RESULTS
‣ Reads are simple lookups by key, so queries are fast
‣ Inflexible
  • if it's not pre-aggregated, it's not available
‣ Not continuously updated
  • aggregate first, then display
‣ Processing scales exponentially with the number of dimensions
    timestamp             page           language  city     country  ...  added  deleted
    2011-01-01T00:01:35Z  Justin Bieber  en        SF       USA      ...  10     65
    2011-01-01T00:03:63Z  Justin Bieber  en        SF       USA      ...  15     62
    2011-01-01T00:04:51Z  Justin Bieber  en        SF       USA      ...  32     45
    2011-01-01T01:00:00Z  Ke$ha          en        Calgary  CA       ...  17     87
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA       ...  43     99
    2011-01-01T02:00:00Z  Ke$ha          en        Calgary  CA       ...  12     53
    ...
‣ Shard data by time
‣ Immutable chunks of data called "segments"

    Segment 2011-01-01T00/2011-01-01T01:
    2011-01-01T00:01:35Z  Justin Bieber  en  SF       USA  ...  10  65
    2011-01-01T00:03:63Z  Justin Bieber  en  SF       USA  ...  15  62
    2011-01-01T00:04:51Z  Justin Bieber  en  SF       USA  ...  32  45

    Segment 2011-01-01T01/2011-01-01T02:
    2011-01-01T01:00:00Z  Ke$ha          en  Calgary  CA   ...  17  87

    Segment 2011-01-01T02/2011-01-01T03:
    2011-01-01T02:00:00Z  Ke$ha          en  Calgary  CA   ...  43  99
    2011-01-01T02:00:00Z  Ke$ha          en  Calgary  CA   ...  12  53
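A toy illustration of the time-sharding idea: events are bucketed into hour-long groups named by their interval, mirroring the segment layout above. The timestamps come from the example table; the helper and variable names are illustrative, not Druid's.

    from collections import defaultdict
    from datetime import datetime, timedelta, timezone

    events = [
        {"timestamp": datetime(2011, 1, 1, 0, 1, 35, tzinfo=timezone.utc), "page": "Justin Bieber"},
        {"timestamp": datetime(2011, 1, 1, 1, 0, 0, tzinfo=timezone.utc), "page": "Ke$ha"},
        {"timestamp": datetime(2011, 1, 1, 2, 0, 0, tzinfo=timezone.utc), "page": "Ke$ha"},
    ]

    def segment_interval(ts):
        """Name the hour-long interval that a timestamp falls into."""
        start = ts.replace(minute=0, second=0, microsecond=0)
        end = start + timedelta(hours=1)
        return f"{start:%Y-%m-%dT%H}/{end:%Y-%m-%dT%H}"

    segments = defaultdict(list)
    for event in events:
        segments[segment_interval(event["timestamp"])].append(event)

    for interval, rows in sorted(segments.items()):
        print(f"segment {interval}: {len(rows)} rows")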
‣ No contention between reads and writes
‣ One thread scans one segment
‣ Multiple threads can access the same underlying data
‣ Segment sizes are chosen so that a computation completes in 100s of ms
‣ Simplifies distribution & replication
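A small sketch of why immutability helps concurrency: each worker thread scans one segment without taking any locks, and the per-segment partial results are merged afterwards. The segment contents here are made up for illustration.

    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical segments: lists of rows that never change once written,
    # so no synchronization is needed while scanning them.
    segments = [
        [{"added": 10}, {"added": 15}, {"added": 32}],
        [{"added": 17}],
        [{"added": 43}, {"added": 12}],
    ]

    def scan(segment):
        """Scan a single immutable segment on one thread."""
        return sum(row["added"] for row in segment)

    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(scan, segments))

    print(sum(partials))  # 129: per-segment partial results merged into one answer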
‣ Compression!
‣ Indexes!

    (Wikipedia-edits example table repeated from above)
‣ Dictionary-encode dimension values: Justin Bieber -> 0, Ke$ha -> 1
‣ Store
  • page -> [0 0 0 1 1 1]
  • language -> [0 0 0 0 0 0]

    (Wikipedia-edits example table repeated from above)
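A minimal dictionary-encoding sketch over the page column from the example table; it produces exactly the mapping and encoded array shown above.

    # Dictionary-encode a string column: map each distinct value to a small integer ID
    # and store only the IDs.
    page_column = ["Justin Bieber", "Justin Bieber", "Justin Bieber", "Ke$ha", "Ke$ha", "Ke$ha"]

    dictionary = {}          # value -> integer ID, assigned in order of first appearance
    encoded = []
    for value in page_column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        encoded.append(dictionary[value])

    print(dictionary)   # {'Justin Bieber': 0, 'Ke$ha': 1}
    print(encoded)      # [0, 0, 0, 1, 1, 1]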
‣ Justin Bieber -> [0, 1, 2] -> [111000]
‣ Ke$ha -> [3, 4, 5] -> [000111]

    (Wikipedia-edits example table repeated from above)
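A sketch of building this bitmap (inverted) index for the same column, using a plain Python integer as the bitmap: bit i is set when row i contains the value.

    page_column = ["Justin Bieber", "Justin Bieber", "Justin Bieber", "Ke$ha", "Ke$ha", "Ke$ha"]

    # For each dictionary value, record which rows contain it: one bit per row.
    bitmaps = {}
    for row_id, value in enumerate(page_column):
        bitmaps.setdefault(value, 0)
        bitmaps[value] |= 1 << row_id       # set bit `row_id` in this value's bitmap

    def as_bits(bitmap, n_rows):
        """Render a bitmap as a bit string, row 0 first."""
        return "".join("1" if bitmap & (1 << i) else "0" for i in range(n_rows))

    for value, bitmap in bitmaps.items():
        print(value, "->", as_bits(bitmap, len(page_column)))
    # Justin Bieber -> 111000
    # Ke$ha -> 000111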
‣ Boolean operations directly on compressed indices
  • Less memory => faster scan rates
‣ More details:
  • http://ricerca.mat.uniroma3.it/users/colanton/concise.html
  • http://roaringbitmap.org/
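To illustrate the point about boolean operations, the sketch below ANDs/ORs two toy bitmaps represented as Python integers and reads off the matching row IDs. Concise and Roaring perform the same operations directly on a compressed representation, which is what keeps memory use down and scan rates up; the example filters here are hypothetical.

    bieber = 0b000111    # rows 0, 1, 2 match "page = Justin Bieber" (LSB = row 0)
    calgary = 0b111000   # rows 3, 4, 5 match "city = Calgary"

    def rows(bitmap):
        """Expand a bitmap into the list of row IDs whose bits are set."""
        out, row_id = [], 0
        while bitmap:
            if bitmap & 1:
                out.append(row_id)
            bitmap >>= 1
            row_id += 1
        return out

    print(rows(bieber | calgary))   # Bieber OR Calgary  -> [0, 1, 2, 3, 4, 5]
    print(rows(bieber & calgary))   # Bieber AND Calgary -> []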
‣ Druid gave us arbitrary data exploration & fast queries
‣ But what about data freshness?
  • Batch loading is slow!
  • We want "real-time"
  • Alerts, operational monitoring, etc.
REAL-TIME NODES
‣ Buffer incoming events in memory in a write-optimized data structure
‣ Periodically persist collected events to disk (converting to a read-optimized format)
‣ Query data as soon as it is ingested
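A conceptual sketch of the real-time node behaviour described above: buffer events in a write-optimized structure, periodically convert and persist them, and keep everything queryable the moment it is ingested. Class and method names are illustrative, not Druid's API.

    class RealtimeBuffer:
        """Toy real-time node: append-only in-memory buffer, periodic hand-off to a
        read-optimized 'segment', and queries that see both. Not Druid's actual API."""

        def __init__(self, max_buffered=100_000):
            self.events = []                 # write-optimized: just append
            self.max_buffered = max_buffered
            self.persisted_segments = []

        def ingest(self, event):
            self.events.append(event)
            if len(self.events) >= self.max_buffered:
                self.persist()

        def persist(self):
            # Convert to a read-optimized form (here: sorted by time) and freeze it.
            segment = sorted(self.events, key=lambda e: e["timestamp"])
            self.persisted_segments.append(segment)
            self.events = []

        def query_sum(self, field):
            # Data is queryable as soon as it is ingested: scan the in-memory buffer
            # plus anything already persisted.
            total = sum(e[field] for e in self.events)
            total += sum(e[field] for seg in self.persisted_segments for e in seg)
            return total

    node = RealtimeBuffer(max_buffered=2)
    node.ingest({"timestamp": 2, "added": 15})
    node.ingest({"timestamp": 1, "added": 10})   # triggers a persist of the first two events
    node.ingest({"timestamp": 3, "added": 32})
    print(node.query_sum("added"))               # 57: persisted and still-buffered data together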
‣ Write your own modules and extend Druid
  • Add your own complex metrics (cardinality estimation, approximate histograms and quantiles, approximate top-K algorithms, etc.)
  • Add your own proprietary modules
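To give a feel for what a pluggable metric looks like, here is a conceptual aggregator with add/merge/estimate steps, using a crude k-minimum-values cardinality estimate. It mirrors the general shape of such extensions but is not Druid's actual Java extension API; every name here is illustrative.

    class ApproxDistinctCount:
        """Very rough cardinality estimator: keeps the k smallest hashes seen.
        A real extension would use HyperLogLog or a similar sketch."""

        def __init__(self, max_hashes=1024):
            self.max_hashes = max_hashes
            self.smallest = set()

        def add(self, value):
            h = hash(value) & 0xFFFFFFFF
            self.smallest.add(h)
            if len(self.smallest) > self.max_hashes:
                self.smallest.remove(max(self.smallest))   # keep only the k smallest hashes

        def merge(self, other):
            # Distributed computation: combine two partial sketches into one.
            merged = ApproxDistinctCount(self.max_hashes)
            merged.smallest = set(sorted(self.smallest | other.smallest)[: self.max_hashes])
            return merged

        def estimate(self):
            if len(self.smallest) < self.max_hashes:
                return len(self.smallest)          # exact while the sample is not full
            # KMV-style estimate: extrapolate from the k-th smallest hash value.
            kth = max(self.smallest)
            return int((self.max_hashes - 1) * 0xFFFFFFFF / kth)

    agg = ApproxDistinctCount(max_hashes=256)
    for user_id in range(10_000):
        agg.add(f"user-{user_id}")
    print(agg.estimate())   # roughly 10,000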
‣ Used at many different companies
  • In production at multiple companies, and we're hoping for more!
  • Ad-tech, network traffic, operations, activity streams, etc.
  • Support through community forums and IRC
  • We love contributions!
DRUID IN PRODUCTION
QUERY LATENCY (500 MS AVERAGE)
‣ 95% of queries < 1 s
‣ 99% of queries < 10 s

    [Chart: query latency percentiles (90th/95th/99th) per datasource (a-h), Feb 03 - Feb 24; query time in seconds, 0-20]
• You have large amounts of data
• You need analytics (not a key-value store)
• You want to do your analysis on data as it's happening (real-time)
• You need availability, extensibility, and flexibility