Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Proc. of the 36th Int'l Conf on Very Large Data Bases (2010), pp. 330-339
Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. …
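The two ideas named in the abstract, columnar layout and a multi-level execution tree, can be illustrated with a toy sketch. This is not Dremel's implementation: the function names, the `fanout` parameter, and the sample shards are all made up for illustration, and real nested-data encoding is ignored.

```python
from typing import Dict, List

# Columnar layout (toy version): each field is stored as its own list, so an
# aggregation over "bytes" never reads the "country" values it does not need.
Shard = Dict[str, List]


def leaf_partial_sum(shard: Shard, column: str) -> int:
    """Leaf level: scan one column of one shard and return a partial sum."""
    return sum(shard[column])


def intermediate_merge(partials: List[int]) -> int:
    """Intermediate level: combine the partial results of a group of children."""
    return sum(partials)


def root_query(shards: List[Shard], column: str, fanout: int = 2) -> int:
    """Root level: fan the query out, then merge results one tree level at a time."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    partials = [leaf_partial_sum(s, column) for s in shards]
    while len(partials) > 1:
        partials = [
            intermediate_merge(partials[i:i + fanout])
            for i in range(0, len(partials), fanout)
        ]
    return partials[0] if partials else 0


# Hypothetical sample data: three shards of a two-column table.
shards = [
    {"country": ["US", "DE"], "bytes": [10, 20]},
    {"country": ["FR"], "bytes": [5]},
    {"country": ["US", "US"], "bytes": [1, 2]},
]
total = root_query(shards, "bytes")
print(total)
```

Each level only sees the partial results of the level below it, which is why the merge work stays small even when the number of leaves is large.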
friendly)
• Logical Plan: what we want to do (language agnostic, computer friendly)
• Physical Plan: how we want to do it (the best way we can tell)
• Execution Plan: where we want to do it
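The what / how / where split can be sketched as three small data transformations. Everything here is hypothetical and chosen for illustration: the operator names, the `IMPLEMENTATIONS` table, the node names, and the round-robin placement are not Drill's actual classes or planning logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicalOp:
    name: str  # WHAT to do, e.g. "scan", "filter", "aggregate"
    arg: str


@dataclass
class PhysicalOp:
    impl: str  # HOW to do it: a concrete operator implementation
    arg: str


@dataclass
class Fragment:
    node: str  # WHERE to do it
    op: PhysicalOp


# Hypothetical mapping from logical operators to concrete implementations.
IMPLEMENTATIONS = {
    "scan": "parquet_scan",
    "filter": "vectorized_filter",
    "aggregate": "hash_aggregate",
}


def to_physical(logical: List[LogicalOp]) -> List[PhysicalOp]:
    """Pick an implementation for every logical operator."""
    return [PhysicalOp(IMPLEMENTATIONS[op.name], op.arg) for op in logical]


def to_execution(physical: List[PhysicalOp], nodes: List[str]) -> List[Fragment]:
    """Assign operators to nodes. Round-robin is only a placeholder here;
    a real planner would use locality and load information."""
    return [Fragment(nodes[i % len(nodes)], op) for i, op in enumerate(physical)]


# Hand-built logical plan for: SELECT COUNT(*) FROM logs WHERE status = 500
logical_plan = [
    LogicalOp("scan", "logs"),
    LogicalOp("filter", "status = 500"),
    LogicalOp("aggregate", "count(*)"),
]
physical_plan = to_physical(logical_plan)
execution_plan = to_execution(physical_plan, ["node-a", "node-b"])
for fragment in execution_plan:
    print(fragment.node, fragment.op.impl, fragment.op.arg)
```

Keeping the three representations separate means the same logical plan can be re-planned for a different cluster or storage format without touching the query front end.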
query planning, execution, etc, are distributed
• Any node can act as endpoint for a query (the foreman)
[Diagram: four nodes, each with a Storage Process and a Drillbit]
scheduling, locality information, etc.
• Streaming data communication avoiding SerDe
[Diagram: Curator/Zk; four nodes, each with a Storage Process, a Drillbit, and a Distributed Cache]
for metadata
• Typesafe HOCON for configuration and module management
• Netty4 as core RPC engine, protobuf for communication
• Vanilla Java, LArray and Netty ByteBuf for off-heap large data structures
• Hazelcast for distributed cache
• Netflix Curator on top of ZooKeeper for service registry
• Optiq for SQL parsing and cost optimization
• Parquet (http://parquet.io) / ORC
• Janino for expression compilation
• ASM for bytecode manipulation
• Yammer Metrics for metrics
• Guava extensively
• Carrot HPPC for primitive collections
logical plan
• Serving tree, CF, topology
• physical plan/optimizer
• Data sources & formats
• scanner API
[Diagram: Source Query, Parser, Logical Plan, Optimizer, Physical Plan, Execution]
Demo + HowTo
How to build/install Apache Drill on Ubuntu 13.04
http://www.confusedcoders.com/bigdata/apache-drill/how-to-build-apache-drill-on-ubuntu-13-04
Chen, Microsoft
• Chris Merrick, RJMetrics
• David Alves, UT Austin
• Sree Vaadi, SSS
• Srihari Srinivasan, ThoughtWorks
• Alexandre Beche, CERN
• Jason Altekruse, MapR
http://incubator.apache.org/drill/team.html
• Ben Becker, MapR
• Jacques Nadeau, MapR
• Ted Dunning, MapR
• Keys Botzum, MapR
• Jason Frantz
• Ellen Friedman
• Chris Wensel, Concurrent
• Gera Shegalov, Oracle
• Ryan Rawson, Ohm Data
(user | dev) http://incubator.apache.org/drill/mailing-lists.html
• Standing G+ hangouts every Tuesday at 18:00 CET: http://j.mp/apache-drill-hangouts
• Keep an eye on http://drill-user.org/