Unified Data Analytics Platform (with Zeppelin, Ambari, Geode, SpringXD and HAWQ)

Apache Zeppelin Meetup (2016): http://bit.ly/2yO5ynW

Christian Tzolov

January 21, 2016

Transcript

  1. Whoami • Christian Tzolov • Technical Architect at Pivotal, BigData, Hadoop, SpringXD, Apache Committer, Crunch PMC member • [email protected] • blog.tzolov.net • @christzolov
  2. Contents • DEMO • Zeppelin Interpreters • PSQL (to become JDBC in 0.6.x) • Geode • SpringXD • Apache Ambari • Zeppelin Service • Geode, HAWQ and Spring XD services • Webpage Embedder View
  3. Technical Stack • Apache HDFS: Data Lake - PHD or HDP Hadoop • Apache HAWQ: SQL on Hadoop (OLAP) • Apache Geode: In-memory data grid (OLTP) • Spring XD: Integration and Streaming Runtime • Apache Ambari: Manages All Clusters • Apache Zeppelin: Web UI for interaction with Data Systems • (Diagram labels: Hadoop/HDFS, Geode, HAWQ, SpringXD, Ambari, Zeppelin)
  4. Spring XD • Orchestrates and automates all steps across multiple data stream pipelines • HTTP • Tail • File • Mail • Twitter • Gemfire • Syslog • TCP • UDP • JMS • RabbitMQ • MQTT • Kafka • Reactor TCP/UDP • Filter • Transformer • Object-to-JSON • JSON-to-Tuple • Splitter • Aggregator • HTTP Client • Groovy Scripts • Java Code • JPMML Evaluator • Spark Streaming • File • HDFS • JDBC • TCP • Log • Mail • RabbitMQ • Gemfire • Splunk • MQTT • Kafka • Dynamic Router • Counters
  5. Apache Geode • Cache - Performance / Consistency / Resiliency • Region - Highly available, redundant, distributed Map • China Railway Corporation: 5,700 train stations, 4.5 million tickets per day, 20 million daily users, 1.4 billion page views per day, 40,000 visits per second • Indian Railways: 7,000 stations, 72,000 miles of track, 23 million passengers daily, 120,000 concurrent users, 10,000 transactions per minute
  6. Apache HAWQ • Built around a Greenplum MPP DB • 100% ANSI SQL compliant: SQL-92/99/2003… • ODBC and JDBC • Hadoop Native: Parquet, HDFS and YARN • Extensible - Web Tables, PXF • TPC-DS: outperforms Impala by 454% overall
  7. Demo
    tweets = twittersearch --query=<keyword> | hdfs --directory=/user/zeppelin/xd/tweets
    geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet
    hawqTap = tap:stream:tweets > transform --script=tweetJsonToTsv.groovy | gpfdist --table=xdsink
    tweetsCount = tap:stream:tweets > json-to-tuple | transform --expression='payload.id_str' | counter
  8. SpringXD Interpreter(s) • %xd.stream and %xd.job • Multiple streams or jobs in a paragraph • Special Deploy/Launch Semantics • Zeppelin Dynamic Forms (${…}) • Comprehensive Stream and Job DSL auto-completion (Ctrl+.)
  9. PSQL Interpreter • Prefix: %psql.sql • PostgreSQL, HAWQ/PXF, Greenplum … JDBC • PSQL command line shell (via %sh) • Zeppelin Dynamic Forms (${…}) • Comprehensive SQL/JDBC auto-completion (Ctrl+.)
  10. Geode Interpreter • Prefix: %geode.oql • OQL and PDX nested access (user.name) • Geode command line shell (via %sh) • Zeppelin Dynamic Forms (${…}) • Basic OQL auto-completion (Ctrl+.)
  11. Ambari Services • Ambari Zeppelin Service: github, rpm, blog • Ambari Geode Service: github, rpm • Ambari SpringXD Service: github • Ambari HAWQ Service (Pivotal BDS dist)
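The hawqTap stream in slide 7 passes each tweet through tweetJsonToTsv.groovy before loading it into HAWQ via gpfdist. The deck does not show that script, so the following is only a rough Python sketch of what such a JSON-to-TSV step might do. The column list is an assumption (only `id_str` appears in the deck; the other fields are standard tweet attributes picked for illustration), and the helper names `get_nested`, `clean` and `tweet_json_to_tsv` are hypothetical.

```python
import json

# Assumed column selection: the deck does not say which fields
# tweetJsonToTsv.groovy actually writes to the xdsink table.
COLUMNS = ("id_str", "created_at", "user.screen_name", "text")


def get_nested(obj, dotted_path):
    """Follow a dotted path such as 'user.screen_name' through nested dicts.

    Returns an empty string when any part of the path is missing.
    """
    current = obj
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return ""
        current = current[key]
    return current


def clean(value):
    """Collapse whitespace runs (tabs and newlines included) into single
    spaces so that a value cannot break the tab-separated row."""
    if value is None:
        return ""
    return " ".join(str(value).split())


def tweet_json_to_tsv(payload, columns=COLUMNS):
    """Convert one tweet (a JSON string) into one tab-separated line."""
    tweet = json.loads(payload)
    return "\t".join(clean(get_nested(tweet, column)) for column in columns)


if __name__ == "__main__":
    sample = json.dumps({
        "id_str": "42",
        "created_at": "Thu Jan 21 2016",
        "user": {"screen_name": "christzolov"},
        "text": "Hello\tZeppelin\nmeetup",
    })
    print(tweet_json_to_tsv(sample))
```

The dotted-path lookup mirrors the nested access style mentioned for the Geode interpreter (user.name). Flattening embedded tabs and newlines is a design choice of this sketch, made because a line-oriented loader would otherwise see a broken row.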