Building a Lambda Architecture in 10 minutes with BigQuery, CEP and Docker

Kazunori Sato

July 09, 2014

Transcript

  1. +Kazunori Sato @kazunori_279 Solutions Architect, Cloud Platform GBU, Google Inc

    - GCP solutions design
    - Professional services for GCP
    - Docker/GCP meetups support
  2. “I want a real-time dashboard for my 200 web servers.”

    - a customer with 200 Google Compute Engine instances
  3. At Google, we have “big” big data everywhere.

    What if a Googler is asked: “Can you give me the list of top 20 Android apps installed in 2012?”
  4. At Google, we run SQL on Dremel = Google BigQuery

    SELECT top(appId, 20) AS app, count(*) AS count FROM installlog.2012 ORDER BY count DESC;

    It scans 68B rows in ~30 sec. No index used.
  5. Massively Parallel Processing

    select top(title), count(*) from publicdata:samples.wikipedia

    - Scanning 1 TB in 1 sec takes 5,000 disks
    - Each query runs on thousands of servers
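The "1 TB in 1 sec takes 5,000 disks" figure implies a per-disk read rate, which a quick calculation makes explicit. This is only a back-of-the-envelope sketch: it assumes a decimal terabyte and a scan spread perfectly evenly across disks, neither of which the slide states, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the slide's figures.
# Assumptions (not from the slide): 1 TB = 10**12 bytes, perfectly even spread.
TB = 10**12


def per_disk_throughput(total_bytes: int, seconds: float, disks: int) -> float:
    """Bytes per second each disk must sustain for the whole scan to finish in time."""
    return total_bytes / seconds / disks


rate = per_disk_throughput(1 * TB, 1.0, 5_000)
print(f"{rate / 10**6:.0f} MB/s per disk")
```

Under these assumptions each disk has to deliver about 200 MB/s, which is why the scan needs thousands of disks working in parallel rather than a few fast ones.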
  6. Fast aggregation by tree structure

    [Diagram: Mixer 0 at the top, Mixer 1 nodes below it, then Leaf nodes over Distributed Storage. Query fragments shown on the diagram: SELECT state, year; COUNT(*) GROUP BY state; WHERE year >= 1980 and year < 1990; ORDER BY count_babies DESC LIMIT 10.]
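The tree-structured aggregation on this slide can be illustrated with a small in-process sketch. It is not Dremel or BigQuery code: the assignment of work to levels (leaves filter and pre-count, mixers merge, the root sorts and limits) is an assumption read off the diagram labels, and the names `leaf_scan`, `mixer_merge` and `root_query` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, int]  # (state, year); hypothetical schema for illustration


def leaf_scan(rows: Iterable[Row], year_from: int, year_to: int) -> Counter:
    """Leaf: apply the WHERE filter to its own shard and pre-aggregate per state."""
    counts: Counter = Counter()
    for state, year in rows:
        if year_from <= year < year_to:
            counts[state] += 1
    return counts


def mixer_merge(partials: Iterable[Counter]) -> Counter:
    """Mixer: add up the partial per-state counts received from its children."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total


def root_query(shards: List[List[Row]], year_from: int = 1980,
               year_to: int = 1990, limit: int = 10) -> List[Tuple[str, int]]:
    """Root: merge everything, then ORDER BY count DESC LIMIT n."""
    leaves = [leaf_scan(shard, year_from, year_to) for shard in shards]
    # Intermediate mixers, two leaves each (the fan-out is arbitrary here).
    mixers = [mixer_merge(leaves[i:i + 2]) for i in range(0, len(leaves), 2)]
    return mixer_merge(mixers).most_common(limit)


shards = [
    [("CA", 1985), ("TX", 1981), ("CA", 1979)],
    [("CA", 1989), ("NY", 1990)],
    [("TX", 1983), ("CA", 1980)],
]
print(root_query(shards))
```

Because counts are additive, each level only forwards a small per-state summary upward instead of raw rows, which is what keeps the aggregation fast.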
  7. BigQuery Streaming

    - Low cost: $0.01 per 100,000 rows
    - Real-time availability of data
    - 100,000 rows per second x tables
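To get a feel for the streaming price quoted on this slide, here is a small cost calculation. It uses only the slide's 2014 figure of $0.01 per 100,000 rows, which may not match current pricing; the function names are hypothetical.

```python
from decimal import Decimal

# Price quoted on the slide (2014); treat it as an assumption, not current pricing.
PRICE_PER_100K_ROWS = Decimal("0.01")  # USD
SECONDS_PER_DAY = 86_400


def streaming_cost(rows: int) -> Decimal:
    """USD cost of streaming the given number of rows at the quoted price."""
    return PRICE_PER_100K_ROWS * rows / 100_000


def daily_cost(rows_per_second: int) -> Decimal:
    """USD cost of sustaining a constant row rate for 24 hours."""
    return streaming_cost(rows_per_second * SECONDS_PER_DAY)


print(daily_cost(100_000))  # one table at the quoted maximum rate
```

At the quoted per-table rate of 100,000 rows per second, a full day of streaming works out to $864 under this pricing.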
  8. Slideshare uses Fluentd for collecting logs from >500 servers.

    "We take full advantage of its extendable plugin architecture and use it as a message bus that collects data from hundreds of servers into multiple backend systems." Sylvain Kalache, Operations Engineer
  9. Why Fluentd?

    Because it’s super easy to use, and has extensive plugins written by an active community.
  10. Norikra: an open source Complex Event Processing (CEP) engine

    Production use at LINE, the largest Asian SNS with 400M users, for massive log analysis
  11. Lambda Architecture is:

    A complementary pair of:

    - in-memory real-time processing: fast, but small and volatile.
    - large HDD/SSD batch processing: slow, but large and persistent.

    Proposed by Nathan Marz. ex. Twitter Summingbird
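The pairing of a fast, volatile layer with a slow, persistent one can be sketched in a few lines. This is a toy model under stated assumptions, not the deck's implementation: the class `LambdaView`, its method names, and the use of a plain list as the "persistent" log are all hypothetical, and it ignores events that arrive while a batch run is in progress.

```python
from collections import Counter
from typing import List


class LambdaView:
    """Toy event counter answering queries from batch view + real-time view."""

    def __init__(self) -> None:
        self.master_log: List[str] = []        # stands in for large, persistent storage
        self.batch_view: Counter = Counter()   # slow to rebuild, covers all batched history
        self.realtime_view: Counter = Counter()  # fast, in-memory, covers only recent events

    def ingest(self, key: str) -> None:
        """Every event goes to both the persistent log and the in-memory view."""
        self.master_log.append(key)
        self.realtime_view[key] += 1

    def run_batch(self) -> None:
        """Recompute the batch view from the full log, then drop the volatile view."""
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, key: str) -> int:
        """Merge both layers: batched history plus whatever arrived since."""
        return self.batch_view[key] + self.realtime_view[key]


view = LambdaView()
for event in ["a", "a", "b"]:
    view.ingest(event)
view.run_batch()
view.ingest("a")
print(view.query("a"))
```

The point of the design is that losing the in-memory view costs only the events since the last batch run, since everything can be recomputed from the persistent log.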
  12. A Recipe for a Lambda Architecture in 10 minutes

    1. Fluentd: event log collection from various event sources
    2. Norikra: scalable real time Complex Event Processing (CEP)
    3. BigQuery: scalable query engine for large datasets
    4. Google Spreadsheet: flexible dashboard with a variety of charts
    5. Docker: repeatable deployment in 10 minutes
  13. Applications

    Real-time KPI Dashboard

    - Gaming: How many new users have purchased the first item in the last 10 minutes?
    - Media: How many people hit the vote button during the live TV program?
    - Retail: What is the current total revenue of all stores nationwide?
    - Ads: What is the conversion rate of impressions/clicks to purchase?

    Real-time Monitoring and Alerting

    - Correlate system resource usage with access/application logs
    - Real-time DoS or cheating detection
    - Send e-mail notification from Apps Script triggered by CEP query
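Questions of the form "how many X in the last 10 minutes?" are time-window aggregations. A CEP engine such as Norikra expresses these in its SQL-like query language; the Python below is only an illustration of the windowing idea, not Norikra's API. The class name is hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Deque


class SlidingWindowCounter:
    """Counts events whose timestamp lies in the window (now - window, now]."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.timestamps: Deque[float] = deque()

    def add(self, ts: float) -> None:
        """Record one event; assumes timestamps are non-decreasing."""
        self.timestamps.append(ts)

    def count(self, now: float) -> int:
        """Evict events that have fallen out of the window, then count the rest."""
        cutoff = now - self.window
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


purchases = SlidingWindowCounter(window_seconds=600)  # 10 minutes
for ts in (0, 100, 650):
    purchases.add(ts)
print(purchases.count(now=650))
```

Only the timestamps inside the window are held in memory, which matches the "fast, but small and volatile" role of the real-time layer.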
  14. Solution Benefits

    - Real-time analytics by Norikra CEP with 10 sec latency
    - Big data collection and analytics by BigQuery + Fluentd at ~1M rows/s
    - Real-time dashboard with Google Spreadsheet
    - Deployable within 10 min with Docker

    Available on GitHub: GoogleCloudPlatform/lambda-dashboard