[OracleCode NYC-2018] Rethinking Stream Processing with KStreams and KSQL

Viktor Gamov

March 08, 2018

Transcript

  1. @gamussa @confluentinc Who am I? Solutions Architect, Developer Advocate. @gamussa in internetz. Hey you, yes, you, go follow me on Twitter ©
  2. @gamussa @confluentinc Kafka, the Streaming Data Platform, 2013-2018: 0.8 Intra-cluster replication, 0.9 Data Integration (Connect API), 0.10 Data Processing (Streams API), 0.11 Exactly-once Semantics, 1.0 Enterprise Ready
  3. @gamussa @confluentinc We want our apps to be: scalable, elastic, fault-tolerant, stateful, distributed
  4. @gamussa @confluentinc The KAFKA STREAMS API is a JAVA API to BUILD REAL-TIME APPLICATIONS to POWER THE BUSINESS
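
Because the Streams API is just a Java library, a complete application fits in a single main() method. The sketch below is illustrative rather than taken from the deck: the class name, application id, and topic names are placeholder assumptions. It reads lines of text from one topic, upper-cases them, and writes them to another topic, all inside an ordinary JVM process.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class UpperCaseApp {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");     // shared by all instances of this app
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // the Kafka cluster
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> lines = builder.stream("textlines-topic");   // read each record value as a line of text
        lines.mapValues(String::toUpperCase)
             .to("uppercased-topic");                                        // write the transformed stream back to Kafka

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();                                                     // runs inside this ordinary JVM process
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
      }
    }

Scaling out, as the next two slides note, is then a matter of starting more copies of this same program with the same application.id: the instances find each other through Kafka and split the input partitions between them.
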
  5. Brokers? Nope! App + Streams API, App + Streams API, App + Streams API. Same app, many instances
  6. @gamussa @confluentinc This means you can DEPLOY your app ANYWHERE using WHATEVER TECHNOLOGY YOU WANT
  7. @gamussa @confluentinc Things Kafka Streams does: runs everywhere, clustering done for you, exactly-once processing, event-time processing, integrated database, joins, windowing, aggregation, S/M/L/XL/XXL/XXXL sizes
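
As a sketch of the event-time windowing and aggregation bullets (the topic name, key type, and class name below are assumptions, not from the deck), counting records per key in tumbling one-minute windows looks like this:

    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.Consumed;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.kstream.Windowed;

    public class ClickCountTopology {
      // Count records per key in tumbling one-minute windows, based on each
      // record's event time rather than its arrival time.
      static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> clicks =
            builder.stream("clicks-topic", Consumed.with(Serdes.String(), Serdes.String()));

        KTable<Windowed<String>, Long> clicksPerKeyPerMinute =
            clicks.groupByKey()
                  .windowedBy(TimeWindows.of(TimeUnit.MINUTES.toMillis(1)))
                  .count();

        return builder.build();
      }
    }

The windowed KTable is backed by a local, fault-tolerant state store, which is what the "integrated database" bullet refers to.
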
  8. @gamussa @confluentinc KStream
    // Example: reading data from Kafka
    KStream<byte[], String> textLines = builder.stream("textlines-topic",
        Consumed.with(Serdes.ByteArray(), Serdes.String()));
    // Example: transforming data
    KStream<byte[], String> upperCasedLines = textLines.mapValues(String::toUpperCase);
  9. @gamussa @confluentinc KTable
    // Example: aggregating data
    KTable<String, Long> wordCounts = textLines
        .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
        .groupBy((key, word) -> word)
        .count();
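
One step the slide stops short of, sketched here with an assumed output topic name: the continuously updated counts in the KTable can be streamed back out to a Kafka topic.

    // Hypothetical continuation of the slide's wordCounts example:
    wordCounts.toStream()
              .to("wordcounts-topic", Produced.with(Serdes.String(), Serdes.Long()));
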
  10. Stream Processing by Analogy: Connect API → Stream Processing → Connect API, all on a Kafka Cluster, just like $ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
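
Read literally, that pipeline maps onto the Streams API almost one to one. The sketch below is an assumed translation (topic and class names are placeholders): Kafka topics stand in for in.txt and out.txt, filter() plays the role of grep, and mapValues() plays the role of tr.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.Consumed;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Produced;

    public class GrepTrTopology {
      static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<byte[], String> lines =
            builder.stream("in-topic", Consumed.with(Serdes.ByteArray(), Serdes.String()));

        lines.filter((key, line) -> line.contains("ksql"))                          // grep "ksql"
             .mapValues(line -> line.toUpperCase())                                 // tr a-z A-Z
             .to("out-topic", Produced.with(Serdes.ByteArray(), Serdes.String()));  // > out.txt

        return builder.build();
      }
    }
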
  11. KSQL for Data Exploration: an easy way to inspect data in a running cluster.
    SELECT status, bytes FROM clickstream
    WHERE user_agent = 'Mozilla/5.0 (compatible; MSIE 6.0)';
  12. KSQL for Streaming ETL
    • Kafka is popular for data pipelines.
    • KSQL enables easy transformations of data within the pipe.
    • Transforming data while moving from Kafka to another system.
    CREATE STREAM vip_actions AS
      SELECT userid, page, action FROM clickstream c
      LEFT JOIN users u ON c.userid = u.user_id
      WHERE u.level = 'Platinum';
  13. KSQL for Anomaly Detection: identifying patterns or anomalies in real-time data, surfaced in milliseconds.
    CREATE TABLE possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 SECONDS)
      GROUP BY card_number
      HAVING count(*) > 3;
  14. KSQL for Real-Time Monitoring
    • Log data monitoring, tracking and alerting
    • Sensor / IoT data
    CREATE TABLE error_counts AS
      SELECT error_code, count(*)
      FROM monitoring_stream
      WINDOW TUMBLING (SIZE 1 MINUTE)
      WHERE type = 'ERROR'
      GROUP BY error_code;
  15. KSQL for Data Transformation: make simple derivations of existing topics from the command line.
    CREATE STREAM views_by_userid
      WITH (PARTITIONS=6, VALUE_FORMAT='JSON', TIMESTAMP='view_time') AS
      SELECT * FROM clickstream PARTITION BY user_id;
  16. Where is KSQL not such a great fit?
    BI reports (Tableau etc.):
    • No indexes
    • No JDBC (most BI tools are not good with continuous results!)
    Ad-hoc queries:
    • Limited span of time usually retained in Kafka
    • No indexes
  17. Creating a Stream
    CREATE STREAM clickstream (
      time BIGINT, url VARCHAR, status INTEGER,
      bytes INTEGER, userid VARCHAR, agent VARCHAR)
    WITH (value_format = 'JSON', kafka_topic = 'my_clickstream_topic');
  18. Creating a Table
    CREATE TABLE users (
      user_id INTEGER, registered_at LONG, username VARCHAR,
      name VARCHAR, city VARCHAR, level VARCHAR)
    WITH (key = 'user_id', kafka_topic = 'clickstream_users', value_format = 'JSON');
  19. Joins for Enrichment
    CREATE STREAM vip_actions AS
      SELECT userid, fullname, url, status
      FROM clickstream c
      LEFT JOIN users u ON c.userid = u.user_id
      WHERE u.level = 'Platinum';
  20. Trade-Offs (Flexibility vs. Simplicity)
    Consumer, Producer: subscribe(), poll(), send(), flush()
    Kafka Streams: filter(), join(), aggregate()
    KSQL: SELECT … FROM …, JOIN … WHERE …, GROUP BY …
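
To make that trade-off concrete, here is a sketch of the left-hand column: the same uppercase transformation written directly against the Consumer and Producer APIs, with illustrative topic and group names. Everything in this loop collapses to a single mapValues() call in the Streams API, or to one CREATE STREAM ... AS SELECT statement in KSQL, at the cost of some low-level flexibility.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PlainClientsUpperCase {
      public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "uppercase-plain-clients");
        consumerProps.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

          consumer.subscribe(Collections.singletonList("textlines-topic"));
          while (true) {
            // The poll loop, batching, and delivery semantics are handled by hand here.
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
              producer.send(new ProducerRecord<>("uppercased-topic",
                  record.key(), record.value().toUpperCase()));
            }
            producer.flush();
          }
        }
      }
    }
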
  21. How to run KSQL, #2 CLIENT-SERVER: a KSQL CLI connects to one or more KSQL Server JVMs, which run against the Kafka cluster.
  22. How to run KSQL, #3 AS A STANDALONE APPLICATION: multiple KSQL Server JVMs run directly against the Kafka cluster, with no separate CLI in the picture.