Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/

Cloud Native Data Streaming Microservices with Spring Cloud and Kafka
Marius Bogoevici, Pivotal
@mariusbogoevici
Who am I?
• Software Engineer with Pivotal
  – Project Lead, Spring Cloud Stream
• Spring ecosystem contributor since 2008:
  – Spring Integration, Spring XD, Spring Integration Kafka,
  – Spring Cloud Stream, Spring Cloud Data Flow
• Co-author, “Spring Integration in Action”, Manning, 2012
Why microservices for data processing?
• Cohesiveness around business capability (“do one thing and do it well”)
• Organizational alignment (Conway’s Law), cross-team collaboration
• Development agility
• Optimized for replacement
• Enable continuous delivery
• Failure isolation
• Granular resource tuning:
  – scaling out the critical parts of the pipeline
  – per-process: memory, CPU, network
Event-driven (Messaging) Microservices
• Solving communication complexity for data processing
• Decoupling:
  – Physical: discovery
  – Temporal: availability
• Eventual consistency vs. shared stores/distributed transactions
  – especially over heterogeneous resources
• Pub-sub makes it easy to add new elements to the topology
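The pub-sub point can be illustrated with a minimal sketch in plain Java. `InMemoryPubSub` is a hypothetical in-memory stand-in for a broker such as Kafka, not part of any Spring or Kafka API. It delivers synchronously, so it shows only how a new subscriber joins a topic without touching the publisher; it does not show temporal decoupling, which needs a real broker that retains messages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-memory topic registry; a stand-in for a real broker. */
final class InMemoryPubSub {

    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    /** Registers a handler for a topic; the publisher is not involved. */
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    /** Delivers the message to every current subscriber of the topic. */
    void publish(String topic, String message) {
        subscribers.getOrDefault(topic, Collections.emptyList())
                   .forEach(handler -> handler.accept(message));
    }

    public static void main(String[] args) {
        InMemoryPubSub bus = new InMemoryPubSub();
        List<String> stored = new ArrayList<>();
        List<String> audited = new ArrayList<>();

        bus.subscribe("orders", stored::add);
        bus.publish("orders", "order-1");

        // A new consumer joins the topology; the publishing code is unchanged.
        bus.subscribe("orders", audited::add);
        bus.publish("orders", "order-2");

        System.out.println(stored);   // prints [order-1, order-2]
        System.out.println(audited);  // prints [order-2]
    }
}
```

The publisher only knows the topic name, never the subscribers, which is the physical decoupling named on the slide.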
Data and Event Streaming: Conceptually Similar
• Data Streaming: ingestion, analytics
• Async interaction, event sourcing
Solution: messaging microservices with Kafka
[Diagram labels: file, jms, http, Kafka, cassandra]
How about operational complexity?
• Distributed systems are inherently complex, operating them even more so
• Operational prerequisites:
  – Self-servicing and provisioning
    – elastic infrastructure
  – Monitoring
  – Rapid delivery
    – CI/CD, deployment pipeline
    – automation
  – “DevOps culture”
https://martinfowler.com/bliki/MicroservicePrerequisites.html
Cloud native applications and platforms
• Target a platform that makes running apps reliable, transparent and boring
• In-built resource management
  – Memory, CPU, networking
• Elastic scaling
• Monitoring and failover
  – Health, logging, metrics
• Routing and load balancing
• Rolling upgrades
[Logos: Apache YARN, Apache Mesos, Kubernetes]
Cloud-native event-driven microservices with Kafka
[Diagram labels: file, jms, http, Kafka, Platform, cassandra, count-words]
The Monolith, the Platform and the Microservice(s)
[Diagram labels: Spring Cloud Stream, 2015, Spring XD, Spring Cloud Data Flow, Spring Cloud Task]
“Simple things should be simple; complex things should be possible.”
- Alan Kay
Microservice development challenge: reducing the boilerplate code
[Diagram labels: Monolith, Boilerplate, Business code, In practice, Microservices, In theory]
Spring Boot
• Spring Framework for microservices
• Simplified application structure:
  – Opinionated autoconfiguration based on application dependencies
  – Eliminate boilerplate, focus on business code
• Externalized configuration
  – Environment, run-time arguments, system properties
• Immutable artifact
  – Uberjar with nested dependencies
  – Executes from command line
• Management and monitoring (JMX, HTTP)
  – Actuator endpoints: metrics, health, pause/shutdown
Spring Initializr: start.spring.io
Trend of projects generated with Spring Initializr
Spring Cloud Stream in a nutshell
Programming model
@EnableBinding + Binder Implementation
[Diagram: binder implementations (Apache Kafka, JMS, Google PubSub), each marked as either “Production-ready” or “Experimental”]
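Programming model: individual message handling

    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.stream.annotation.EnableBinding;
    import org.springframework.cloud.stream.annotation.StreamListener;
    import org.springframework.cloud.stream.messaging.Processor;
    import org.springframework.messaging.handler.annotation.SendTo;

    @SpringBootApplication
    @EnableBinding(Processor.class)   // binds the "input" and "output" channels
    public class UppercaseProcessor {

        // Called once per message on "input"; the return value goes to "output".
        @StreamListener("input")
        @SendTo("output")
        public String process(String s) {
            return s.toUpperCase();
        }
    }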
Spring Cloud Stream + Kafka Streams
• Overlapping paradigm: encapsulated input/output
• Spring Cloud Stream + KStream
  – subscribes input KStreams to topics
  – connects the output KStream to topics
• OOTB stateful processing support with KStream
• Spring Cloud Stream content type negotiation
  – Or use Confluent Schema Registry directly
• Underlying Spring Boot support
  – Flexible configuration: program arguments, environment variables
  – Actuator endpoints: health, metrics
Programming model: Kafka Streams

    // count() produces Long values, so the output stream is KStream<String, Long>.
    @StreamListener("input")
    @SendTo("output")
    public KStream<String, Long> wordCount(KStream<?, String> input) {
        return input
                .map((key, word) -> new KeyValue<>(word.toUpperCase(), 1))
                .groupByKey()
                .count("Counts")   // "Counts" names the backing state store
                .toStream();
    }
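To make clear what the word-count topology computes, here is a sketch of the same logic over a finite list in plain Java, with no Kafka dependency. `WordCountSketch` and `countWords` are hypothetical names used only for illustration. Unlike a KStream, which emits an updated count every time a word arrives, this version produces a single final snapshot.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Batch analogue of the streaming word count: uppercase, group by word, count. */
final class WordCountSketch {

    static Map<String, Long> countWords(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)                      // same re-keying as map(...)
                .collect(Collectors.groupingBy(
                        Function.identity(),                   // groupByKey()
                        TreeMap::new,                          // sorted for stable printing
                        Collectors.counting()));               // count()
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("kafka", "Spring", "KAFKA")));
        // prints {KAFKA=2, SPRING=1}
    }
}
```

The counts are `Long` here as well, which mirrors the value type that a Kafka Streams count yields.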
Spring Cloud Stream + Kafka Streams
[Diagram: Spring Cloud Stream KStream Processor with bindings input and output; topics words and word-counts]
• KStream API: programming model (developer focus)
• Spring Cloud Stream: application model (configuration options, StreamConfig based on Spring Boot properties, KStreamBuilder, KStream, lifecycle)
• Spring Boot: externalized configuration, uberjar construction, health monitoring endpoints (framework focus)
Programming model: Kafka Streams (2)

    @StreamListener
    @SendTo("clicksImpressions")
    public KStream<byte[], ClicksImpressions> join(
            @Input("clicks") KStream<byte[], Click> clicks,
            @Input("impressions") KStream<byte[], Impressions> impressions) {
        // join clicks and impressions
    }

Functional programming model with multiple inputs and outputs
Easy to orchestrate and deploy
Simple topologies: (relatively) easy to deploy …
[Diagram labels: http, HDFS, httphdfs.1]

    # http (producer side)
    spring.cloud.stream.bindings.output.destination=httphdfs

    # HDFS (consumer side)
    spring.cloud.stream.bindings.input.destination=httphdfs
    spring.cloud.stream.bindings.input.group=httphdfs
… but how about complex topologies?
[Diagram labels: http, raw-sensor-data, averages, top-n, Calculator, Failure detector, HDFS (three instances)]
Spring Cloud Data Flow
• Orchestration:
  – DSL for Stream topologies
  – REST API
  – Shell
  – UI
• Portable Deployment SPI
• OOTB apps for common integration use-cases
Spring Cloud Data Flow - Stream DSL
• Stream definition:

    httpfile = http | file

• Each element is a Spring Boot app built with Spring Cloud Stream
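As a rough sketch of what an orchestrator has to do with such a definition, the plain-Java code below splits the pipe DSL into app names and derives Spring Cloud Stream binding properties that connect each app's output to the next app's input. `StreamDslSketch` is a hypothetical helper, not Data Flow's actual parser, and the `<stream>.<upstream app>` destination naming is an assumption made for illustration. Only the three property keys come from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns "http | file" into per-app binding properties. */
final class StreamDslSketch {

    /** Splits a pipe-separated definition into trimmed, non-empty app names. */
    static List<String> parseApps(String definition) {
        List<String> apps = new ArrayList<>();
        for (String part : definition.split("\\|")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                apps.add(name);
            }
        }
        return apps;
    }

    /** Maps each app name to the properties wiring it to its neighbours. */
    static Map<String, Map<String, String>> bindings(String streamName, String definition) {
        List<String> apps = parseApps(definition);
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String app : apps) {
            result.put(app, new LinkedHashMap<>());
        }
        for (int i = 0; i < apps.size() - 1; i++) {
            // Assumed naming convention: <stream>.<upstream app>
            String destination = streamName + "." + apps.get(i);
            result.get(apps.get(i))
                  .put("spring.cloud.stream.bindings.output.destination", destination);
            Map<String, String> downstream = result.get(apps.get(i + 1));
            downstream.put("spring.cloud.stream.bindings.input.destination", destination);
            // One consumer group per stream, so instances of an app share the load.
            downstream.put("spring.cloud.stream.bindings.input.group", streamName);
        }
        return result;
    }

    public static void main(String[] args) {
        bindings("httpfile", "http | file")
                .forEach((app, props) -> System.out.println(app + " -> " + props));
    }
}
```

This sketch assumes each app name appears only once per stream; a definition that repeats an app would need labels to tell the instances apart.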
Spring Cloud Data Flow Deployment Platforms
[Diagram labels: Data Flow Server, REST API, Deployer SPI, SCDF Flo, SCDF Shell]
Deployment: Partitioning and Instance Count
[Diagram: Load Balancer, 2 × http, 3 × work, 4 × hdfs]

    stream create s1 --definition "http | work | hdfs"
    stream deploy s1 --propertiesFile ingest.properties

    app.http.count=2
    app.work.count=3
    app.hdfs.count=4
    app.http.producer.partitionKeyExpression=payload.id
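The effect of a partition key such as `payload.id` can be sketched as follows: every message with the same key is routed to the same downstream instance. `PartitionSketch` is a hypothetical illustration in plain Java. Hashing the key modulo the instance count is one common selection strategy and is not claimed to be the binder's exact formula.

```java
/** Hypothetical illustration of key-based partition selection. */
final class PartitionSketch {

    /**
     * Maps a key to a partition index in [0, partitionCount).
     * floorMod keeps the result non-negative even for negative hash codes.
     */
    static int selectPartition(Object key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int workInstances = 3; // matches app.work.count=3 from the deployment properties
        for (int id = 1; id <= 6; id++) {
            // Integer ids stand in for the value of payload.id
            System.out.println("id " + id + " -> work instance "
                    + selectPartition(id, workInstances));
        }
    }
}
```

Because the mapping depends only on the key and the instance count, a stateful consumer sees all messages for a given id, which is what makes per-key aggregation possible after scaling out.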
Deployment: Resource Management
[Diagram: 2 × http, 3 × work]

    app.work.spring.cloud.deployer.cloudfoundry.memory=2048
Conclusions
• Spring Cloud Stream, Spring Cloud Data Flow and Kafka are complementary
• Kafka provides:
  – High-throughput, low-latency messaging middleware (transport)
  – Stream processing engine via Kafka Streams
• Spring Cloud Stream provides:
  – Spring Boot integration
  – Boilerplate reduction via opinionated application model
• Spring Cloud Data Flow provides:
  – High-level orchestration for sophisticated topologies
  – Simplified deployment on a number of platforms: Cloud Foundry, Kubernetes, Mesos, YARN
Some links …
• http://cloud.spring.io/spring-cloud-stream
• http://cloud.spring.io/spring-cloud-dataflow
• https://github.com/mbogoevici/spring-cloud-stream-binder-kstream
• https://github.com/spring-cloud/spring-cloud-stream-samples