• Professor @ ITBA - Especialización en Ciencia de Datos
• Working on data projects since 2010 at ITBA, Globant (Google), Despegar, Socialmetrix, Jampp, Claro, etc.
• Co-founder @ Mutt Data, a company specialized in developing projects using Big Data and Data Science.
• Developed first production-ready stream processing system in 2015
@juanpampliega | [email protected]
must become event driven
In the Event-Command pattern (as in REST applications), the endpoint is known, the method being called is known, and the call usually returns a value.
In the Event-Driven pattern, services communicate only by generating events that any service in the system can consume, which leads to less coupling.
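The contrast between the two patterns can be sketched in a few lines of Python. This is only an in-process illustration: `BillingService`, `EventBus` and the `order_created` topic are hypothetical names, and a real system would typically put a message broker between producers and consumers.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class BillingService:
    """Command style: the caller knows the endpoint and the method, and gets a value back."""

    def charge(self, order_id: str, amount: float) -> str:
        return f"charged {order_id}: {amount:.2f}"


class EventBus:
    """Event-driven style: producers publish events without knowing who consumes them."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Every subscriber of the topic receives the same event; nothing is returned.
        for handler in self._subscribers[topic]:
            handler(event)


# Command: one known callee, one return value.
command_result = BillingService().charge("o-1", 10.0)

# Event: two independent consumers reuse the same event.
bus = EventBus()
billing_log: List[str] = []
shipping_log: List[str] = []
bus.subscribe("order_created", lambda e: billing_log.append(f"bill {e['order_id']}"))
bus.subscribe("order_created", lambda e: shipping_log.append(f"ship {e['order_id']}"))
bus.publish("order_created", {"order_id": "o-1", "amount": 10.0})
```

Adding a third consumer only needs another `subscribe` call; the producer code does not change, which is where the lower coupling comes from.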
Batch processing => bounded datasets
Stream processing => unbounded datasets
Stream processing means computing on data directly as it is produced or received.
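A minimal sketch of the difference, assuming a simple mean as the computation (the function names and the sample readings are made up for illustration): the batch version needs the whole bounded dataset before it can answer, while the streaming version emits an updated answer for every event and never needs to see the end of the input.

```python
from typing import Iterable, Iterator, List


def batch_mean(dataset: List[float]) -> float:
    # Bounded input: the full dataset is available, so one pass gives the final answer.
    return sum(dataset) / len(dataset)


def streaming_mean(events: Iterable[float]) -> Iterator[float]:
    # Unbounded input: keep a running count and total, and emit a result per event.
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count


readings = [10.0, 20.0, 60.0]
batch_result = batch_mean(readings)
stream_results = list(streaming_mean(iter(readings)))
```

Here `batch_result` is `30.0`, and `stream_results` is `[10.0, 15.0, 30.0]`: the same final value, but available incrementally.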
compute and update results with each new event.
Modern applications and microservices should operate in an event-driven fashion: their logic and computation are triggered by events.
Unified event-driven applications and real-time analytics
stream processing in which the computation maintains contextual state. This state stores information derived from previously seen events.
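As a rough illustration of contextual state, the sketch below keeps a per-user click count between events. `ClickCounter` and the user ids are hypothetical; a real stream processor would also checkpoint this state so it survives failures.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ClickCounter:
    # State derived from previously seen events: one counter per user id.
    counts: Dict[str, int] = field(default_factory=dict)

    def process(self, user_id: str) -> Tuple[str, int]:
        # The output for this event depends on the events that came before it.
        self.counts[user_id] = self.counts.get(user_id, 0) + 1
        return user_id, self.counts[user_id]


counter = ClickCounter()
outputs = [counter.process(user) for user in ["ana", "bob", "ana", "ana"]]
```

After these four events, `outputs` is `[("ana", 1), ("bob", 1), ("ana", 2), ("ana", 3)]`.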
in stream processing. It can be indexed and accessed in a variety of rich ways. Local, in-process data access is much faster. It’s easier to isolate. Implemented with in-memory hash tables, Bloom filters, bitmaps, RocksDB-like systems, etc.
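One of the structures listed above, a Bloom filter, can be sketched as local in-process state for checking whether an event id was probably seen before. This is a toy version under stated assumptions: the bit-array size, the number of hashes and the use of SHA-256 are arbitrary choices for illustration, not what any particular engine does.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Compact set membership: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter()
seen.add("event-1")
```

The filter answers from local memory without a network round trip, at the cost of occasionally reporting an unseen id as seen.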
internal streaming data processing systems @ Google) defined 4 critical questions any stream processing system should be able to answer:
• What results are calculated?
• Where in event time are results calculated?
• When in processing time are results materialized?
• How do refinements of results relate?
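The four questions can be made concrete with a small plain-Python sketch. It is not the Beam API; every choice here is an assumption for illustration: the result is a sum (What), grouped into 60-second fixed windows of event time (Where), emitted once a naive watermark (the largest event time seen so far) passes the end of the window (When), and re-emitted with the updated total when a late event arrives (How, in accumulating style).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

WINDOW_SIZE = 60  # seconds of event time per fixed window (hypothetical value)


def window_start(event_time: int) -> int:
    # Where: assign each event to a fixed window based on its event time.
    return event_time - event_time % WINDOW_SIZE


def run(events: List[Tuple[int, int]]) -> List[Tuple[int, int, str]]:
    """events are (event_time, value) pairs in arrival (processing-time) order."""
    sums: Dict[int, int] = defaultdict(int)
    fired: Set[int] = set()
    watermark = 0
    output: List[Tuple[int, int, str]] = []
    for event_time, value in events:
        start = window_start(event_time)
        sums[start] += value  # What: a running sum per window
        watermark = max(watermark, event_time)
        if start in fired:
            # How: a late event refines an already emitted result (accumulating).
            output.append((start, sums[start], "late"))
        for w in sorted(sums):
            # When: emit a window once the watermark passes its end.
            if w not in fired and w + WINDOW_SIZE <= watermark:
                fired.add(w)
                output.append((w, sums[w], "on-time"))
    return output


# The event at time 30 arrives after the watermark has already passed its window.
results = run([(10, 5), (50, 2), (70, 1), (30, 4), (130, 3)])
```

With this input, `results` is `[(0, 7, "on-time"), (0, 11, "late"), (60, 1, "on-time")]`: window `[0, 60)` is first emitted with 7 and then refined to 11 by the late event, while window `[120, 180)` is still open when the input ends.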
[Diagram: Beam Model pipeline construction from Beam Java, Beam Python and other languages; execution on backends such as Cloud Dataflow]
1. The Beam Model: What / Where / When / How
2. SDKs for writing Beam pipelines, starting with Java
3. Runners for existing distributed processing backends
◦ Apache Apex
◦ Apache Flink
◦ Apache Spark
◦ Google Cloud Dataflow
◦ Local (in-process) runner for testing
Data and Services
https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/
Tyler Akidau, Streaming 101: The world beyond batch
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
Tyler Akidau, Streaming 102: The world beyond batch
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
Ververica, What is Stream Processing?
https://www.ververica.com/what-is-stream-processing
Apache Flink Documentation
https://ci.apache.org/projects/flink/flink-docs-release-1.8/
Apache Beam
https://beam.apache.org/