INTERFACE by apidays 2023 - Leveraging Event Streaming to Super-Charge your Business Solutions, Mary Grygleski, DataStax

INTERFACE by apidays 2023
APIs for a “Smart” economy. Embedding AI to deliver Smart APIs and turn into an exponential organization
June 28 & 29, 2023

Leveraging Event Streaming to Super-Charge your Business Solutions
Mary Grygleski, Senior Streaming Developer Advocate at DataStax

------

Check out our conferences at https://www.apidays.global/

Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8

Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io

Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/

apidays

July 11, 2023

Transcript

  1. ©2023 DataStax. – All rights reserved
 Leveraging Event Streaming to Supercharge your Business Solutions

    Mary Grygleski, Streaming Developer Advocate, @mgrygles, June 2023
  2. Streaming Developer Advocate, Java Champion, Passionate Advocate

    ➢ Streaming
    ➢ Distributed Systems
    ➢ Reactive Systems
    ➢ IoT/MQTT
    ➢ Real-Time AI/ML
    mgrygles, mary-grygleski
  3. AGENDA

    Basics: The Many Facets of Computing Events
    • Event Streaming
    • Event Processing
    • Complex Event Processing
    • Event-Driven vs Message-Driven
    • Event Messaging Semantics / Patterns
    • Pub/Sub
    • Queue
    Event Streaming: Use Cases
    An Intro to Apache Pulsar
    • How is Pulsar Different?
    • Highlighting a few Developer Features
    • Starlight API - Kafka, RabbitMQ, JMS
    DataStax Managed Cloud: Astra Streaming
    Resources
  4. 01 Basics: The Many Facets of Computing Events
  5. What is an event, generically speaking?

    From Merriam-Webster.com:
    1 a : something that happens : OCCURRENCE
    …
    4 : the fundamental entity of observed physical reality represented by a point designated by three coordinates of place and one of time in the space-time continuum postulated by the theory of relativity
  6. The alphabet soup of event computing

    Event Sourcing
    Event-Driven Architecture
    Event Streaming
    Serverless Apps
    Stateless Microservices
    Reactive Systems
    Event Messaging
    Event Processing
    Event Storming
    Stateful Microservices
    Data data data data data data data data
  7. Event Stream Processing

    The practice of taking action on a series of data points that originate from a system that continuously creates data. “Event” refers to each data point in the system; “Stream” refers to the ongoing delivery of those events. A series of events can also be referred to as “streaming data” or “data streams.”
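
    The definition above can be made concrete with a small sketch. It is not tied to any streaming product: `readings` is a hypothetical source standing in for a system that continuously creates data, and `alerts` takes action on each event as it is delivered instead of waiting for the stream to end. The threshold value is an assumption chosen for illustration.

```python
from typing import Iterable, Iterator


def readings() -> Iterator[float]:
    """Hypothetical source of events (finite here so the example terminates)."""
    yield from [20.5, 21.0, 35.2, 22.1]


def alerts(stream: Iterable[float], threshold: float) -> Iterator[str]:
    """Take action on each data point as it arrives in the stream."""
    for value in stream:
        if value > threshold:
            yield f"ALERT: reading {value} exceeds {threshold}"


# Each event is examined once, in arrival order.
triggered = list(alerts(readings(), threshold=30.0))
print(triggered)
```

    Because both functions are generators, nothing is buffered: an alert can be emitted before later readings exist.
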
  8. Complex Event Processing

    Complex event processing (CEP) is a set of techniques for capturing and analyzing streams of data as they arrive to identify opportunities or threats in real time. CEP enables systems and applications to respond to events, trends, and patterns in the data as they happen.
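
    One common CEP technique is matching a pattern over a sliding time window. The sketch below is an illustration only, not a CEP engine: the event shape, the `login_failed` event type, and the "3 failures within 60 seconds" rule are all assumptions made for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, user, event type)


def detect_bursts(events: Iterable[Event], limit: int = 3,
                  window: float = 60.0) -> List[Tuple[float, str]]:
    """Flag a user when `limit` failed logins fall within `window` seconds."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    threats: List[Tuple[float, str]] = []
    for ts, user, kind in events:
        if kind != "login_failed":
            continue
        times = recent[user]
        times.append(ts)
        # Drop failures that have slid out of the time window.
        while times and ts - times[0] > window:
            times.popleft()
        if len(times) >= limit:
            threats.append((ts, user))
            times.clear()  # start counting afresh after raising the alert
    return threats


sample = [
    (0.0, "alice", "login_failed"),
    (10.0, "bob", "login_failed"),
    (20.0, "alice", "login_failed"),
    (30.0, "alice", "login_ok"),
    (45.0, "alice", "login_failed"),
    (200.0, "bob", "login_failed"),
]
threats = detect_bursts(sample)
print(threats)
```

    The detector reacts at the moment the third failure arrives, which is the "as they happen" property the slide describes.
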
  9. Event-Driven vs Message-Driven Messaging

    Event-Driven -> Sender emits messages and interested subscribers can subscribe to the messages
    Message-Driven -> Sender and Receiver are known to each other (address is known)
  10. Streaming: Pub/Sub

    - Publishing Client sends the data to the topic
    - Broker is the middle-person / agent, owns the topic
    - Subscribing Client receives the data from the topic

    [Diagram: Producers, Topic 1, Topic 2, Subscribers]
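
    The three roles above can be sketched in a few lines. This is a hypothetical in-memory `Broker` written for illustration; it is not the Pulsar client API, and the topic names and payloads are made up.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Hypothetical in-memory broker: owns the topics and fans data out."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        """Register a subscribing client for one topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, data: str) -> None:
        """Deliver data to every subscriber of the topic."""
        for callback in self._subscribers.get(topic, []):
            callback(data)


broker = Broker()
inbox_a: List[str] = []
inbox_b: List[str] = []
broker.subscribe("topic-1", inbox_a.append)
broker.subscribe("topic-1", inbox_b.append)
broker.subscribe("topic-2", inbox_b.append)

# The publishing client only knows the topic, never the subscribers.
broker.publish("topic-1", "order created")
broker.publish("topic-2", "payment received")
print(inbox_a, inbox_b)
```

    Every subscriber of a topic receives its own copy of the data, which is the key contrast with a queue.
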
  11. Message Queueing

    A message queue is a form of asynchronous service-to-service communication used in serverless and microservices architectures. Messages are stored on the queue until they are processed and deleted. Each message is processed only once, by a single consumer. Message queues can be used to decouple heavyweight processing, to buffer or batch work, and to smooth spiky workloads.

    [Diagram: Sender, Queue (Message 1, Message 2, Message 3), Receiver. Message gets picked up once and is gone after that]
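
    A minimal sketch of the "picked up once and gone" behavior, using a hypothetical in-memory `MessageQueue` class (not a real broker, and with no acknowledgement or redelivery logic):

```python
from collections import deque
from typing import Deque, Optional


class MessageQueue:
    """Hypothetical in-memory queue: a message is removed when it is received."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        """Store the message until some receiver picks it up."""
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        """Hand the oldest message to exactly one receiver, or None if empty."""
        return self._messages.popleft() if self._messages else None


queue = MessageQueue()
for text in ("Message 1", "Message 2", "Message 3"):
    queue.send(text)

# Two competing receivers: no message is ever seen by both.
receiver_a = [queue.receive(), queue.receive()]
receiver_b = [queue.receive(), queue.receive()]
print(receiver_a, receiver_b)
```

    Adding receivers spreads the work instead of duplicating it, which is why queues suit buffering and smoothing spiky workloads.
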
  12. 02 Event Streaming: Use Cases
  13. Real-time Data Applications

    Digital Experiences
    Use real-time data to enhance customer experiences and create a competitive advantage for your business.

    Data Science
    Use data pipelines to build AI/ML and smart models from time series event streams.

    Edge Computing
    Scale to meet the demands of large volumes of data generated by applications operating at the edge.
  15. Why event streaming

    • Watch for events with “the system” or application
    • Subscribe to topics to see certain event types
    • Make decisions on data in real time. Not after the event.
    • Ingest high frequency of messages with very low latency
  16. Streaming versus not streaming

    Streaming: Ingest data, Sink data, Select data, Process data
    Not Streaming: Ingest data, Persist data, Select data, Process data
    [Additional diagram labels: Persist data, Select data]
  17. 03 Introduction to Apache Pulsar
  18. Meet Pulsar

    Open source
    Created by Yahoo
    Contributed to the Apache Software Foundation (ASF) in 2016
    Top-level project (2018)

    Cloud-native design
    Cluster based
    Multi-tenant
    Simple client APIs (Java, C#, Python, Go, …)
    ➔ Separate compute and storage!

    Guaranteed message delivery
    If a message successfully reaches a Pulsar broker, it will be delivered to its intended target.

    Light-weight serverless functions framework
    Create complex processing logic within a Pulsar cluster (aka: data pipeline)

    Tiered storage offloads
    Offload data from hot/warm storage to cold/long-term storage when the data is aging out
  19. What is Apache Pulsar

    • Unified, distributed messaging and streaming platform
    • Open source
      ◦ Originally developed at Yahoo!
      ◦ Contributed to the Apache Software Foundation (ASF) in 2016
      ◦ Top-level project (2018)
    • Cloud Native
      ◦ K8s
      ◦ Multi-cloud and hybrid-cloud

    Four Reasons Why Apache Pulsar is Essential to the Modern Data Stack
  20. How is Pulsar Different?
  21. Pulsar Components

    Producer
    Client application sending messages to a topic managed by the Broker

    Consumer
    Client application reading messages from a topic managed by the Broker

    BookKeeper
    Persistent message store

    ZooKeeper
    Holds cluster metadata, handles coordination tasks between Pulsar clusters

    Broker
    A stateless process that handles incoming messages and message dispatching, communicates with the Pulsar configuration store, and stores messages in BookKeeper instances
  22. How is Pulsar Different?

    Pulsar’s next generation architecture provides
    • Distributed, tiered architecture
    • Separates compute from storage
    • ZooKeeper holds metadata for the cluster
    • Stateless Broker handles producers and consumers
    • Storage is handled by Apache BookKeeper

    [Diagram: Producer and Consumer connect to the Broker (Serving) layer; ZooKeeper; Bookie (Storage) layer with three Bookies, each holding Segment 1, Segment 2, Segment 3]
  23. Unified Solution for

    • Pub/Sub
    • Queuing
    • Streaming
    • Message mediation & enrichment

    Out of the Box Capabilities Include
    • Cloud, on-prem & hybrid
    • Geo-replication
    • Multi-region support
    • Data lake integration
    • And much, much more...

    Apache Pulsar represents the Next Generation of Enterprise Messaging
    Apache Pulsar Solves the Problems of Bolt-on
  24. Apache Pulsar as a Unified Platform

    • Unified infrastructure with built-in geo-replication
      ◦ Multi-cloud, hybrid-cloud, multi-region
    • Unified enterprise messaging/streaming backbone
      ◦ Wire-level messaging protocol compatibility
  25. Key Differentiator 1: Separation between Compute and Storage

    • Distributed, tiered architecture
      ◦ Separates compute from storage
      ◦ Independent scaling
    • Stateless Broker handles producers and consumers
      ◦ Intelligent, automatic load balancing
    • Storage is handled by Apache BookKeeper
      ◦ Segment-centric message storage management
    • Fast and Low Impact Horizontal Scaling Capability
  26. Key Differentiator 2: Native Geo-Replication

    • Hands off, real time message replication across data centers
    • Flexible message replication mode and patterns
      ◦ Synchronous vs Asynchronous
      ◦ Active-Active, Active-Passive
      ◦ Selective message replication
    • Capabilities to meet Data Compliance requirements across geo-regions
  27. Key Differentiator 3: Multi-Tenancy

    • Consolidated messaging/streaming platform
      ◦ Operational simplicity
    • Effective permission control within business domain context
      ◦ Security, compliance, auditing
    • Better IT resource utilization. Reduce Total Cost of Ownership (TCO)
      ◦ Storage Quota
      ◦ Message flow control and throttling mechanisms
      ◦ Physically separate brokers and/or bookies for tenants
  28. Key Differentiator 4: Flexible Message Processing Model

    • Out-of-the-box support for multiple subscription modes
      ◦ Exclusive
      ◦ Failover
      ◦ Shared
      ◦ Key_Shared
    • Good fit for Queuing use cases as well
      ◦ Kafka has challenges for this
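
    To make the four subscription modes concrete, here is a simplified simulation of how messages might be dispatched to consumers under each mode. It is a sketch under stated assumptions, not Pulsar's dispatcher: `dispatch` is a hypothetical helper, Exclusive and Failover are both modeled as "one active consumer" (the difference in practice is whether standby consumers may attach and take over), and Key_Shared uses a plain CRC32 modulo rather than whatever hashing scheme Pulsar actually uses.

```python
import zlib
from itertools import cycle
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (key, payload)


def dispatch(mode: str, messages: List[Message],
             consumers: List[str]) -> Dict[str, List[str]]:
    """Return which payloads each consumer receives under a subscription mode."""
    delivered: Dict[str, List[str]] = {name: [] for name in consumers}
    round_robin = cycle(consumers)
    for key, payload in messages:
        if mode in ("exclusive", "failover"):
            # One active consumer gets everything, in order.
            target = consumers[0]
        elif mode == "shared":
            # Messages are spread across consumers, queue-style.
            target = next(round_robin)
        elif mode == "key_shared":
            # All messages with the same key go to the same consumer.
            target = consumers[zlib.crc32(key.encode()) % len(consumers)]
        else:
            raise ValueError(f"unknown subscription mode: {mode}")
        delivered[target].append(payload)
    return delivered


messages = [("order-1", "a"), ("order-2", "b"), ("order-1", "c"), ("order-2", "d")]
consumers = ["c1", "c2"]
for mode in ("exclusive", "failover", "shared", "key_shared"):
    print(mode, dispatch(mode, messages, consumers))
```

    Shared gives queue semantics, while Key_Shared keeps per-key ordering and still spreads load, which is why one platform can cover both queuing and streaming use cases.
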
  29. Highlighting A Few Developer Features
  30. Data Pipeline Illustrated

    “Function” is there to transform the data in the most efficient way!
  31. Pulsar Functions

    Serverless function platform purpose-built for streaming data pipelines.

    Simple Function Architecture
    • Triggered from input topic
    • Simple programmatic interface
    • Push function result to output topic

    Built for DevOps
    • Standard Kubernetes based runtime
    • Automated deployments
    • CI/CD friendly
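
    As an illustration of the "simple programmatic interface", the sketch below follows the shape of Pulsar's Python language-native function style as I understand it: a module-level `process` function that receives one message from the input topic and whose return value is pushed to the output topic. Deployment, topic names, and serialization settings are not shown and would need to be checked against the Pulsar Functions documentation. The loop at the bottom is only a local stand-in for the runtime.

```python
def process(input):
    """Called once per message; the return value becomes the output message."""
    return f"{input}!"


# Local stand-in for the function runtime: feed messages from a pretend
# input topic and collect what would be pushed to the output topic.
input_topic = ["hello", "pulsar"]
output_topic = [process(message) for message in input_topic]
print(output_topic)
```

    The function itself has no Pulsar imports, so it can be unit-tested in isolation before it is deployed.
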
  32. Pulsar Schema

    ❖ For when you do not want to worry about serializing and deserializing the data you transfer over the wire
    ❖ Built-in schema registry
    ❖ Schema type
      ➢ Primitive
      ➢ Key/value pair
      ➢ Avro, JSON, Protobuf
    ❖ Schema evolution
      ➢ Version
      ➢ Compatibility
    ❖ Schema Management
      ➢ Automatic
      ➢ Manual
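
    The "Compatibility" item under schema evolution can be illustrated with a deliberately simplified check. This is not Pulsar's schema registry logic: `can_read` is a hypothetical helper that loosely follows the Avro-style rule that a reader schema can read data from a writer schema if shared fields keep their type and any field the writer lacks has a default. Real compatibility checks also handle type promotion, nesting, and more.

```python
from typing import Any, Dict

# field name -> {"type": ..., and optionally "default": ...}
Schema = Dict[str, Dict[str, Any]]


def can_read(reader: Schema, writer: Schema) -> bool:
    """True if data written with `writer` can be read using `reader`."""
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False  # a shared field changed its type
        elif "default" not in spec:
            return False  # a new field has no default for old data
    return True  # fields only the writer has are simply ignored


v1: Schema = {"id": {"type": "string"}, "amount": {"type": "double"}}
v2: Schema = {**v1, "currency": {"type": "string", "default": "USD"}}
v3: Schema = {**v1, "currency": {"type": "string"}}

print(can_read(v2, v1))  # new consumers reading old data, field has a default
print(can_read(v3, v1))  # new field without a default
print(can_read(v1, v2))  # old consumers reading new data
```

    A schema registry applies a check of this kind when a new schema version is registered, so incompatible producers or consumers are rejected before they can corrupt a topic.
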
  33. DataStax Starlight: Protocol Level Compatibility for Pulsar

    Drop-in replacement for existing messaging and streaming platforms.
    Existing skill sets remain applicable
    Eliminates interoperability challenges between various messaging platforms
    Extensive testing to ensure full compatibility at a specification and feature set level.
    **MQTT, ActiveMQ, RocketMQ coming soon
  34. 04 DataStax Managed Cloud - Astra
  35. Pulsar meets you where you are

    Managed Pulsar: Astra Streaming
    Enterprise Support Pulsar: Luna Streaming
    Community Driven Pulsar: Open Source
  36. Why use Astra Streaming

    Secure Enterprise Communication
    Security
    Threat/Fraud Detection
    IoT
    Personalization
    Manufacturing
    Financial Services
    Event Driven Architecture
  37. Pulsar as -the- messaging substrate

    01 Pulsar Native
    02 Starlight for RabbitMQ
    03 Starlight for Kafka
    04 Starlight for JMS
  38. Architecture Advantage of Pulsar

    • Compute and Storage Separation
      ◦ Stateless brokers
      ◦ Independent scalability
      ◦ Instantaneous broker scaling and disaster recovery
    • Segment-Oriented Log Management
      ◦ Segment (of a Partition) as the smallest replication unit
      ◦ Efficient storage utilization; Unbounded partition storage
      ◦ Truly horizontal scalability
      ◦ Fast and low impact scaling and disaster recovery
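
    The idea of a segment as the smallest replication unit can be sketched as follows. This is a toy placement function, not BookKeeper's actual placement policy: `place_segments`, the bookie names, and the round-robin rule are all assumptions made to show why adding a storage node does not force existing data to move.

```python
from typing import Dict, List


def place_segments(segment_ids: List[int], bookies: List[str],
                   replicas: int = 2) -> Dict[int, List[str]]:
    """Assign each segment of a partition to `replicas` bookies, round-robin."""
    return {
        seg: [bookies[(seg + r) % len(bookies)] for r in range(replicas)]
        for seg in segment_ids
    }


# A partition's first three segments are spread over three bookies.
placement = place_segments([0, 1, 2], ["bookie-1", "bookie-2", "bookie-3"])

# After a fourth bookie joins, only newly created segments use it;
# the existing segments stay where they are (no rebalancing of old data).
placement.update(place_segments([3, 4], ["bookie-1", "bookie-2", "bookie-3", "bookie-4"]))
print(placement)
```

    Because a partition is not pinned to one node's disk, its total size is not bounded by any single storage node, which is the "unbounded partition storage" point above.
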
  39. Quick Demo

    Sample code: https://github.com/mgrygles-lab/PulsarClientTestOne
  40. Resources - Apache Pulsar and Astra from DataStax

    https://astra.datastax.com
    https://www.datastax.com/products/astra-streaming
    https://www.datastax.com/products/luna-streaming
    Documentation for Streaming: https://docs.datastax.com/en/streaming/streaming/index.html
    CDC for Astra: https://docs.datastax.com/en/astra/docs/astream-cdc.html
    https://pulsar.apache.org/
    https://bookkeeper.apache.org/
    https://zookeeper.apache.org
  41. Community Resources - Apache Pulsar

    Community Info
    Apache Pulsar Community Info (Slack, Mailing Lists, StackOverflow, WeChat): http://pulsar.apache.org/en/contact/
    Pulsar Slack (how to sign up): https://apache-pulsar.herokuapp.com/

    Source Code
    Apache Pulsar: https://github.com/apache/pulsar
    DataStax on GitHub: https://github.com/datastax
    Starlight for Kafka (from DataStax) - https://github.com/datastax/starlight-for-kafka
    Starlight for JMS (from DataStax) - https://github.com/datastax/pulsar-jms / https://www.datastax.com/starlight/jms
    Starlight for RabbitMQ (from DataStax) - https://github.com/datastax/starlight-for-rabbitmq
  42. Follow Mary’s Twitch Stream

    (Different topics: Java, Open Source, Distributed Messaging, Event-Streaming, Cloud, DevOps, etc.)
    Wednesday at 2pm US/CST
    https://twitch.tv/mgrygles
  43. How to start coding all of this?

    Check out Awesome-Astra
    https://awesome-astra.github.io/docs/
  44. Benchmark report: Kafka vs Pulsar (from StreamNative)

    https://streamnative.io/pulsar/pulsar-vs-kafka
  45. Thank You

    Mary Grygleski
    https://www.linkedin.com/in/mary-grygleski/
    @mgrygles
    https://discord.gg/RMU4Juw