Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Event streaming fundamentals with Apache Kafka
Search
Keith Resar
February 24, 2022
Technology
1
530
Event streaming fundamentals with Apache Kafka
Keith Resar
February 24, 2022
Tweet
Share
More Decks by Keith Resar
See All by Keith Resar
Real-Time Data Transformation by Example
keithresar
0
70
Exactly-Once Semantics and Transactions in Kafka
keithresar
0
170
Implementing Strangler pattern for microservices migrations
keithresar
0
410
Stream processing with ksqlDB and Apache Kafka
keithresar
1
440
How Nagios is leveraging Ansible Network Automation
keithresar
1
98
Automating Satellite Installation and Configuration With the Ansible Foreman Modules
keithresar
1
730
Writing your first Ansible operator for OpenShift
keithresar
1
250
Intro to CI/CD in GitLab and Anatomy of a Pipeline
keithresar
2
400
Ansible Ecosystem Future Directions
keithresar
0
180
Other Decks in Technology
See All in Technology
AI時代、1年目エンジニアの悩み
jin4
1
160
GitHub Issue Templates + Coding Agentで簡単みんなでIaC/Easy IaC for Everyone with GitHub Issue Templates + Coding Agent
aeonpeople
1
170
Agile Leadership Summit Keynote 2026
m_seki
1
290
予期せぬコストの急増を障害のように扱う――「コスト版ポストモーテム」の導入とその後の改善
muziyoshiz
1
1.5k
Mosaic AI Gatewayでコーディングエージェントを配るための運用Tips / JEDAI 2026 新春 Meetup! AIコーディング特集
genda
0
150
Embedded SREの終わりを設計する 「なんとなく」から計画的な自立支援へ
sansantech
PRO
3
2.1k
システムのアラート調査をサポートするAI Agentの紹介/Introduction to an AI Agent for System Alert Investigation
taddy_919
2
1.7k
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
42k
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
3k
Kiro IDEのドキュメントを全部読んだので地味だけどちょっと嬉しい機能を紹介する
khmoryz
0
150
~Everything as Codeを諦めない~ 後からCDK
mu7889yoon
3
260
GSIが複数キー対応したことで、俺達はいったい何が嬉しいのか?
smt7174
3
140
Featured
See All Featured
Amusing Abliteration
ianozsvald
0
95
The SEO Collaboration Effect
kristinabergwall1
0
350
Side Projects
sachag
455
43k
The SEO identity crisis: Don't let AI make you average
varn
0
64
svc-hook: hooking system calls on ARM64 by binary rewriting
retrage
1
97
Introduction to Domain-Driven Design and Collaborative software design
baasie
1
580
How to Build an AI Search Optimization Roadmap - Criteria and Steps to Take #SEOIRL
aleyda
1
1.9k
Why You Should Never Use an ORM
jnunemaker
PRO
61
9.7k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
122
21k
Six Lessons from altMBA
skipperchong
29
4.1k
Marketing Yourself as an Engineer | Alaka | Gurzu
gurzu
0
120
Gemini Prompt Engineering: Practical Techniques for Tangible AI Outcomes
mfonobong
2
280
Transcript
Event Streaming Fundamentals with Apache Kafka Keith Resar Sr. Kafka
Developer @KeithResar
Data-Driven Operations
Data-Driven Operations
Data-Driven Operations
None
@KeithResar
@KeithResar
The Rise of Event Streaming 2010 Apache Kafka created at
LinkedIn 2022 Most fortune 100 companies trust and use Kafka
A company is built on _DATA FLOWS_ but all we
have are _DATA STORES_
Example Application Architecture Serving Layer (Microservices, Elastic, etc.) Java Apps
with Kafka Streams or ksqlDB Continuous Computation High-Throughput Event Streaming Platform API-Based Clustering @KeithResar
Apache Kafka is an Event Streaming Platform 1. Storage 2.
Pub / Sub 3. Processing @KeithResar
Storage 12 @KeithResar
Core Abstractions @KeithResar • DB → table • Hadoop →
file • Kafka - ?
LOG
Immutable Event Log New Messages are added at the end
of the log Old @KeithResar
Messages are KV Bytes key: byte[] value: byte[] Headers =>
[Header] @KeithResar
Messages Inside Topics Clicks Orders Customers Topics are similar to
database tables @KeithResar
Topics divide into Partitions Messages are guaranteed to be strictly
ordered within a partition @KeithResar P 0 Clicks P 1 P 2
None
Pub / Sub 20 @KeithResar
Producing Data New Messages are added at the end of
the log Old @KeithResar
Consuming Data New Consume via sequential data access starting from
a specific offset. Old @KeithResar Read to offset & scan
Distinct Consumer Positions New Old @KeithResar Sally offset 12 Fred
offset 3 Rick offset 9
None
Messages are KV Bytes key: byte[] value: byte[] Headers =>
[Header] @KeithResar
Producing to Kafka - No Key @KeithResar P 0 P
1 P 2 P 3 Messages will be produced in a round robin fashion
Producing to Kafka - No Key @KeithResar P 0 P
1 P 2 P 3 Messages will be produced in a round robin fashion
Producing to Kafka - With Key @KeithResar P 0 P
1 P 2 P 3 hash(key) % numPartitions = N
Producing to Kafka - With Key @KeithResar P 0 P
1 P 2 P 3 hash(key) % numPartitions = N
Consumer from Kafka - Single @KeithResar P 0 P 1
P 2 P 3 Single consumer reads from all partitions
Consumer from Kafka - Multiple @KeithResar P 0 P 1
P 2 P 3 Consumers can be split into multiple groups each of which operate in isolation
CONSUMER GROUP COORDINATOR CONSUMERS CONSUMER GROUP
Consumer from Kafka - Multiple @KeithResar P 0 P 1
P 2 P 3 Consumers can be split into multiple groups each of which operate in isolation
Consumer from Kafka - Multiple @KeithResar P 0 P 1
P 2 P 3 Consumers can be split into multiple groups each of which operate in isolation
Grouped Consumers @KeithResar P 0 P 1 P 2 P
3 Consumers can be split into multiple groups each of which operate in isolation
Grouped Consumers @KeithResar P 0 P 1 P 2 P
3 Consumers can be split into multiple groups each of which operate in isolation X
None
Linearly Scalable Architecture @KeithResar Producers • Many producers machines •
Many consumer machines • Many Broker machines Consumers Single topic, No Bottleneck!
Replicate for Fault Tolerance @KeithResar Broker A Broker B Message
✓ Leader Replicate
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Replication Provides Resiliency @KeithResar Producers Consumers Replica followers become leaders
on machine failure X X X X X
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Partition 1 Partition 2 Partition 3 Follower Leader Partition 2 Partition 1 Partition 3
Partition Leadership / Replication @KeithResar Broker 1 Broker 2 Broker
3 Broker 4 P 0 P 1 P 2 P 3 Partition 0 Partition 2 Partition 3 Partition 0 Partition 1 Partition 3 Partition 0 Partition 1 Partition 2 Follower Leader Partition 2 Partition 1 Partition 3
None
The log is a type of durable messaging system @KeithResar
Similar to a traditional messaging system (ActiveMQ, Rabbit, etc.) but with: • Far better scalability • Built-in fault tolerance/HA • Storage
None
Origins in Stream Processing Serving Layer (Microservices, Elastic, etc.) Java
Apps with Kafka Streams or ksqlDB Continuous Computation High-Throughput Event Streaming Platform API-Based Clustering
Processing 51 @KeithResar
Streaming is the toolset for working with events as they
move! @KeithResar
What is stream processing? @KeithResar auth attempts possible fraud
What is stream processing? @KeithResar User Population Coding Sophistication Core
developers who use Java/Scala Core developers who don’t use Java/Scala Data engineers, architects, DevOps/SRE BI analysts streams
Standing on the Shoulders of Streaming Giants Producer, Consumer APIs
Kafka Streams ksqlDB Ease of use Flexibility ksqlDB UDFs Powered by Powered by
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
What is stream processing? @KeithResar CREATE STREAM possible_fraud AS SELECT
card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count(*) > 3;
None
Wrap Up 64 @KeithResar
developer.confluent.io Learn Kafka. Start building with Apache Kafka at Confluent
Developer.
Free eBooks Designing Event-Driven Systems Ben Stopford Kafka: The Definitive
Guide Neha Narkhede, Gwen Shapira, Todd Palino Making Sense of Stream Processing Martin Kleppmann I ❤ Logs Jay Kreps http://cnfl.io/book-bundle
None
Thank You @KeithResar Kafka Developer confluent.io