Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Kafka, the hard parts
Search
Chris Keathley
January 10, 2019
Programming
3
2k
Kafka, the hard parts
This talk tries to summarize a lot of the lessons I've learned building systems on kafka.
Chris Keathley
January 10, 2019
Tweet
Share
More Decks by Chris Keathley
See All by Chris Keathley
Solid code isn't flexible
keathley
5
1.1k
Building Adaptive Systems
keathley
44
2.9k
Contracts for building reliable systems
keathley
6
1.1k
Building Resilient Elixir Systems
keathley
7
2.5k
Consistent, Distributed Elixir
keathley
6
1.7k
Telling stories with data visualization
keathley
1
690
Easing into continuous deployment
keathley
2
430
Leveling up your git skills
keathley
0
830
Generative Testing in Elixir
keathley
0
590
Other Decks in Programming
See All in Programming
API Platformを活用したPHPによる本格的なWeb API開発 / api-platform-book-intro
ttskch
1
110
PostgreSQL を使った快適な go test 環境を求めて
otakakot
0
390
オブザーバビリティ駆動開発って実際どうなの?
yohfee
3
670
登壇資料を作る時に意識していること #登壇資料_findy
konifar
5
2.1k
PJのドキュメントを全部Git管理にしたら、一番喜んだのはAIだった
nanaism
0
230
Go 1.26でのsliceのメモリアロケーション最適化 / Go 1.26 リリースパーティ #go126party
mazrean
1
330
日本だけで解禁されているアプリ起動の方法
ryunakayama
0
360
15年目のiOSアプリを1から作り直す技術
teakun
0
580
DSPy入門 Pythonで実現する自動プロンプト最適化 〜人手によるプロンプト調整からの卒業〜
seaturt1e
1
460
AI時代でも変わらない技術コミュニティの力~10年続く“ゆるい”つながりが生み出す価値
n_takehata
2
590
Raku Raku Notion 20260128
hareyakayuruyaka
0
430
メタプログラミングで実現する「コードを仕様にする」仕組み/nikkei-tech-talk43
nikkei_engineer_recruiting
0
150
Featured
See All Featured
Evolving SEO for Evolving Search Engines
ryanjones
0
140
Become a Pro
speakerdeck
PRO
31
5.8k
Game over? The fight for quality and originality in the time of robots
wayneb77
1
130
Site-Speed That Sticks
csswizardry
13
1.1k
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
96
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
63
What Being in a Rock Band Can Teach Us About Real World SEO
427marketing
0
180
sira's awesome portfolio website redesign presentation
elsirapls
0
170
Context Engineering - Making Every Token Count
addyosmani
9
730
How STYLIGHT went responsive
nonsquared
100
6k
State of Search Keynote: SEO is Dead Long Live SEO
ryanjones
0
140
WCS-LA-2024
lcolladotor
0
470
Transcript
Kafka The Hard Parts Chris Keathley / @ChrisKeathley / keathley.io
Kafka is great
Kafka is just a log
https://flic.kr/p/9aXr88
https://flic.kr/p/9aXr88 Kafka
Kafka https://flic.kr/p/9aXr88 (metaphor)
Log aggregation Analytics and activity tracking Queuing ETL Messaging Stream
Processing Kafka Uses
Event Sourcing
Log aggregation Analytics and activity tracking Queuing ETL Messaging Stream
Processing Kafka Uses
https://flic.kr/p/hrrbVx
https://flic.kr/p/hrrbVx (still a metaphor) Kafka
Large consequences for failure
Joke about mr. glass
Joke about mr. glass
Iteration Is Hard
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Topic
Topic
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Written to the File system
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Messages are ordered
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5
Consumer Consumer
Topic
Topic Topic Topic Topic Broker
Broker Broker Broker
None
Replication Leader
Clients Java Client librdkafka
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Order is important User Events
Order is important User Events
Order is important Follow
Order is important Follow
Order is important Follow Message
Order is important Follow Message Unfollow
Order is important Follow Message Unfollow Causal
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Order is important Follow Message Unfollow Consumer
Group records based on order
Partitioner to_int(hash(key)) % partitions
Partitioner to_int(hash(user_id)) % partitions
Follow Message Unfollow Grouping Consumers
Follow Message Unfollow Causal Grouping Consumers
Follow Message Unfollow Grouping Consumers Follow Processor Message Processor
Follow Message Unfollow Grouping Consumers Follow Processor Message Processor
Follow Message Unfollow Grouping Consumers Follow Processor Message Processor
Follow Message Unfollow Grouping Consumers User event processor
Follow Message Unfollow Grouping Consumers User event processor
Follow Message Unfollow Grouping Consumers User event processor
User Events Create pipelines User event processor Messages
User Events Create pipelines User event processor Messages Consumes
User Events Create pipelines User event processor Messages Consumes Produces
"Commander: Better Distributed Applications through CQRS and Event Sourcing" by
Bobby Calderwood https://youtu.be/B1-gS0oEtYc
The less dependence you can have between consumers the better
Random partitioning is best if you can avoid ordering
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Errors have the potential to wreck your day
Consumer Errors
Consumer Errors
Consumer Errors
Consumer Errors
Consumer Errors Blocking the head of the line
Consumer What should we do? Errors
Non-Blocking vs. Blocking
Non-Blocking vs. Blocking
Non-Blocking Errors Consumer 42 1337 “Robert’);drop table students;—”
Non-Blocking Errors Consumer 42 1337 “Robert’);drop table students;—” What do
we do?
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer Error Topic
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer
Non-Blocking Errors Consumer
Non-Blocking vs. Blocking
Non-Blocking vs. Blocking
Blocking Errors Database Consumer
Blocking Errors Database Consumer Process messages Store Information
Blocking Errors Database Consumer
Blocking Errors Database Consumer
Blocking Errors Database Consumer What do we do?
Blocking Errors Database Consumer Retry
Blocking Errors Database Consumer Send alerts
Skip non-blocking errors & Retry blocking errors
Design errors out of existence
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Delivery Guarantees
Computer A Communication is hard Computer B What time is
it?
Computer A Communication is hard Computer B
Computer A Communication is hard Computer B Did you get
it?
Computer A Communication is hard Computer B How about now?
Computer A Communication is hard Computer B Now?
0 <= 1 <= n Delivery At least once At
most once Impossible-ish
Consumers should *ALWAYS* assume “At Least Once”
The Joys of Functional Programming
None
You
You Functional Programming
Immutability and Idempotence
Immutability: An immutable object is an object whose state cannot
be modified after it is created.
Idempotence: …the property of certain operations in mathematics and computer
science whereby they can be applied multiple times without changing the result beyond the initial application.
Idempotence: Execute the same operation more than once but only
see the effect once.
Idempotent Operations
Counting comments comment comment comment increment 1
Counting comments comment comment comment increment 1
Counting comments comment comment comment increment 2
Counting comments comment comment comment increment 2
Counting comments comment comment comment increment 3
Counting comments comment comment comment increment 3 Some Error
Counting comments comment comment comment increment 3
Counting comments comment comment comment increment 3
Counting comments comment comment comment increment 4
Counting comments comment comment comment increment 4
Counting comments comment comment comment increment 5
Counting comments comment comment comment increment 5
Counting comments comment comment comment increment 6
Kafka Record { data: {}, type: “comment.created”, }
Kafka Record { data: {}, type: “comment.created”, msg_id: UUIDv4 }
Kafka Record { data: {}, type: “comment.created”, msg_id: UUIDv4 }
Used for managing idempotence
Counting comments comment comment comment increment 1
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2, 3)
Counting comments comment comment comment Set.add(id) id: 1 id: 2
id: 3 (1, 2, 3) Some Error
Counting comments comment comment comment id: 1 id: 2 id:
3 Set.add(id) (1, 2, 3)
Counting comments comment comment comment id: 1 id: 2 id:
3 Set.add(id) (1, 2, 3)
Counting comments comment comment comment id: 1 id: 2 id:
3 Set.add(id) (1, 2, 3)
Counting comments (1, 2, 3)
Counting comments cardinality(1, 2, 3)
Counting comments cardinality(1, 2, 3) => 3
Idempotent Side-Effects
smtp send_email Sending Emails email id: 1 email id: 2
email id: 3
smtp send_email Sending Emails email id: 1 email id: 2
email id: 3 What do we do if this fails?
smtp send_email Sending Emails email id: 1 email id: 2
email id: 3 Send at most once
smtp send_email Sending Emails email id: 1
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp id?(1)
Cache send_email Sending Emails email id: 1 smtp id?(1) If
id exists then skip it
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp add(1)
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp
Cache send_email Sending Emails email id: 1 smtp
send_email Sending Emails email id: 1
send_email Sending Emails email id: 1 If we see this
message again move it to an audit topic
send_email Sending Emails If we see this message again move
it to an audit topic email id: 1
send_email Sending Emails
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
User Events Teams User event processor Messages Notifications Notifications Notification
Sender
User Events Teams User event processor Messages Notifications Notifications Notification
Sender Teams
Data is the language of the system
{ msg_id: "8700635f-1802-417e-89e7-595ad3600104", type: "comment.created", data: { user_id: 1234, msg:
"This is a super fun conference!" } } Data payloads
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads None of this tells you anything useful about your data
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads What do we do when these things change?
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads What do we do when these things change?
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads Lets just use versions!
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads Lets just use versions! (spoiler: this isn’t great)
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads
{ msg_id: String, type: String, data: { user_id: String, msg:
String }, meta: { version: 2 } } Data payloads
Data Versions Consumer v1 v1 v1 v1 v2
Data Versions Consumer v1 v1 v1 v1 v2 This consumer
needs to understand both versions
Data Versions Consumer v1 v1 v1 v1 v2 This team
needs to know to make these changes
Versioning is broken
(sem)Versioning is broken
Change Growth Breakage
Change Growth Breakage Never do this
Growing schemas should be the default
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads What are these?
Dependent Types
{ msg_id: String, type: String, data: { user_id: Integer, msg:
String } } Data payloads What are these?
Norm
{ msg_id: String, type: String, data: { user_id: String, msg:
String } } Data payloads
UUID = string? & re_matches?(/^[0-9A-F]{8}-[0-9A-F] {4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i) ) CommentCreated = schema{
req :msg_id, UUID req :type, lit(“comment.created”) req :data, schema { req :user_id, integer? | UUID req :msg, string? } } Data payloads
json = {type: “comment.created”, msg: “Hello world”} Norm.decode(CommentEvent, json) =>
{:ok, data} Norm.decode(CommentEvent, {}) => {:error, errors} Norm.explain(CommentEvent, {}) => "In :msg_id, val: {} fails spec: required In :type, val: {} fails spec: required In :data, val: {} fails spec: required" Data payloads
Norm is built for extensibility
CommentEvent = schema{ req :type, lit(“comment.created”) req :msg, string? }
json = { type: “comment.created”, msg: “Hello world”, data: { msg: “Hello world” } } Norm.decode(CommentEvent, json) => {:ok, data} Norm is extensible
CommentEvent = schema{ req :type, lit(“comment.created”) req :msg, string? }
json = { type: “comment.created”, msg: “Hello world”, data: { msg: “Hello world” } } Norm.decode(CommentEvent, json) => {:ok, data} Norm is extensible This will still get passed through
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Property Based Testing
Property based testing Database Consumer
Property based testing Database Consumer id: 1 id: 2 id:
3 id: 1
Property based testing Database Consumer id: 1 id: 2 id:
3 id: 1 Information should end up here
Property based testing Database Consumer id: 1 id: 2 id:
3 id: 1 Some combination of these messages causes a failure
Property based testing Database id: 1 id: 1 Consumer
Property based testing Database id: 1 id: 1 Looks like
we aren’t handling duplicates correctly Consumer
Property based testing Database id: 1 id: 1 Consumer
Property based testing Database Consumer id: 1 id: 1 Deterministically
fail this connection
Chaos Engineering
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Finding Errors Monitoring Capacity Planning #hottakes
Monitoring vs. Observability
Monitoring: Figuring out that there’s a problem
Observability: Determining what the problem is.
Goal: Detect lagging or blocked consumers
Wisen
Wisen User Events User Consumer
metadata topic Wisen User Events Checkpoints its position in the
log to an offset topic User Consumer
Wisen metadata topic Wisen User Consumer User Events
Wisen metadata topic Wisen User Consumer User Events Compares farthest
offset from checkpoints over a time-window
Wisen user_consumer_errors Wisen User Consumer User Events
Wisen user_consumer_errors Wisen User Consumer User Events
Wisen user_consumer_errors Wisen User Consumer User Events Alert if we
see a rise in errors
Other useful metrics: Median and Tail latencies Internal buffers DB/Cache/RPC
latencies
OpenTracing
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
This has to be done up-front
Calculating partions messages in the system = arrival rate *
mean time in system
Calculating partions Desired throughput / measured throughput on one partition
=> partitions needed
Calculating partions partitions < 100 x brokers x replication factor
source: https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
Increasing partitions is tricky if you rely on ordering
to_int(hash(user_id)) % partitions
to_int(hash(user_id)) % partitions Existing data is not reshuffled if partitions
are increased
Data is not forever.
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
Lets talk about… Kafka Terminology Maintaining Order Errors Distributed Systems
and the joys of functional programming Data Validation Monitoring Capacity Planning #hottakes
CQRS & Event Sourcing
Don’t rush to democratize your data
Embrace data and design
Go forth and build awesome stuff!
Thanks Chris Keathley / @ChrisKeathley / keathley.io