and other systems built to store data, but what was missing in our architecture was something that would help us to handle continuous flows of data.” – Jay Kreps, Origins of Apache Kafka
Kafka as a streaming platform: “a system that lets you publish and subscribe to streams of data, store them, and process them, and that is exactly what Apache Kafka is built to be.” – Jay Kreps
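A minimal sketch of the publish and subscribe halves of that description, using the Kafka Java client and assuming a local broker on localhost:9092; the "events" topic, group id, and key/value strings are illustrative, not from the slides:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class PublishSubscribeSketch {
    public static void main(String[] args) {
        // Publish: append an event to the hypothetical "events" topic.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
        }

        // Subscribe: poll the same stream of events as a consumer group member.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "demo-readers");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("events"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
```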
Messaging and batch systems: Expensive, Time Consuming, Difficult to Scale, No Persistence After Consumption, No Replay.
Kafka, a Distributed Commit Log: Highly Scalable, Durable, Persistent, Ordered, Fast (Low Latency).
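The commit-log properties above (persistence after consumption, replay, ordering by offset) show up directly in the consumer API: because records stay in the log, a reader can rewind and re-read them. A sketch under the same assumptions as before (hypothetical "events" topic, partition 0, local broker):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplaySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "replay-demo");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Records are not deleted on consumption, so we can rewind at will:
            // assign partition 0 of the hypothetical "events" topic and re-read it
            // from the first retained offset.
            TopicPartition partition = new TopicPartition("events", 0);
            consumer.assign(List.of(partition));
            consumer.seekToBeginning(List.of(partition));

            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                // Offsets are assigned in append order, which is why reads within a
                // partition come back ordered.
                System.out.printf("replayed offset=%d value=%s%n",
                        record.offset(), record.value());
            }
        }
    }
}
```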
three areas – to bring all the streams of data together across all the use cases – is what makes the idea of a streaming platform so appealing to people” – Jay Kreps
What if we could have a processing layer for the data streams? [Diagram: a consumer 1) pulls records from the broker, 2) processes them, splitting the stream on a condition such as value < 4 vs. value > 5, and 3) writes the results back.]
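One way such a pull–process–write layer looks in practice is a Kafka Streams topology that reads an input topic, filters it, and writes the results back to output topics. The topic names (numbers, small-numbers, large-numbers) and the < 4 / > 5 conditions taken from the diagram are illustrative assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FilterProcessingLayer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-demo");        // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed local broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // 1) pull from "numbers", 2) process (split small vs. large values),
        // 3) write the results back to two output topics.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> numbers = builder.stream("numbers");
        numbers.filter((key, value) -> Integer.parseInt(value) < 4).to("small-numbers");
        numbers.filter((key, value) -> Integer.parseInt(value) > 5).to("large-numbers");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```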
Optimized for massive reads. [Diagram: Broker 1 holding Partition 1 and Partition 2 on 1 TB of storage, serving a consumer through the page cache and the NIC.]
Kafka uses the sendfile API so that data served to consumers never leaves kernel space: instead of copying each chunk from the page cache into a user-space buffer and then back into a kernel socket buffer, the broker transfers it directly from the page cache to the NIC.
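On the JVM this zero-copy path is exposed as FileChannel.transferTo, which is backed by sendfile(2) on Linux and is what the broker uses to serve log segments. A minimal sketch, with a hypothetical segment path and consumer endpoint:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {
    public static void main(String[] args) throws IOException {
        // Open a log segment (hypothetical path) and a socket to a consumer (assumed endpoint).
        try (FileChannel segment = FileChannel.open(
                     Path.of("/tmp/kafka-logs/events-0/00000000000000000000.log"),
                     StandardOpenOption.READ);
             SocketChannel consumer = SocketChannel.open(
                     new InetSocketAddress("localhost", 9999))) {

            // transferTo() hands the bytes from the page cache to the socket inside
            // the kernel, with no copy into a user-space buffer and back.
            long position = 0;
            long remaining = segment.size();
            while (remaining > 0) {
                long sent = segment.transferTo(position, remaining, consumer);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```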