In progress v1
benstopford
March 02, 2016
Transcript
Microservices in a Streaming World
Centralisation
Introversion
decentralisation Warehouse Analytics
Systems are better Connected
Which is good! duplication inefficiency error
Embrace the wider context
Many Architectural Styles
Shared State: polling a central oracle
Shared State with Transport: services talk through a broker, but share a database
Mediator / Workflow: a single workflow walks a variety of services/workers through a set of steps
Event Driven Architecture: services listen for interesting events, do something useful, announce their results, and the process continues
SOA / Microservices Message Broker
Pros and Cons
combinations
combinations Withdraw £100 Account Service General Ledger Customer Statements Fraud
Detection Check Funds Async Message Broker
Services generally eschew shared state
Distributed Programming is Hard
?? V.S.
How do we put these things together?
Synchronous Communication
Request/Response Request Response
Request/Response + Registry Registry Request Response
Request Response Withdraw £100 Account Service Check Funds
Asynchronous Communication
Queues
Tools: Queue Producer Consumer
Async / Decoupled
Point to Point Service A Service B
Scaling Consumption Instance 2 Instance 1 Single message allocation has
scalability issues
Batched Allocation Instance 1 Instance 2
Lose Ordering Guarantees Fail! Instance 1 Instance 2
Topics
Topics Retain Ordering Trades Buys Sells Broker Instance 1 Instance
2
Even when services fail, we retain ordering, but we have to detect & reprovision (Trades: Buys, Sells; Instance 1, Instance 2)
Filters don’t scale well: lots of consumers => lots of filters => lots of indexes => random disk access. Slow consumers => queues are databases* (*Queues Are Databases, Jim Gray, 1995)
Three Implications
Multi-Consumer Queues Lose Ordering Guarantees Fail! Worker 1 Worker 2
Trades Buys Sells Topics don’t degrade well Broker
Trades Buys Sells Messages are Transient Broker
Is there another way?
Think back to the queue example Batch Batch
Alternative: Shard on the way in
Each shard is a queue Strong Ordering (in shard). Good
concurrency.
Each consuming service is assigned a “personal set” of queues
each little queue is sent to only one consumer in a group
This means services can naturally rebalance: a service instance dies, data is redirected
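The sharding and rebalancing idea on the slides above can be sketched in a few lines. This is a minimal illustration, not how any particular broker implements it: the shard count, the CRC32 hash, the round-robin assignment and the instance names are all assumptions made for the example.

```python
import zlib

NUM_SHARDS = 6  # hypothetical shard count, chosen for the example


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Shard on the way in: the same key always lands in the same shard."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def assign(shards: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each consumer in a group its own 'personal set' of shards.

    Round-robin is an assumption; each shard goes to exactly one consumer.
    """
    ordered = sorted(consumers)
    assignment: dict[str, list[int]] = {c: [] for c in ordered}
    for i, shard in enumerate(sorted(shards)):
        assignment[ordered[i % len(ordered)]].append(shard)
    return assignment


shards = list(range(NUM_SHARDS))
before = assign(shards, ["instance-1", "instance-2", "instance-3"])
# instance-2 dies: its shards are redirected to the surviving instances.
after = assign(shards, ["instance-1", "instance-3"])
```

Ordering holds within a shard because only one consumer in the group ever reads it. Recomputing the assignment from scratch is the simplest possible rebalance; it moves more shards than strictly necessary.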
Sharded In, Sharded Out
Reduces to a globally ordered queue
This is a Distributed Log Kafka is one example
some other nice properties
Fault Tolerance
The Log: single seek & scan, append only. Messages don’t need to be transient!
Linearly Scalable
• Strong partition-based ordering
• Scalable data retention
• Scalable multiprocessing
• Always on
• Dynamic Consumer Rebalancing
So how is this useful to microservices?
Build ‘Always On’ Services: rely on a fault tolerant broker
Load Balance Services (with strong ordering)
Fault Tolerant Services: services automatically fail over
Services can retain data in the log Rewind & Replay
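A toy model may help make "rewind & replay" concrete. The `Log` and `Consumer` classes below are hypothetical names for this sketch, held entirely in memory; the point is only that reads do not remove messages, so a consumer can reset its offset and see the same history again.

```python
class Log:
    """A toy append-only log: messages are retained, not removed on read."""

    def __init__(self) -> None:
        self._entries: list[str] = []

    def append(self, message: str) -> int:
        self._entries.append(message)
        return len(self._entries) - 1  # offset of the message just written

    def read_from(self, offset: int) -> list[str]:
        # One "seek" to the offset, then a sequential scan to the end.
        return self._entries[offset:]


class Consumer:
    """Tracks its own position in the log, so it can move that position back."""

    def __init__(self, log: Log) -> None:
        self.log = log
        self.offset = 0

    def poll(self) -> list[str]:
        messages = self.log.read_from(self.offset)
        self.offset += len(messages)
        return messages

    def rewind(self, offset: int = 0) -> None:
        self.offset = offset


log = Log()
for message in ["buy 10", "sell 4", "buy 7"]:
    log.append(message)

consumer = Consumer(log)
first_pass = consumer.poll()
consumer.rewind()          # go back to the start of the retained history
replayed = consumer.poll() # the same messages, in the same order
```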
Retain data in “table-like” compacted streams (example: Exchange Rates). Log file: just a simple journal, keeps all versions of a key. Compacted stream: only keeps the latest version of each key.
What does that mean exactly? In Kafka it’s termed Log
Compaction
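The effect of compaction can be shown with a small in-memory sketch. The `compact` function and the K/V journal below are illustrative only; a real broker compacts log segments in the background rather than rewriting a list, and this says nothing about when that happens.

```python
def compact(log: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the latest record for each key, in original log order."""
    last_index = {key: i for i, (key, _value) in enumerate(log)}
    return [record for i, record in enumerate(log) if last_index[record[0]] == i]


# A simple journal that keeps every version of every key.
journal = [
    ("K1", "V1"), ("K2", "V1"),
    ("K1", "V2"), ("K2", "V2"),
    ("K1", "V3"), ("K2", "V3"),
    ("K1", "V4"),
]

compacted = compact(journal)     # one record per key survives
table = dict(compacted)          # which is why it reads like a table
```

Loading the compacted stream into a dictionary gives the current value for every key, which is the "table-like" property the slides refer to.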
Exchange Rate Service Exchange Rate Service USD/GBP = 0.71 EUR/GBP
= 0.77 USD/INR = 67.7 USD/AUD = 1.38 EUR/JPY = 114.41 …
Option 1: Request Response. Ask the Exchange Rate Service "rate for USD/GBP?" and get back 0.71
Option 2: Publish Subscribe. The Exchange Rate Service publishes; subscribers accumulate current state (ETL)
Option 3: Accumulate in Compacted Stream. The Exchange Rate Service publishes all rate events; the broker keeps the latest version of exchange rates in a compacted stream; consumers get all exchange rates, then new exchange rates
Service Backbone Scalable, Fault Tolerant, Concurrent, Strongly Ordered
… adding in stream processing
What is stream processing? Continuous queries: Max(price) from orders where ccy=‘GBP’ over 1 day window, emitting every second
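A rough sketch of that continuous query follows. It is simplified in two ways that should be read as assumptions: it uses fixed one-day windows keyed by event timestamp, and it emits an updated result on every matching order rather than on a one-second timer. The field names `ts`, `ccy` and `price` are invented for the example.

```python
from typing import Iterable, Iterator

DAY = 24 * 60 * 60  # window size in seconds


def max_gbp_price(orders: Iterable[dict]) -> Iterator[tuple[int, float]]:
    """Yield (window_start, max price so far) for GBP orders, per 1-day window."""
    maxima: dict[int, float] = {}
    for order in orders:
        if order["ccy"] != "GBP":          # where ccy = 'GBP'
            continue
        window_start = order["ts"] - order["ts"] % DAY
        current = maxima.get(window_start, float("-inf"))
        maxima[window_start] = max(current, order["price"])
        yield window_start, maxima[window_start]


orders = [
    {"ts": 10, "ccy": "GBP", "price": 5.0},
    {"ts": 20, "ccy": "USD", "price": 99.0},   # filtered out
    {"ts": 30, "ccy": "GBP", "price": 7.5},
    {"ts": DAY + 5, "ccy": "GBP", "price": 3.0},  # falls in the next window
]
results = list(max_gbp_price(orders))
```

Unlike a database query, the generator never "finishes" on a live source: it keeps yielding refreshed answers for as long as orders arrive.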
What is a stream processing engine? Database: data, index and query engine over a finite, well defined source. Stream processor: query engine over an infinite, poorly defined source
Windowing: for unordered or unpredictable streams (sliding, fixed)
Features: similar to a database query engine (Join, Filter, Aggregate, View, Window)
Tables & Streams: join a stream (streaming data, KStream) with a compacted stream (stored data, KTable)
A little example…
Buying Lunch Abroad Payments Service Exchange Rates Service Buy Notification
Service Amount in ££ $$ $$ Text Message: ££ $$
ETL Option Payments Service Exchange Rates Service Buy Amount in
££ ETL ETL Join etc
Request-Response Option Payments Service Exchange Rates Service Buy Amount in
££ Join etc
Stream Processor Option Payments Service Exchange Rates Service Buy Stream
Processor join etc
Buying Lunch Abroad (Payments, Exchange Rates)
• filter(ccy<>’GBP’)
• join on ccy
• Calculate payment in GBP
• Send text message
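The four steps above can be sketched as a single loop over interleaved events. This is plain Python standing in for a stream processor, not the API of any real one: the topic names, the `to_gbp` field and the message wording are assumptions, and the rates reuse the USD/GBP and EUR/GBP figures from the earlier slide.

```python
from typing import Iterable, Iterator


def lunch_notifications(events: Iterable[tuple[str, dict]]) -> Iterator[str]:
    """Join a payments stream against a table built from exchange-rate events."""
    rates: dict[str, float] = {}  # table view of the compacted rates stream
    for topic, event in events:
        if topic == "exchange-rates":
            rates[event["ccy"]] = event["to_gbp"]   # latest value per key wins
        elif topic == "payments":
            if event["ccy"] == "GBP":               # filter(ccy <> 'GBP')
                continue
            rate = rates.get(event["ccy"])          # join on ccy
            if rate is None:
                continue  # no rate seen yet; a real system needs a policy here
            gbp = event["amount"] * rate            # calculate payment in GBP
            # "Send text message" is modelled as yielding the message text.
            yield f"You spent {event['amount']:.2f} {event['ccy']} (£{gbp:.2f})"


events = [
    ("exchange-rates", {"ccy": "USD", "to_gbp": 0.71}),
    ("payments", {"ccy": "USD", "amount": 10.0}),
    ("payments", {"ccy": "GBP", "amount": 4.0}),    # domestic, filtered out
    ("exchange-rates", {"ccy": "EUR", "to_gbp": 0.77}),
    ("payments", {"ccy": "EUR", "amount": 20.0}),
]
messages = list(lunch_notifications(events))
```

The rate lookup is local to the processor, so no call to the Exchange Rates Service is made per payment.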
Streams & Compacted Streams: Payments looks like an infinite stream; Exchange Rates looks like a table (compacted stream)
In Process DB Topic Compacted Topic
Backed by the broker Topic Compacted Topic
Scales Out (MPP)
These tools are pretty handy for managing disconnected services
Joining Services Payments Exchange Rates Join Streams Compacted Streams
Talk your own data model Data Stream View Query
Express Lateness 9am 5pm Late trades
Keep Services Consistent
Big Global Bag of State in the Sky Problem: No
BGBSS
How do you provide the accuracy of this
In this?
The problem is failure
Duplicate messages are inevitable
Idempotence try 1 try 2 try 3 try 4
Stream processors have to solve this problem Risk of duplication
at all points in the chain
Idempotent Topics: write many times (against key), emit only once. One part of a wider, and ongoing, ‘Exactly Once’ effort
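"Write many times, emit only once" can be illustrated with a small deduplicating wrapper. The `IdempotentTopic` class is hypothetical and assumes the key uniquely identifies a logical message; it also remembers every key forever, which a real system would have to bound.

```python
class IdempotentTopic:
    """Accepts repeated writes for a key but emits the message only once."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.emitted: list[tuple[str, str]] = []

    def write(self, key: str, value: str) -> bool:
        """Return True if the message was emitted, False if it was a duplicate."""
        if key in self._seen:
            return False
        self._seen.add(key)
        self.emitted.append((key, value))
        return True


topic = IdempotentTopic()
# A sender that is unsure whether its write succeeded simply retries.
outcomes = [topic.write("payment-42", "withdraw 100") for _attempt in range(4)]
```

Because retries are harmless, a producer can resend after any failure without downstream consumers seeing the message twice.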
Service-based approaches need this too Centralised consistency model Distributed consistency
model
Simple Approaches Just a library (over Kafka)
There is much more to stream processing it is grounded
in the world of big-data analytics
So what do we have?
Microservices push us away from global state
Big Global Bag of State in the Sky Away from
BGBSS’s
This means data is increasingly remote
Sure, you can collect it all copy copy copy copy
copy copy copy ETL ETL ETL ETL ETL ETL
But do you really want to?
Better to embrace decentralisation
We need a decentralised toolset to do this
Queued Delivery System Ordered queue
Scales Horizontally
Built In Fault Tolerance
Runs Always On
For Services Too
Scales Horizontally load balance services
rebalancing on failure
Scales Horizontally with history stored in the Log
Scales Horizontally Extending to any number of services
Scales Horizontally With any data throughput
Scales Horizontally And tools for dealing with streams
Scales Horizontally the declarative processing of data
Scales Horizontally with in process storage
Scales Horizontally As resilient as the system that backs it
Scaling services out horizontally
Scales Horizontally to easily combine streams from real time services, or join with compacted streams
Scales Horizontally with strong ordering and repeatability guarantees
Embrace Decentralisation Keep it simple, Keep it moving