Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A perfect Storm for legacy migration
Search
ryan lemmer
October 21, 2013
Programming
0
1.6k
A perfect Storm for legacy migration
EuroClojure 2013 - Berlin
ryan lemmer
October 21, 2013
Tweet
Share
More Decks by ryan lemmer
See All by ryan lemmer
Modern Haskell: making sense of the type system
ryanlemmer
1
590
Distributed Computation: dealing with Time and Failure in the wild
ryanlemmer
0
830
Other Decks in Programming
See All in Programming
実践!App Intents対応
yuukiw00w
1
360
学習を成果に繋げるための個人開発の考え方 〜 「学習のための個人開発」のすすめ / personal project for leaning
panda_program
1
110
為你自己學 Python - 冷知識篇
eddie
1
220
ソフトウェアテスト徹底指南書の紹介
goyoki
1
110
【第4回】関東Kaggler会「Kaggleは執筆に役立つ」
mipypf
0
880
マイコンでもRustのtestがしたい その2/KernelVM Tokyo 18
tnishinaga
2
2.3k
コンテキストエンジニアリング Cursor編
kinopeee
1
710
ワープロって実は計算機で
pepepper
2
1.4k
Introduction to Git & GitHub
latte72
0
120
令和最新版手のひらコンピュータ
koba789
14
8k
Nuances on Kubernetes - RubyConf Taiwan 2025
envek
0
200
TROCCO×dbtで実現する人にもAIにもやさしいデータ基盤
nealle
0
330
Featured
See All Featured
GitHub's CSS Performance
jonrohan
1031
460k
A Modern Web Designer's Workflow
chriscoyier
695
190k
Embracing the Ebb and Flow
colly
87
4.8k
RailsConf 2023
tenderlove
30
1.2k
Music & Morning Musume
bryan
46
6.7k
Code Review Best Practice
trishagee
70
19k
Become a Pro
speakerdeck
PRO
29
5.5k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.4k
Designing for Performance
lara
610
69k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
126
53k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
7
820
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
900
Transcript
@ryanlemmer a perfect storm for legacy migration CAPE TOWN @clj_ug_ct
legacy monolith Customer Accounting Billing Product Catalog CRM ... MySQL
Ruby on Rails
legacy Billing Run Customer Accounting Billing Product Catalog CRM ...
Bank Recon MySQL Ruby Ruby
legacy backlog bugs
legacy replacement replace this
legacy replacement replace substitute something that is broken, old or
inoperative
the “legacy problem” can’t fix bugs can’t add features not
performant
a “legacy solution” immutable It’s just too risky to do
in-situ changes
a “legacy solution” vintage the grapes or wine produced in
a particular season
The situation It’s not broken, just Immutable It’s valuable vintage
- still generating revenue We don’t need to “replace” We need to “make the Legacy Problem go away”
vintage migration vintage ?
vintage migration vintage We chose to migrate “financial” parts first
because it posed the highest risk to the business ?
vintage migration vintage statements MySQL Mongo & Redis
feeding off vintage vintage clients invoices ... ...
feeding off vintage statements clients invoices ? ... ...
feeding off vintage clients invoices transform old client write new
client write new invoice transform old invoice ... ...
... ... migration bridge statemen tage Big Run every night
+ incremental run every 10 mins Bridge is one-directional, Statements is read-only Imperative, sequential code
... ... new migration ? full text search stateme vintage
bridge
migration bridge: search clients invoices index- entity index-field index-field index-field
index-field index-field contacts ... ... ...
migration bridge clients invoices index-field index-field index-field index-field index-field write
client write invoice contacts index- entity search statements transform client transform invoice ... ... ... clients invoices ... ... }
... ... ... statements age search statements (batched) bridge search
About 10 million rows several hours to migrate sequentially
first pass solution Batched data migration BUT WHAT NEXT? it
was the easiest thing to do it is not performant not fault tolerant fragile because of data dependencies go parallel and distributed have fault tolerance go real-time served as scaffolding for the next solution
storm Apache Thrift + Nimbus Ingredients: Zookeeper Clojure (> 50%)
* suitable for polyglots
... storm - spouts clients index-field index-field index-field index-field index-field
write client index- entity transform client ... clients
... storm - spout SPOUT TUPLE
storm - data model TUPLE named list of values [“seekoei”
7] [“panda” 10] [147 {:name ‘John’ ...}] [253 {:name ‘Mary’ ...}] word frequency ID client
... storm - spout a SPOUT emits TUPLES UNBOUNDED STREAM
of TUPLES continuously over time a SPOUT is an
... storm - client spout [“client” {:id 147, ...}] CLIENT
SPOUT CLIENT TUPLE periodically emits a entity values
clojure spout (defspout client-‐spout ["entity" “values”] [conf context collector]
(let [next-‐client (next-‐legacy-‐client) tuple [“client” next-‐client]] (spout (nextTuple [] (Thread/sleep 100) (emit-‐spout! collector tuple)) (ack [id])))) creates a pulse
clojure spout (defspout client-‐spout ["entity" “values”] [conf context collector]
(let [next-‐client (next-‐legacy-‐client) tuple [“client” next-‐client]] (spout (nextTuple [] (Thread/sleep 100) (emit-‐spout! collector tuple)) (ack [id]))))
clojure spout [“client” {:id 147, ...}] CLIENT TUPLE (defspout client-‐spout
["entity" “values”] [conf context collector] (let [next-‐client (next-‐legacy-‐client) tuple [“client” next-‐client]] (spout (nextTuple [] (Thread/sleep 100) (emit-‐spout! collector tuple)) (ack [id])))) TUPLE SCHEMA
... storm - spout [“client” {:id 147, ...}] [“client” {:id
201, ...}] [“client” {:id 407, ...}] [“client” {:id 101, ...}] The client SPOUT packages input and emits TUPLES continuously over time
... storm - bolts transform client CLIENT SPOUT BOLT
storm - bolts (defbolt transform-‐client-‐bolt ["client"]
{:prepare true} [conf context collector] (bolt (execute [tuple] (let [h (.getValue tuple 1)] (emit-‐bolt! collector [(transform-‐tuple h)]) (ack! collector tuple)))))
storm - bolts [{:id 147, ...}] OUTGOING TUPLE [“client” {:id
147, ...}] INCOMING TUPLE (defbolt transform-‐client-‐bolt ["client"] {:prepare true} [conf context collector] (bolt (execute [tuple] (let [h (.getValue tuple 1)] (emit-‐bolt! collector [(transform-‐tuple h)]) (ack! collector tuple)))))
storm - topology (topology {"1" (spout-‐spec (client-‐spout)
:p 1)} {"2" (bolt-‐spec {"1" :shuffle} transform-‐client-‐bolt :p 1)})) 1 2 ...
storm - topology (topology {"1" (spout-‐spec (client-‐spout)
:p 1)} {"2" (bolt-‐spec {"1" :shuffle} transform-‐client-‐bolt :p 1)})) 1 2 ...
bolt tasks (topology {"1" (spout-‐spec (client-‐spout)
:p 1)} {"2" (bolt-‐spec {"1" :shuffle} transform-‐client-‐bolt :p 1)})) 1 2 ...
bolt tasks (topology {"1" (spout-‐spec (client-‐spout)
:p 1)} {"2" (bolt-‐spec {"1" :shuffle} transform-‐client-‐bolt :p 3)})) 1 2 ...
which task? (topology {"1" (spout-‐spec (client-‐spout)
:p 1)} {"2" (bolt-‐spec {"1" :shuffle} transform-‐client-‐bolt :p 3)})) 1 2 ? ...
grouping - “shuffle” (topology {"1" (spout-‐spec (client-‐spout)
:p 1)} {"2" (bolt-‐spec {"1" :shuffle} transform-‐client-‐bolt :p 3)})) 1 2 ...
grouping - “ field” 1 2 ... [“active” {:id 147,
...}] [12 {:inv-id 147, ...}] TUPLE SCHEMA ["client-‐id" “invoice-‐vals”] count invoices per client (in memory)
grouping - “ field” 1 2 ... [“active” {:id 147,
...}] [12 {:inv-id 147, ...}] [“active” {:id 147, ...}] [“active” {:id 147, ...}] [401 {:inv-id 32, ...}] [“active” {:id 147, ...}] [“active” {:id 147, ...}] [232 {:inv-id 45, ...}] TUPLE SCHEMA ["client-‐id" “invoice-‐vals”] group by field “client-id”
grouping - “ field” (topology {"1" (spout-‐spec (client-‐spout)
:p 1)} {"2" (bolt-‐spec {"1" [“client-‐id”]} transform-‐client-‐bolt :p 3)})) 1 2 ...
grouping - “ field” 1 2 ... [“active” {:id 147,
...}] [12 {:inv-id 147, ...}] [“active” {:id 147, ...}] [“active” {:id 147, ...}] [401 {:inv-id 32, ...}] [“active” {:id 147, ...}] [“active” {:id 147, ...}] [232 {:inv-id 45, ...}] 2 2 similar “client-id” vals go to the same Bolt Task
grouping - “ field” ... field compute aggregation
bridge - topology index-field write client write invoice index- fields
transform client transform invoice ... ... ... clients invoices contacts
storm - failure success! oops! a failure! ...
storm reliability Build a tree of tuples so that Storm
knows which tuples are related ack/fail Spouts + Bolts
storm guarantees Storm will re-process the entire tuple tree on
failure First attempt fails Storm retries the tuple tree until it succeeds
failure + idempotency write client transform client x2 x2 side-effects!
...
transactional topologies write client transform client x1 x1 run-once semantics
... strong ordering on data processing Storm Trident
search statements storm topologies real-time bridge age
topology design ... ... ...
topology design ... ... ... design the (directed) graph
grouping + parallelism index-field write client write invoice index- fields
transform client transform invoice :shuffle :shuffle :shuffle :shuffle :shuffle :shuffle :p 1 :p 1 :p 1 :p 10 :p 3 :p 3 ... ... ... tune the runtime by annotating the graph edges
topology - tuple schema [“client”] [“entity” “values”] [“invoice”] [“entity” “values”]
[“entity” “values”] [“client”] [“invoice”] [“key_val_pairs”] [“key_val”] We are actually processing streams of tuples continuously
ntage topology design clients context sales context billing context (queue)
(queue) .. .. .. .. .. ..
storm “real-time, distributed, fault-tolerant, computation system” stream processing realtime analytics
continuous computation distributed RPC ...
reflections
search statements age storm topologies vintage is first- class
search statements age storm topologies transform data
search statements age storm topologies not code refactor if you
can! (but only if it’s worth the effort)
search statements age storm topologies not a picnic because we’re
still replacing code and now we’ve added replication
but worth it Big Replace Smaller replacements In-situ changes Augment:
new alongside old Replace Evolve new Kill Starve (until irrelevant)
EUROCLOJURE Berlin 2013 thanks