Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
SWIM: Scalable Weakly Consistent Infection Styl...
Search
Paul Hinze
July 22, 2015
Technology
2
300
SWIM: Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Papers We Love, Chicago
July 22, 2015
Paul Hinze
July 22, 2015
Tweet
Share
More Decks by Paul Hinze
See All by Paul Hinze
Getting Good at System Failure Analysis
phinze
0
420
Applying Graph Theory to Infrastructure As Code
phinze
6
1.9k
Infrastructure as Code with Terraform and Friends
phinze
2
270
Smoke & Mirrors: The Primitives of High Availability
phinze
1
480
Git: Everybody's Favorite MMO
phinze
0
120
Shut Up and Pipe! Unix-style Object Collaboration in Rack and Vagrant
phinze
0
120
Freighthop: Vagrant on Rails
phinze
0
270
Puppet Modules Are Our Friends
phinze
0
77
Who Needs Clouds?: HA in Your Datacenter
phinze
1
490
Other Decks in Technology
See All in Technology
サーバレスでモバイルアプリ開発! NTTコム「ビジネスdアプリ」のアーキテクチャ / The architecture of business d app
nttcom
12
240
開発者の定量・定性データを組み合わせて開発者体験を把握するための取り組み
ham0215
1
130
ロリポップ! for Gamersを支えるインフラ/lolipop for gamers infrastructure
takumakume
0
130
LINEヤフーのフロントエンド組織・体制の紹介
lycorp_recruit_jp
1
1.2k
PDF Viewer作成の今までとこれから
hunachi
0
470
GC24 Recap: Interface Internals
task4233
0
140
社内の学びの場・コミュニティ形成とエンジニア同士のリレーションシップ構築/devreljapan2024
nishiuma
3
290
Agile in Automotive Industry, puzzles and lights.
hiranabe
3
1.4k
Oracle Autonomous Database:サービス概要のご紹介
oracle4engineer
PRO
1
7.1k
開発生産性を始める前に開発チームができること / optim-improve-development-productivity.pdf
optim
0
110
『GRANBLUE FANTASY: Relink』最高の「没入感」を実現するカットシーン制作手法とそれを支える技術
cygames
1
140
Developer Experienceを向上させる基盤づくりの取り組み事例集
coconala_engineer
0
150
Featured
See All Featured
Speed Design
sergeychernyshev
22
430
Why Our Code Smells
bkeepers
PRO
334
56k
Faster Mobile Websites
deanohume
304
30k
Fantastic passwords and where to find them - at NoRuKo
philnash
48
2.8k
Build The Right Thing And Hit Your Dates
maggiecrowley
30
2.3k
Thoughts on Productivity
jonyablonski
66
4.2k
The Language of Interfaces
destraynor
153
23k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
36
2.1k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
23
1.7k
Stop Working from a Prison Cell
hatefulcrawdad
267
20k
Scaling GitHub
holman
458
140k
BBQ
matthewcrist
83
9.2k
Transcript
SWIM Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Paul Hinze phinze
Paul Hinze phinze death stare
Armon Dadgar armon creator of Serf and Consul
ma
None
None
Process Group Membership Protocol Who is alive
None
Process Group Membership Protocol
SWIM Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Scalable The SWIM effort is motivated by the unscalability of
traditional heartbeating protocols.
Heartbeating A B C A A B B Failure Detection
+ Membership
Heartbeating
Evaluating Protocols Completeness Speed Accuracy Overhead
Evaluating Protocols Completeness Speed Accuracy Overhead heartbeating Yes Limit *
Interval High Nodes2 !
Key Insight Failure Detection State Updates. Solve separately from
Failure Detection ping! ack! A B C D {B,C,D}
Failure Detection ping! ack! A B C D {B,C,D}
Indirect Ping ping(C)! ack! B C D ping(C)! ping ack
fail {B,C,D}
Indirect Ping ping(C)! B C D ping(C)! fail ...C is
dead! fail
Key Insight Failure Detection State Updates. Solve separately from
Key Insight State Updates Failure Detection Piggyback onto messages.
State Updates ping! (B is dead) ack! (D just joined)
A B C D {B,C} {D}
Infection Style A B C D {B,C} {A,B,D} {A,B} weakly
consistent
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, eventually 1
* Interval High-ish O(N)
Improvements Time Bounded Completeness Increased Accuracy
Completeness A B {B, C, D, ..., N} N
Completeness A B {B, C, D, ..., N} N 1.
Shuffle List 2. Iterate
Completeness Fixed Time
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High-ish O(N)
Accuracy B C D ...C is MAYBE dead! A
Accuracy B C D I heard C might be dead.
A
Accuracy B C D I'm not dead! A
Accuracy
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High O(N)
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High O(N)
None
Limitations Update Latency Problem Solution Separate Gossip Timer
Limitations Cannot Handle Network Partitions Problem Solution Track and Retry
Recently Dead Nodes
Limitations No Concept of Graceful Leave (vs Failure) Problem Solution
Broadcast and Tracking of "Intents"
Limitations New Nodes Take Too Long Materialize Initial State Problem
Solution Anti-entropy TCP State Syncs
Limitations No built-in facility for user data Problem Solution Implement
user payloads (ordering via lamport clocks)
Limitations No peer metadata, only IP Addresses Problem Solution Inject
versioned peer metadata into state messages
Limitations No encryption Problem Solution Implement AES-GCM with key rotation
56 nodes 2K nodes Performance
Implementations Memberlist https://github.com/hashicorp/memberlist Serf lib https://github.com/hashicorp/serf Serf http://www.serfdom.io Consul http://www.consul.io
Implementations events, queries, and scripts serf CLI serf lib memberlist
Implementations service discovery, K/V, health checks consul serf lib memberlist
Implementations service discovery, K/V, health checks
Thanks Cloud icons by Julien Deveaux from the Noun Project