Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
SWIM: Scalable Weakly Consistent Infection Styl...
Search
Paul Hinze
July 22, 2015
Technology
400
2
Share
SWIM: Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Papers We Love, Chicago
July 22, 2015
Paul Hinze
July 22, 2015
More Decks by Paul Hinze
See All by Paul Hinze
Getting Good at System Failure Analysis
phinze
0
480
Applying Graph Theory to Infrastructure As Code
phinze
6
2.2k
Infrastructure as Code with Terraform and Friends
phinze
2
330
Smoke & Mirrors: The Primitives of High Availability
phinze
1
760
Git: Everybody's Favorite MMO
phinze
0
220
Shut Up and Pipe! Unix-style Object Collaboration in Rack and Vagrant
phinze
0
190
Freighthop: Vagrant on Rails
phinze
0
340
Puppet Modules Are Our Friends
phinze
0
140
Who Needs Clouds?: HA in Your Datacenter
phinze
1
550
Other Decks in Technology
See All in Technology
「嘘をつくテスト」の失敗例から学ぶ 良いテストコード #frontend_phpcon_do
asumikam
0
160
Oracle AI Database@Google Cloud:サービス概要のご紹介
oracle4engineer
PRO
6
1.5k
AI Adaptable なテストを整える工夫 / Ways to Make Your Tests AI-Adaptable
bitkey
PRO
2
210
Databricks 月刊サービスアップデート 2026年05月号
tyosi1212
0
200
oracle-to-databricks-migration-with-llm-and-dbt
casek
1
430
Claude Codeを組織で使いこなす— サーバサイドAIエージェント運用の実践知
techtekt
PRO
0
190
Oracle AI Database@AWS:サービス概要のご紹介
oracle4engineer
PRO
4
2.8k
Claude code Orchestra
ozakiomumkj
3
920
Gradle×GitHub_ActionsでCI時間を約50%短縮 ジョブ分割の設計と落とし穴 / Cutting CI Time by ~50% with Gradle and GitHub Actions: Job-Splitting Design and Pitfalls
takatty
0
620
プラットフォームエンジニア ワークショップ/ platform-workshop
databricksjapan
0
230
価格.comをAI駆動で全面刷新する ー 30年分の技術的負債を返し、次の30年の土台をつくる ー / AI Engineering Summit Tokyo 2026
tkyowa
38
41k
Databricks における 生成AIガバナンスの実践
taka_aki
1
280
Featured
See All Featured
Six Lessons from altMBA
skipperchong
29
4.3k
コードの90%をAIが書く世界で何が待っているのか / What awaits us in a world where 90% of the code is written by AI
rkaga
61
44k
WENDY [Excerpt]
tessaabrams
11
38k
GitHub's CSS Performance
jonrohan
1033
470k
AI Search: Implications for SEO and How to Move Forward - #ShenzhenSEOConference
aleyda
1
1.3k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.7k
How To Speak Unicorn (iThemes Webinar)
marktimemedia
1
480
svc-hook: hooking system calls on ARM64 by binary rewriting
retrage
2
280
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.5k
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
2
1.5k
Accessibility Awareness
sabderemane
1
130
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
1
1.3k
Transcript
SWIM Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Paul Hinze phinze
Paul Hinze phinze death stare
Armon Dadgar armon creator of Serf and Consul
ma
None
None
Process Group Membership Protocol Who is alive
None
Process Group Membership Protocol
SWIM Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Scalable The SWIM effort is motivated by the unscalability of
traditional heartbeating protocols.
Heartbeating A B C A A B B Failure Detection
+ Membership
Heartbeating
Evaluating Protocols Completeness Speed Accuracy Overhead
Evaluating Protocols Completeness Speed Accuracy Overhead heartbeating Yes Limit *
Interval High Nodes2 !
Key Insight Failure Detection State Updates. Solve separately from
Failure Detection ping! ack! A B C D {B,C,D}
Failure Detection ping! ack! A B C D {B,C,D}
Indirect Ping ping(C)! ack! B C D ping(C)! ping ack
fail {B,C,D}
Indirect Ping ping(C)! B C D ping(C)! fail ...C is
dead! fail
Key Insight Failure Detection State Updates. Solve separately from
Key Insight State Updates Failure Detection Piggyback onto messages.
State Updates ping! (B is dead) ack! (D just joined)
A B C D {B,C} {D}
Infection Style A B C D {B,C} {A,B,D} {A,B} weakly
consistent
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, eventually 1
* Interval High-ish O(N)
Improvements Time Bounded Completeness Increased Accuracy
Completeness A B {B, C, D, ..., N} N
Completeness A B {B, C, D, ..., N} N 1.
Shuffle List 2. Iterate
Completeness Fixed Time
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High-ish O(N)
Accuracy B C D ...C is MAYBE dead! A
Accuracy B C D I heard C might be dead.
A
Accuracy B C D I'm not dead! A
Accuracy
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High O(N)
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High O(N)
None
Limitations Update Latency Problem Solution Separate Gossip Timer
Limitations Cannot Handle Network Partitions Problem Solution Track and Retry
Recently Dead Nodes
Limitations No Concept of Graceful Leave (vs Failure) Problem Solution
Broadcast and Tracking of "Intents"
Limitations New Nodes Take Too Long Materialize Initial State Problem
Solution Anti-entropy TCP State Syncs
Limitations No built-in facility for user data Problem Solution Implement
user payloads (ordering via lamport clocks)
Limitations No peer metadata, only IP Addresses Problem Solution Inject
versioned peer metadata into state messages
Limitations No encryption Problem Solution Implement AES-GCM with key rotation
56 nodes 2K nodes Performance
Implementations Memberlist https://github.com/hashicorp/memberlist Serf lib https://github.com/hashicorp/serf Serf http://www.serfdom.io Consul http://www.consul.io
Implementations events, queries, and scripts serf CLI serf lib memberlist
Implementations service discovery, K/V, health checks consul serf lib memberlist
Implementations service discovery, K/V, health checks
Thanks Cloud icons by Julien Deveaux from the Noun Project