Getting the Word Out: Membership, Dissemination, and Population Protocols

We are building an instrumentation platform that runs across dozens of datacenters to provide operational visibility for internal systems and applications. This platform must remain up as much as possible and allow support and operations staff to understand and diagnose problems quickly. They must be able to ask questions like "what machines and applications are publishing metrics?", "what systems appear to be offline?", "what order did these errors occur in?", all without consulting every datacenter. Furthermore, they must be able to change configuration quickly, with confidence that every affected system will receive and act upon it.

To help with these problems, we are implementing several recently developed protocols for cluster membership, epidemic broadcast, and monotonic time. Respectively, these protocols allow us to know which nodes are peers, to disseminate configuration and status information, and to agree on the approximate relative order of events. Best of all, they are all synchronization-free, meaning we can achieve our goals while remaining highly available. In this talk, we'll discuss the protocols we chose, the challenges of implementing them, and some preliminary results from deploying them across our infrastructure.

Sean Cribbs

June 14, 2016

Transcript

  1. GETTING THE WORD OUT: MEMBERSHIP, DISSEMINATION & POPULATION PROTOCOLS

    Sean Cribbs, Senior Principal Engineer. All photos are my own unless attributed.
  2. WHY BUILD PEER-TO-PEER SYSTEMS?
  3-5. WHY NOT PEER-TO-PEER SYSTEMS?

    Now you’ve got N problems (N = 8, probably)
  6. WHAT ARE WE BUILDING?
  7-21. ARGUS OPERATIONAL VISIBILITY PROJECT

    Diagram labels: SYSTEM HEALTH, APPLICATION METRICS, AGENT, EXTERNAL SERVICES
    “MULTI-TENANT” “MULTI-REGION” “HIGHLY-AVAILABLE” “REAL-TIME” “STREAMING” “PLATFORM”
    ➡ Work distribution
    ➡ Fault-tolerance
    ➡ Locality
    ✓ Peer to Peer!
    Emoji provided free by Emoji One
  22-27. ARGUS OPERATIONAL VISIBILITY PROJECT

    Diagram labels: EXTERNAL SERVICES, AGENT
    “MULTI-TENANT” “MULTI-REGION” “HIGHLY-AVAILABLE” “REAL-TIME” “STREAMING” “PLATFORM”
    Where do agents send data?
    How to get fault-tolerance without spam?
    How do cluster nodes find each other?
    Distribute code and configuration?
    Know what happened when?
  28-31. ARGUS OPERATIONAL VISIBILITY PROJECT

    ➡ Cluster membership and discovery
    ➡ Code and configuration dissemination
    ➡ Relative and convergent time
  32. MEMBERSHIP PROTOCOLS
  33. WHY NOT ZOOKEEPER/CONSUL/ETCD?

    JUST RUB SOME CONSENSUS ON IT
  34-40. MEMBERSHIP: DESIRABLE PROPERTIES

    ➡ Connectedness
    ➡ Balance
    ➡ Short path-length
    ➡ Low clustering
    ➡ Scalability
    ➡ Accuracy
  41. MEMBERSHIP: “VIEW” FLAVORS

    Full, Partial
  42-51. SWIM - 2002

    Heartbeat protocols
    ๏ Quadratic load
    ๏ Failure detection
    ๏ Response times
    ๏ False positives
    SWIM solutions
    ➡ Separate membership and failure detection
    ➡ Randomized probing
    ➡ Piggyback membership on probes
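The randomized probing listed on the SWIM slide can be sketched roughly as follows. This is a minimal illustration, not code from the talk: `can_reach` is a hypothetical stand-in for "a ping from one node to another is acknowledged", `k` is an assumed number of indirect helpers, and the piggybacking of membership updates on probes is left out.

```python
import random

def probe(me, target, members, can_reach, k=3, rng=random):
    """Probe `target` directly, then indirectly through up to k helpers.

    can_reach(src, dst) is a hypothetical stand-in for a ping/ack exchange.
    Returns True if any path produced an ack.
    """
    if can_reach(me, target):
        return True
    # Direct probe failed: ask a few other members to probe on our behalf.
    helpers = [m for m in members if m not in (me, target)]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        if can_reach(me, helper) and can_reach(helper, target):
            return True
    return False

def probe_round(me, members, can_reach, k=3, rng=random):
    """One protocol period: pick a single random peer and probe it."""
    target = rng.choice([m for m in members if m != me])
    return target, probe(me, target, members, can_reach, k, rng)
```

Each node sends at most 1 + 2k probes per period regardless of cluster size, which is how this style of probing avoids the quadratic load of all-to-all heartbeats. A real implementation would typically mark an unresponsive peer as suspected before declaring it failed.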
  52-56. SCAMP - 2003

    ๏ Full views limit scalability
    ➡ Flexible partial-view size, asymmetric
    ➡ Reactive view management
    ➡ Join (“subscribe”) via random walk
    ➡ Automatic balancing via indirection and leases
  57-60. CYCLON - 2005

    ๏ Random shuffling doesn’t create good balance
    ➡ Fixed partial-view size, symmetric
    ➡ Cyclic view management
    ➡ Join via random walk
  61. PROBLEMS WITH SCAMP & CYCLON

    • No failure detectors
    • SCAMP: asymmetric views ⟹ disconnection
    • SCAMP: unbounded view size ⟹ imbalance
  62-68. HYPARVIEW - 2007

    ๏ Fanout is related to reliability
    ๏ High failure rates decrease quality
    ➡ TCP for transport and failure detector
    ➡ Small reactive view (“active”)
    ➡ Larger cyclic view (“passive”)
    ➡ Join and shuffle via random walk
  69. HYPARVIEW - 2007

    Passive view maintenance (diagram nodes: A, B, C, D)
  70. WE CHOSE HYPARVIEW

    • Only active view maintenance
    • Passive view maintains full membership (unbounded)
    • Later: switch to complete passive maintenance
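The split between a small active view and a larger passive view might be modeled as below. This is a simplified sketch under stated assumptions, not the speaker's implementation: the `Views` class and its method names are hypothetical, a dropped connection is reported by calling `on_disconnect`, and the neighbor-request handshake that HyParView performs before promoting a passive peer is omitted. Leaving `passive_size` as `None` corresponds to the unbounded passive view mentioned on the slide.

```python
import random

class Views:
    """Active/passive peer views for one node (hypothetical helper class)."""

    def __init__(self, active_size=5, passive_size=None, rng=None):
        self.active_size = active_size
        self.passive_size = passive_size  # None means unbounded
        self.active = set()
        self.passive = set()
        self.rng = rng or random.Random()

    def add_passive(self, peer):
        if peer in self.active or peer in self.passive:
            return
        if self.passive_size is not None and len(self.passive) >= self.passive_size:
            # Bounded passive view: drop a random entry to make room.
            self.passive.remove(self.rng.choice(sorted(self.passive)))
        self.passive.add(peer)

    def add_active(self, peer):
        if peer in self.active:
            return
        self.passive.discard(peer)
        if len(self.active) >= self.active_size:
            # Active view is full: move a random active peer to the passive view.
            evicted = self.rng.choice(sorted(self.active))
            self.active.remove(evicted)
            self.add_passive(evicted)
        self.active.add(peer)

    def on_disconnect(self, peer):
        """An active connection dropped: promote a random passive peer."""
        self.active.discard(peer)
        if not self.passive:
            return None
        replacement = self.rng.choice(sorted(self.passive))
        self.passive.remove(replacement)
        self.active.add(replacement)
        return replacement
```

The active view stays small because every entry is an open connection that doubles as a failure detector, while the passive view only stores candidate addresses for repair.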
  71. DISSEMINATION PROTOCOLS
  72-75. DISSEMINATION: DESIRABLE PROPERTIES

    ➡ Reliability
    ➡ Scalability
    ➡ Efficiency
  76-81. EPIDEMIC BROADCAST (GOSSIP)

    ➡ Send to random peers
    ➡ Messages rebroadcast by recipients
    ๏ High redundancy
    ๏ Low scalability
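The redundancy of plain gossip can be made concrete with a small simulation. This is an illustrative sketch only: the `gossip` function, the full-membership assumption (any node can pick any other node), and the parameter values are assumptions, not taken from the talk.

```python
import random

def gossip(n, fanout, rng):
    """Simulate one broadcast from node 0 among n nodes.

    Every node forwards the message to `fanout` random peers the first
    time it receives it. Returns (nodes_reached, messages_sent).
    """
    delivered = {0}
    frontier = [0]
    sent = 0
    while frontier:
        node = frontier.pop()
        peers = rng.sample([p for p in range(n) if p != node], fanout)
        for peer in peers:
            sent += 1
            if peer not in delivered:
                delivered.add(peer)
                frontier.append(peer)
    return len(delivered), sent

reached, sent = gossip(n=100, fanout=4, rng=random.Random(1))
print(f"reached {reached} nodes with {sent} messages")
```

Because every node that receives the message forwards it exactly once, the message count is always `fanout * reached`, whereas a spanning tree over the same nodes would need only `reached - 1` messages. The difference is the redundancy the slide refers to.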
  82. WE NEED INCREASED EFFICIENCY WITHOUT REDUCING DELIVERY GUARANTEES
  83-88. PLUMTREE - 2009: CONSTRUCTION

    • All nodes start with full “eager” set
    • Broadcast triggers eager-push
    • Duplicate messages cause “pruning” (move to “lazy”)
    • Regular broadcasts proceed with new “eager” sets
    (Diagram nodes: A, B)
  89-93. PLUMTREE - 2009: REPAIR

    • Lazy-push sends “I Have” messages
    • Timeout triggers “grafting” (move to “eager”)
    • Lazy-push batched to reduce overhead
    (Diagram nodes: A, B)
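The construction and repair steps on the Plumtree slides can be sketched together as one small in-process model. This is a hedged illustration rather than a faithful implementation: the `Node` class, the `net` dictionary standing in for the network, and the synchronous depth-first delivery are all assumptions, the timeout is triggered by calling `on_timeout` by hand, and the batching of "I Have" messages is not modeled.

```python
class Node:
    """One Plumtree participant; `net` maps node name -> Node."""

    def __init__(self, name, peers):
        self.name = name
        self.eager = set(peers)  # every neighbour starts in the eager set
        self.lazy = set()
        self.seen = {}           # msg_id -> payload
        self.announced = {}      # msg_id -> peer that sent "I Have"
        self.duplicates = 0

    def _promote(self, peer):
        self.lazy.discard(peer)
        self.eager.add(peer)

    def _demote(self, peer):
        self.eager.discard(peer)
        self.lazy.add(peer)

    def broadcast(self, net, msg_id, payload):
        # msg_id is assumed to be new to this node.
        self.on_gossip(net, None, msg_id, payload)

    def on_gossip(self, net, sender, msg_id, payload):
        if msg_id in self.seen:
            # Duplicate: the link is redundant, so prune it on both ends.
            self.duplicates += 1
            self._demote(sender)
            net[sender]._demote(self.name)  # stands in for a PRUNE message
            return
        self.seen[msg_id] = payload
        self.announced.pop(msg_id, None)
        if sender is not None:
            self._promote(sender)
        for peer in sorted(self.eager - {sender}):
            if peer in self.eager:  # may have been pruned while we were sending
                net[peer].on_gossip(net, self.name, msg_id, payload)
        for peer in sorted(self.lazy - {sender}):
            net[peer].on_ihave(self.name, msg_id)

    def on_ihave(self, sender, msg_id):
        if msg_id not in self.seen:
            self.announced[msg_id] = sender

    def on_timeout(self, net, msg_id):
        # The announced message never arrived eagerly: graft the announcer.
        if msg_id in self.seen or msg_id not in self.announced:
            return
        source = self.announced.pop(msg_id)
        self._promote(source)
        net[source].on_graft(net, self.name, msg_id)

    def on_graft(self, net, requester, msg_id):
        self._promote(requester)
        if msg_id in self.seen:
            net[requester].on_gossip(net, self.name, msg_id, self.seen[msg_id])
```

In a fully connected triangle of nodes, the first broadcast produces one duplicate and prunes that link to lazy; later broadcasts then travel along the remaining eager links without duplicates. If a node on the eager path disappears, the "I Have" sent over a lazy link followed by a timeout grafts that link back into the tree.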
  94-97. WE CHOSE PLUMTREE

    • Good tradeoff between reliability and redundancy
    • Optimizes for lowest-latency paths
    • Existing open-source implementations
    • Excellent fit with HyParView
  98. POPULATION PROTOCOLS
  99. POPULATION PROTOCOLS USE RANDOMIZED INTERACTIONS
  100. DISTRIBUTED MONOTONIC CLOCKS

    Jon Moore. Vidcap from StrangeLoop 2015: https://youtu.be/YqNGbvFHoKM
  101. DMC PROBLEMS

    ๏ “Wacky clock mode”
    ๏ Hierarchy imbalances load
    ๏ Long-lived partitions
    ๏ No convergence proof
  102. APPLYING DMC

    • Use existing dissemination with DMC
    • Transmit clocks along with other messages
    • Use monotonic clocks as a drift-detection mechanism
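The idea of carrying a clock value on other messages and using it to notice drift could look something like the sketch below. This is not the DMC algorithm referenced on slide 100; it is a deliberately simpler max-merge clock, and the `MergingClock` class, its methods, and the notion of reporting "lag" on merge are all assumptions made for illustration.

```python
class MergingClock:
    """A clock that only moves forward and adopts larger remote values."""

    def __init__(self, now=0):
        self.value = now

    def tick(self, elapsed):
        """Advance by locally measured elapsed time (must not be negative)."""
        if elapsed < 0:
            raise ValueError("a monotonic clock cannot go backwards")
        self.value += elapsed
        return self.value

    def merge(self, remote):
        """Merge a clock value received alongside another message.

        Returns how far behind this node was (0 if it was not behind),
        which a caller could record as a drift signal.
        """
        lag = max(0, remote - self.value)
        self.value = max(self.value, remote)
        return lag
```

For example, if one node has ticked to 5 and another to 2, merging the first node's value into the second reports a lag of 3 and moves the second clock to 5, while merging in the other direction reports 0 and changes nothing.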
  103. THANK YOU!

    @seancribbs