Designing and Evaluating a Distributed Computing Language Runtime

Erlang User Conference 2016

Christopher Meiklejohn

September 09, 2016
Transcript

2. Synchronization
• To enforce an order: makes programming easier; eliminates accidental nondeterminism and prevents race conditions.
• Techniques: locks, mutexes, semaphores, monitors, etc.

3. Difficult Cases
• “Internet of Things”: low power, limited memory and connectivity.
• Mobile gaming: offline operation with replicated, shared state.

7. Weak Synchronization
• Can we achieve anything without synchronization? Not really.
• Strong Eventual Consistency (SEC): “Replicas that deliver the same updates have equivalent state.”
• Primary requirement: eventual replica-to-replica communication.
• Order insensitive! (Commutativity)
• Duplicate insensitive! (Idempotent) — a small example of both properties follows.

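A minimal sketch (my illustration, not from the talk) of why join-based replication tolerates reordering and duplication: a grow-only counter whose merge takes the pointwise maximum, an operation that is commutative, associative, and idempotent.

%% Grow-only counter (G-Counter), illustrative only.
%% State: a map of replica id -> number of local increments.
-module(gcounter).
-export([new/0, increment/2, merge/2, value/1]).

new() -> #{}.

%% Each replica increments only its own entry.
increment(Replica, Counter) ->
    maps:update_with(Replica, fun(N) -> N + 1 end, 1, Counter).

%% Pointwise maximum: merge(A, B) =:= merge(B, A), and merging the
%% same state twice changes nothing, so duplicated or reordered
%% deliveries converge to the same result.
merge(A, B) ->
    maps:fold(fun(K, V, Acc) ->
                  maps:update_with(K, fun(W) -> max(V, W) end, V, Acc)
              end, A, B).

value(Counter) -> lists:sum(maps:values(Counter)).
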
10. Programming SEC
1. Eliminate accidental nondeterminism (e.g. deterministic; modeling non-monotonic operations monotonically).
2. Retain the properties of functional programming (e.g. confluence, referential transparency over composition).
3. Distributed, fault-tolerant runtime (e.g. replication, membership, dissemination).

12. Conflict-Free Replicated Data Types
• Many types exist, with different properties: sets, counters, registers, flags, maps, graphs.
• Strong Eventual Consistency: instances satisfy the SEC property per object.

14. Observed-remove set, three replicas RA, RB, RC:
• One replica adds 1 with a unique tag: state {1}, metadata (1, {a}, {}).
• Another replica concurrently adds 1 with its own tag: state {1}, metadata (1, {c}, {}).
• A third replica removes 1 having observed only tag c: state {}, metadata (1, {c}, {c}).
• After the replicas merge, all hold (1, {a, c}, {c}): tag a was never observed by a remove, so the element survives and every replica reads {1}. (A sketch of this data type follows.)

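An illustrative observed-remove set matching the (element, add-tags, remove-tags) notation above; this is my sketch, not the talk's implementation.

%% Observed-remove set (OR-Set), illustrative only.
%% State: a map of Element -> {AddTags, RemoveTags}.
-module(orset).
-export([new/0, add/3, remove/2, merge/2, value/1]).

new() -> #{}.

%% An add records a globally unique tag for the element.
add(Element, Tag, Set) ->
    {Adds, Removes} = maps:get(Element, Set, {[], []}),
    Set#{Element => {lists:usort([Tag | Adds]), Removes}}.

%% A remove tombstones only the add-tags observed locally.
remove(Element, Set) ->
    case maps:get(Element, Set, undefined) of
        undefined -> Set;
        {Adds, Removes} ->
            Set#{Element => {Adds, lists:usort(Adds ++ Removes)}}
    end.

%% Merge unions both tag sets of every element.
merge(A, B) ->
    maps:fold(fun(E, {Adds, Removes}, Acc) ->
                  {Adds2, Removes2} = maps:get(E, Acc, {[], []}),
                  Acc#{E => {lists:usort(Adds ++ Adds2),
                             lists:usort(Removes ++ Removes2)}}
              end, B, A).

%% An element is present while it has at least one un-removed tag,
%% which is why the concurrent add wins in the trace above.
value(Set) ->
    [E || {E, {Adds, Removes}} <- maps:to_list(Set),
          Adds -- Removes =/= []].
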
15. Programming SEC (recap of the three requirements).

17. Lattice Processing (Lasp)
• Distributed dataflow: declarative, functional programming model.
• Convergent data structures: the primary data abstraction is the CRDT.
• Enables composition: provides functional composition of CRDTs that preserves the SEC property.

18.–22. (stepping through the same program, line by line)

%% Create initial set.
S1 = declare(set),
%% Add elements to initial set and update.
update(S1, {add, [1,2,3]}),
%% Create second set.
S2 = declare(set),
%% Apply map operation between S1 and S2.
map(S1, fun(X) -> X * 2 end, S2).

Once S1 converges to {1, 2, 3}, the map keeps S2 converged to {2, 4, 6} at every replica.

23. Programming SEC (recap; next, the distributed, fault-tolerant runtime).

28. Selective Hearing
• Epidemic broadcast based runtime system: a runtime system that can scale to large numbers of nodes, is resilient to failures, and provides efficient execution.
• Well matched to Lattice Processing (Lasp).
• Epidemic broadcast mechanisms provide weak ordering but are resilient and efficient.
• Lasp's programming model is tolerant to message reordering, disconnections, and node failures.
• “Selective Receive”: nodes selectively receive and process messages based on interest (sketched below).

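A minimal sketch (my illustration) of interest-based processing with Erlang message matching: the node merges broadcasts only for object identifiers it has subscribed to and drops the rest.

%% Illustrative only: process gossip solely for objects of interest.
-module(selective_hearing).
-export([start/0, loop/1]).

start() -> spawn(fun() -> loop(sets:new()) end).

loop(Interests) ->
    receive
        {subscribe, ObjectId} ->
            loop(sets:add_element(ObjectId, Interests));
        {broadcast, ObjectId, State} ->
            case sets:is_element(ObjectId, Interests) of
                true ->
                    %% Merge the incoming CRDT state into the local replica.
                    io:format("merging update for ~p: ~p~n", [ObjectId, State]);
                false ->
                    ok %% Not interested: drop without processing.
            end,
            loop(Interests)
    end.
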
30. Layered Approach
• Membership: a configurable membership protocol which can operate in a client-server or peer-to-peer mode.
• Broadcast (via gossip, tree, etc.): efficient dissemination of both program state and application state via gossip, a broadcast tree, or a hybrid mode.
• Auto-discovery: integration with Mesos; auto-discovery of Lasp nodes for ease of configuration. (A hypothetical configuration sketch follows.)

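The talk doesn't show configuration; the sys.config sketch below only illustrates what selecting these layers at deploy time might look like, and every key and value name in it is hypothetical.

%% sys.config (illustrative; all names hypothetical)
[
 {lasp,
  [
   %% Membership layer: client_server | peer_to_peer.
   {membership_strategy, peer_to_peer},
   %% Broadcast layer: gossip | tree | hybrid.
   {broadcast_mode, hybrid},
   %% Auto-discovery of peers via the Mesos/Marathon API.
   {peer_discovery, {marathon, "http://marathon.mesos:8080"}}
  ]}
].
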
31. Programming SEC (recap).

32. Advertisement Counter
• Mobile game platform selling advertisement space: advertisements are paid according to a minimum number of impressions.
• Clients will go offline: clients have limited connectivity, and the system still needs to make progress while they are offline.

33.–41. [Figure, stepped through over several slides: the advertisement counter dataflow. User-maintained CRDTs (Rovio ad counters, Riot ad counters, contracts) feed Lasp-maintained CRDTs through Lasp operations: the provider ad sets are combined with a union, paired with contracts via a product, and filtered into “Ads With Contracts”. Each client holds a single copy of its ad counters client-side and increments them; a read of 50,000 impressions triggers removal of the ad. A reconstruction in code follows.]

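A condensed reconstruction of that dataflow in the Lasp style shown earlier; the variable names and the contract_matches/2 predicate are illustrative, not from the talk.

%% Illustrative reconstruction of the advertisement counter dataflow.
AdsRovio = declare(set),
AdsRiot = declare(set),
Contracts = declare(set),

%% Union the per-provider ad sets into one collection.
Ads = declare(set),
union(AdsRovio, AdsRiot, Ads),

%% Pair every ad with every contract, then keep matching pairs.
AdsContracts = declare(set),
product(Ads, Contracts, AdsContracts),
AdsWithContracts = declare(set),
filter(AdsContracts,
       fun({Ad, Contract}) -> contract_matches(Ad, Contract) end,
       AdsWithContracts).
%% Clients increment a single local counter copy per displayed ad;
%% a read of 50,000 impressions removes the ad from circulation.
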
44. Background: Distributed Erlang
• Transparent distribution: built in, provided by Erlang/BEAM; cross-node message passing.
• Known scalability limitations: analyzed in various academic publications.
• Single connection: head-of-line blocking.
• Full membership: all-to-all failure detection with heartbeats and timeouts.

46. Background: Erlang Port Mapper Daemon
• Operates on a well-known port (4369): similar to the Solaris sunrpc-style portmap, a known port mapping to dynamically allocated port-based services.
• Bridged networking: problematic for clusters under bridged networking with dynamic port allocation.

49. Experiment Design
• Single application: the advertisement counter example from Rovio Entertainment.
• Runtime configuration: the application is controlled through runtime environment variables.
• Membership: full membership with Distributed Erlang via EPMD.
• Dissemination: state-based object dissemination through an anti-entropy protocol (fanout-based, PARC-style; sketched below).

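A minimal sketch (my illustration) of fanout-based, state-based anti-entropy: each interval, ship the full state of every local object to a few random peers and rely on the CRDT merge for convergence. local_objects/0 and the message format are placeholders.

%% Illustrative fanout-based anti-entropy loop.
-module(anti_entropy).
-export([start/3]).

start(Peers, Fanout, IntervalMs) ->
    spawn(fun() -> loop(Peers, Fanout, IntervalMs) end).

loop(Peers, Fanout, IntervalMs) ->
    timer:sleep(IntervalMs),
    Chosen = pick(Fanout, Peers),
    [Peer ! {state, Object} || Peer <- Chosen,
                               Object <- local_objects()],
    loop(Peers, Fanout, IntervalMs).

%% Choose Fanout peers uniformly at random.
pick(N, Peers) ->
    Shuffled = [P || {_, P} <- lists:sort([{rand:uniform(), P} || P <- Peers])],
    lists:sublist(Shuffled, N).

%% Placeholder for the node's replicated objects.
local_objects() -> [].
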
53. Experiment Orchestration
• Docker and Mesos with Marathon: used for deployment of both EPMD and the Lasp application.
• Single EPMD instance per slave: controlled through the use of host networking and HOSTNAME:UNIQUE constraints in Mesos.
• Lasp: local execution using host networking; connects to the local EPMD.
• Service discovery: facilitated by clustering the EPMD instances through Sprinter.

54. Ideal Experiment
• Local deployment: high thread concurrency when operating with a lower node count.
• Cloud deployment: low thread concurrency when operating with a higher node count.

58. Initial Evaluation
• Moved to DC/OS exclusively: the environments were too different; too much work was needed to adapt things to work correctly.
• Single orchestration task: dispatched events, controlled when to start and stop the evaluation, and performed log aggregation.
• Bottleneck: events were dispatched immediately, which would have required blocking on processing acknowledgements.
• Unrealistic: events do not queue up all at once for processing by the client.

61. Lasp Difficulties
• Too expensive: 2.0 CPUs and 2048 MiB of memory.
• Weeks spent adding instrumentation: process-level, VM-level, and Erlang Observer instrumentation to identify CPU- and memory-heavy processes.
• Dissemination too expensive: 1000 threads sending to a single dissemination process (one Mesos task) leads to backed-up message queues and memory leaks.
• Unrealistic: two different dissemination mechanisms, thread-to-thread and node-to-node; one of them is synthetic.

63. EPMD Difficulties
• Nodes become unregistered: nodes were randomly unregistered with EPMD during execution.
• Lost connections: EPMD loses connections with nodes for arbitrary reasons.
• EPMD task restarted by Mesos: restarted for unknown reasons, which causes the Lasp instances to restart in their own containers.

65. Overhead Difficulties
• Too much state: a client would ship around 5 GiB of state within 90 seconds.
• Delta dissemination: delta dissemination provides only around a 30% decrease in state transmission.
• Unbounded queues: message buffers would lead to VMs crashing because of large memory consumption.

68. Ditch Distributed Erlang
• Pluggable membership service: build a pluggable membership service with an abstract interface, implemented initially on EPMD and migrated later once tested. (A hypothetical interface is sketched below.)
• Adapt Lasp and the broadcast layer: integrate the pluggable membership service throughout the stack and liberate the existing libraries from Distributed Erlang.
• Build a service discovery mechanism: mechanize node discovery outside of EPMD, based on the new membership service.

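The talk doesn't show the interface itself; the following Erlang behaviour is a hypothetical rendering of such an abstract membership contract, with callback names of my own choosing.

%% Hypothetical abstract interface for a pluggable membership service.
%% Any backend (EPMD-based, TCP full mesh, client-server, HyParView)
%% would implement these callbacks.
-module(membership_service).

-callback join(Node :: node()) -> ok | {error, term()}.
-callback leave() -> ok.
-callback members() -> {ok, [node()]} | {error, term()}.
-callback forward_message(Node :: node(), Message :: term()) ->
    ok | {error, term()}.
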
75. Partisan (Membership Layer)
• Pluggable protocol membership layer: allows runtime configuration of the protocols used for cluster membership (see the sketch after this list).
• Several protocol implementations:
  • Full membership via EPMD.
  • Full membership via TCP.
  • Client-server membership via TCP.
  • Peer-to-peer membership via TCP (with HyParView).
• Visualization: provides a force-directed, graph-based visualization engine for real-time cluster debugging.

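A sketch of joining a cluster through Partisan's peer service. These calls follow Partisan's later public releases and may not match the 2016-era code described in the talk; the membership protocol itself is selected through the application environment at startup.

%% Illustrative; API names per later public Partisan releases.
{ok, _} = application:ensure_all_started(partisan),
ok = partisan_peer_service:join('lasp@10.0.0.2'),
{ok, Members} = partisan_peer_service:members(),
io:format("current view: ~p~n", [Members]).
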
79. Partisan (Full via EPMD or TCP)
• Full membership: nodes have full visibility into the entire graph.
• Failure detection: performed by peer-to-peer heartbeat messages with a timeout (sketched below).
• Limited scalability: the heartbeat interval increases as the node count grows, leading to false or delayed detection.
• Testing: used to create the initial test suite for Partisan.

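A minimal sketch (my illustration) of heartbeat-with-timeout failure detection: ping every peer each interval and suspect those not heard from within two intervals. Peer processes are assumed to reply to {ping, From} with {pong, self()}.

%% Illustrative heartbeat failure detector over peer pids.
-module(heartbeat_fd).
-export([start/2]).

start(Peers, IntervalMs) ->
    Now = erlang:monotonic_time(millisecond),
    Seen0 = maps:from_list([{P, Now} || P <- Peers]),
    spawn(fun() -> loop(Peers, IntervalMs, Seen0) end).

loop(Peers, IntervalMs, Seen0) ->
    Self = self(),
    [Peer ! {ping, Self} || Peer <- Peers],
    Deadline = erlang:monotonic_time(millisecond) + IntervalMs,
    Seen = collect(Seen0, Deadline),
    Now = erlang:monotonic_time(millisecond),
    Suspected = [P || P <- Peers, Now - maps:get(P, Seen) > 2 * IntervalMs],
    Suspected =/= [] andalso io:format("suspected: ~p~n", [Suspected]),
    loop(Peers, IntervalMs, Seen).

%% Record pong arrival times until the interval deadline passes.
collect(Seen, Deadline) ->
    Timeout = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {pong, Peer} ->
            collect(Seen#{Peer => erlang:monotonic_time(millisecond)}, Deadline)
    after Timeout ->
        Seen
    end.
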
83. Partisan (Client-Server Model)
• Client-server membership: the server has all peers in the system as peers; a client has only the server as a peer.
• Failure detection: nodes heartbeat, with a timeout, all peers they are aware of.
• Limited scalability: the server is a single point of failure, and visibility limits scalability.
• Testing: used for baseline evaluations as the “reference” architecture.

86. Partisan (HyParView, default)
• Partial view protocol: two views, active (fixed size) and passive (log n); the passive view supplies replacements for failed members of the active view (sketched below).
• Failure detection: performed by monitoring active TCP connections to peers, with keep-alive enabled.
• Very scalable (10k+ nodes during academic evaluation): however, it is probabilistic and can leave nodes isolated during churn.

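A minimal sketch (my illustration) of the replacement step: on the failure of an active peer, promote a randomly chosen passive peer. The real protocol additionally negotiates the promotion with a NEIGHBOR request.

%% Illustrative HyParView-style view maintenance.
-module(hyparview_views).
-export([handle_failure/2]).

-record(views, {active :: [node()], passive :: [node()]}).

handle_failure(Failed, #views{active = Active, passive = Passive} = Views)
  when Passive =/= [] ->
    Remaining = lists:delete(Failed, Active),
    %% Promote a random passive peer into the active view.
    Candidate = lists:nth(rand:uniform(length(Passive)), Passive),
    Views#views{active = [Candidate | Remaining],
                passive = lists:delete(Candidate, Passive)};
handle_failure(Failed, #views{active = Active} = Views) ->
    %% No passive peers left to promote.
    Views#views{active = lists:delete(Failed, Active)}.
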
89. Sprinter (Service Discovery)
• Responsible for clustering tasks: uses Partisan to cluster all nodes and ensure a connected overlay network; reads information from Marathon.
• Node local: operates at each node and is responsible for taking the actions needed to keep the graph connected; required for probabilistic protocols.
• Membership-mode specific: knows, based on the membership mode, how to properly cluster nodes, and enforces proper join behaviour.

93. Debugging Sprinter
• S3 archival: nodes periodically snapshot their membership view for analysis.
• Elected node (or group) analyses: periodically analyses the information in S3 for the following (a sketch of the checks follows this list):
• Isolated node detection: identifies isolated nodes and takes corrective measures to repair the overlay.
• Verifies symmetric relationships: ensures that if a node knows about another node, the relationship is symmetric; prevents “I know you, but you don't know me.”
• Periodic alerting: alerts on disconnected graphs so that external measures can be taken, if necessary.

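A minimal sketch (my illustration) of the isolation and symmetry checks over archived snapshots, with the snapshots modeled as a map from node to the peers it reports.

%% Illustrative analysis over archived membership snapshots.
-module(overlay_check).
-export([isolated/1, asymmetric/1]).

%% Nodes that report no peers at all.
isolated(Views) ->
    [Node || {Node, Peers} <- maps:to_list(Views), Peers =:= []].

%% Pairs {A, B} where A knows B but B does not know A.
asymmetric(Views) ->
    [{A, B} || {A, Peers} <- maps:to_list(Views),
               B <- Peers,
               not lists:member(A, maps:get(B, Views, []))].
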
99. Evaluation Strategy
• Deployment and runtime configuration: the ability to deploy a cluster of nodes and configure simulations at runtime.
• Each simulation:
  • Different application scenario: uniquely executes a different application scenario, chosen at runtime via runtime configuration.
  • Result aggregation: aggregates results at the end of execution and archives them.
  • Plot generation: automatically generates plots for the execution and aggregates the results of multiple executions.
• Minimal coordination: work must be performed with minimal coordination, as a single orchestrator is a scalability bottleneck for large applications.

104. Completion Detection
• “Convergence structure”: an uninstrumented CRDT of grow-only sets containing counters that each node manipulates (sketched after this list).
• Simulates a workflow: nodes use this structure to simulate a lock-step workflow for the experiment.
• Event generation: finishing event generation toggles a boolean through which the node signals completion.
• Log aggregation: completion triggers log aggregation.
• Shutdown: upon completion of log aggregation, nodes shut down.
• External monitoring: when events complete execution, nodes automatically begin the next experiment.

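A minimal sketch (my reading of the description; the phase encoding is an assumption) of such a convergence structure: a grow-only set to which each node adds a completion marker, with the experiment advancing once markers from all nodes are present.

%% Illustrative convergence structure over a grow-only set (G-Set).
-module(convergence).
-export([mark_done/3, phase_complete/3]).

%% Additions are the only mutation, so replicas converge under a
%% union-based merge regardless of delivery order.
mark_done(Node, Phase, GSet) ->
    sets:add_element({Node, Phase, done}, GSet).

%% The lock-step workflow advances when every node has marked the phase.
phase_complete(Nodes, Phase, GSet) ->
    lists:all(fun(Node) ->
                  sets:is_element({Node, Phase, done}, GSet)
              end, Nodes).
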
107. Results: Lasp
• Single-node orchestration: bad. Not possible once you exceed a few nodes: message queues, memory, delays.
• Partial views: required; they rely on transitive dissemination of information and partial network knowledge.
• Results: reduced the Lasp memory footprint to 75 MB; larger in practice because of debugging.

111. Results: Partisan
• Fast churn isolates nodes: a repair mechanism is needed (random promotion of isolated nodes); mainly issues of symmetry.
• FIFO across connections: FIFO holds per connection, but the protocol assumes it across all connections, leading to false disconnects.
• Unrealistic system model: per-message acknowledgements are needed for safety.
• A pluggable protocol helps debugging: being able to switch to full membership or client-server helps separate protocol problems from application problems.

114. Latest Results
• Reproducibility at 300 nodes for full applications: connectivity, but transient partitions and isolated nodes at 500–1000 nodes (across 140 instances).
• Limited financially and by Amazon: larger evaluations are harder to run because we are limited financially (as a university) and by Amazon instance limits.
• Mean state reduction per client: around a 100x improvement over our initial PaPoC 2016 evaluation results.

118. Takeaways (“Plat à emporter”)
• Visualizations are important! Graph performance and visualize your cluster: all of this leads to easier debugging.
• Control changes: no Lasp PR is accepted without divergence, state-transmission, and overhead graphs.
• Automation: developers use graphs when they are easy to make; lower the difficulty of generating them and understand how changes alter system behaviour.
• Make work easily testable: when you test locally and deploy globally, you need to make things easy to test, deploy, and evaluate (for good science, I say!).