Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Practical Data Synchronization with CRDTs (Stra...

Dmitry Ivanov
September 16, 2016

Practical Data Synchronization with CRDTs (StrangeLoop, 2016)

In a connected world, synchronising mutable information between different devices with different clock precision can be a difficult problem. A piece of data may have many out-of-sync replicas but all of those should eventually be in a consistent state. For example, TomTom users, having personal navigation devices, smartphones, MyDrive website accounts, expect their navigation information be synchronised properly even in the occasional absence of network connection. Conflict-free Replicated Data Types (CRDTs) provide robust data structures to achieve proper synchronisation in an unreliable network of devices. They enable the conflict resolution being done locally at the data type level while guaranteeing the eventual consistency between replicas.

In addition to an introduction to common CRDT types, the main focus is on the special subtype of CRDT-Set called OUR-Set (Observed, Updated, Removed), which we created to extend known CRDT sets with update functionality.

I will demonstrate basic implementations of various CRDTs in Scala and enumerate subtle considerations which should be taken into account. I will also explain the advantages of these data structures to solve many synchronisation problems as well as their limitations.

Dmitry Ivanov

September 16, 2016
Tweet

More Decks by Dmitry Ivanov

Other Decks in Programming

Transcript

  1. Disclaimer I'm NOT: • Distributed systems expert. • Hardcore academia

    guy. Just a curious engineer hacking on real world problems. 2
  2. Who We Are "Fool" stack developers hacking on: • Backend

    services • Mobile || SDKs • Infrastructure && AWS && DevOps 4
  3. NavCloud Nature • Unstable connec,ons • Limited bandwidth • Seamless

    edit/view in offline mode • Concurrent changes with poten7al conflicts • No guarantee on updates order • No data loss • Data convergence to expected value 7
  4. CRDT R: Replicated CRDT is a family of data structures

    which has been designed to be distributed 12
  5. What is Merge? • A binary opera-on on two CRDTs

    • Commuta've: x • y = y • x • Associa've: ( x • y ) • z = x • ( y • z ) • Idempotent: x • x = x 16
  6. How Does it Help? In Distributed Systems: • Order is

    not guaranteed: • No Problem: Merge is Commuta-ve and Associa-ve • Events can be delivered more than once: • No problem: Merge is Idempotent 17
  7. What Does it Bring in Prac1ce? • Local updates •

    Local merge of receiving data • All local merges converge 18
  8. Max Func)on • A binary opera-on on two CRDTs •

    Commuta've: x max y = y max x • Associa've: ( x max y ) max z = x max ( y max z ) • Idempotent: x max x = x 22
  9. G-Set Merge: Union of sets: { x, y, z, a,

    b, c } Total Value: The same as the merge result 24
  10. Union Func)on • A binary opera-on on two CRDTs •

    Commuta've: x ∪ y = y ∪ x • Associa've: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z ) • Idempotent: x ∪ x = x 25
  11. Problems • Unstable connec-ons • Actual update -me < Sent

    -me • Network latency • Sent -me < Received -me • Unreliable clocks 30
  12. NavCloud Nature vs CRDT • Unstable connec,ons ✔ • Limited

    bandwidth ✔ • Seamless edit/view in offline mode ✔ • Concurrent changes with poten7al conflicts ✔ • No guarantee on updates order ✔ • No data loss ✔ • Data convergence to expected value ✔ 34
  13. Implemen'ng a CRDT Set What do we want? • Support

    for addi-on and removal opera-ons. • Op-mized for element muta-ons. • Footprint as compact as possible. 37
  14. 2-Phase-Set Supports addi,ons and removals. • G-Set for added elements

    • G-Set for removed elements aka Tombstones 38
  15. 2-Phase-Set Merge: [ Add { "cat", "dog", "ape" }; Rem

    { "ape" } ] Lookup: { "cat", "dog" } 40
  16. 2-Phase-Set Lookup def lookup: Set[E] = addSet.diff(removeSet).lookup Merge def merge(anotherSet:

    TwoPSet[E]): TwoPSet[E] = new TwoPSet( union(addset, anotherSet.addSet ), union(removeSet, anotherSet.removeSet )) 41
  17. 2-Phase-Set Doesn't work for us: • Removed element can't be

    added again • Immutable elements: no updates possible 42
  18. LWW-Element-Set Supports addi,ons and removals, with !mestamps. • G-Set for

    added elements • G-Set for removed elements aka Tombstones • Each element has a 3mestamp • Supports re-adding removed element using a higher 3mestamp 43
  19. LWW-Element-Set Merge Add { (1, "cat"), (5, "cat"), (1, "dog"),

    (1, "ape") } Rem { (1, "cat"), (3, "cat") } 45
  20. LWW-Element-Set Merge Add { (1, "cat"), (5, "cat"), (1, "dog"),

    (1, "ape") } Rem { (1, "cat"), (3, "cat") } Lookup { "cat", "dog", "ape" } 46
  21. LWW-Element-Set Lookup def lookup: Set[E] = addSet.lookup.filter { addElem =>

    !removeSet.exists { removeElem => removeElem.value == addElem.value && removeElem.timestamp > addElem.timestamp } }.map(_.value) Merge def merge(LWWSet<E> anotherSet): LWWSet<E> = new LWWSet( union(addset, anotherSet.addSet ), union(removeSet, anotherSet.removeSet )) 47
  22. OR-Set OR - Observed / Removed Supports addi,ons and removals,

    with tags. • G-Set for added elements • G-Set for removed elements aka Tombstones • Unique tag is associated with each element • Supports re-adding removed elements 49
  23. OR-Set Merge Add { (#a, "cat"), (#c, "cat"), (#b, "dog"),

    (#d, "ape") } Rem { (#a, "cat") } 51
  24. OR-Set Merge Add { (#a, "cat"), (#c, "cat"), (#b, "dog"),

    (#d, "ape") } Rem { (#a, "cat") } Lookup { "cat", "dog", "ape" } 52
  25. OR-Set Lookup E exists iff it has in AddSet a

    tag that is not in the RemoveSet. def lookup(): Set<E> = addSet.filter { addElem => !removeSet.exists { remElem => addElem.value == remElem.value && remElem.tag.equals(addElem.tag) } } .map(_.value); 53
  26. OR-Set Merge def merge(anotherSet: ORSet[E]): ORSet[E] = new ORSet( union(addset,

    anotherSet.addSet ), union(removeSet, anotherSet.removeSet)) 54
  27. OUR-Set Our take on Observed-Updated-Removed Set • Each element has

    a unique iden%fier • Element can be changed if iden4fier remains the same • Each element has a %mestamp • Timestamp is updated on each element muta4on Iden%ty (immutable unique id) vs Value (mutable) 56
  28. OUR-Set Contains a single underlying set of elements with metadata:

    • Each element has a unique id field (e.g. a UUID) • Each element has a "removed" boolean flag • Each element has a )mestamp • Set can only contain one element with a par'cular id 57
  29. OUR-Set Merge: { (id1, 5, "*ger"), (id2, 2, "dog", removed),

    (id3, 1, "ape") } Lookup { "$ger", "ape" } 60
  30. OUR-Set Merge def merge(anotherSet: OURSet[E]]): OURSet[E] = OURSet[E]( elements ++

    anotherSet.elements) .groupBy (_.id) .map (group => group._2.maxBy(_.timestamp)) .toSet) Lookup def lookup(ourSet: OURSet[E]): Set[E] = ourSet.filter (!_.removed) .map (_.value) 61
  31. CRDT Model: Favorites FavoriteState element: • ID (to uniquely iden.fy

    a favorite) • Timestamp (to indicate the last change .me) • Removed flag (to indicate if favorite has been removed) • Favorite data: ( Name, Loca2on, ... ) 63
  32. Convergence in case of equal !mestamps Compare func-on checks all

    the fields in order of priority: • Timestamp • Removed flag (Add or Delete bias) • .. rest a6ributes .. 64
  33. Using CRDT everywhere Client <-> Server <-> Database def update(fromClient:

    OURSet[E]): OURSet[E] = { val fromDatabase = database.fetch(...) val newSet = fromDatabase.merge(fromClient) database.store(..., newSet) newSet } 66
  34. 67

  35. "What about garbage?" • CRDTs tend to grow because of

    tombstones. • Deleted Element in the Set == Tombstone. • A poten?ally unbounded growth. 69
  36. Prune deleted elements But when? Requirement: All nodes holding a

    CRDT Set replica should have seen a deleted element before it can be pruned. Otherwise deleted elements can be resurrected. 70
  37. Time-To-Live for tombstones Prune tombstones once TTL exceeded. if ((DateTime.now()

    - tombstone.timestamp) > TimeToLive) { crdtSet.remove(tombstone) } Requirement: all nodes holding a CRDT set should apply the same TTL rule independently. 71
  38. Send and reply with a Diff Client modifies and sends

    only updated elements (Diff). Before: Server responds with a full merge result. 72
  39. Send and reply with a Diff We introduced a 'Scoped

    Diff': Server responds only with the elements which have won against those sent by the client. 73
  40. Ordering updates within a single node Timestamp field as a

    logical clock. Absolute value is not important, but it should always grow monotonically. 80
  41. Ordering updates within a single node "+1 Strategy" (aka ensure

    monotonicity): Long resolveNewTimestamp(ElementState<E> state) { return Math.max( retrieveTimestamp(), state.lastModified() + 1 ); } 81
  42. Ordering updates from different nodes If GPS clock is available

    -> use it (mainly Naviga&on Devices case). Prefer the server &me to a client's local 0me. 82
  43. Clients & Server MUST have same 'merge' behaviour. == Given

    the same input, their 'merge' func/ons emit the same results. 85
  44. Lazy (data) loading New OURSet Element • Metadata: UUID, $mestamp,

    "removed" flag, + tag / hash • (Op(onal) Data: <Value> Flexible synchroniza1on strategy Eager || Lazy Fetch 88
  45. What have we learned? • Academia is not as scary

    as it some-mes seems to pragma,c devs. • We need be2er and simpler abstrac-ons to develop Offline-friendly apps. • CRDTs give a great value, but there are some caveats. • Things like Lasp (lasp-lang.org) also could be the answer (?). 89
  46. Want to know more? (Part 1) • CRDTs for fun

    and eventual profit, - Noel Welsh, 2013. • Readings in conflict-free replicated data types, - Christopher Meiklejohn, 2015. • A comprehensive study of Convergent and CommutaJve Replicated Data Types, - Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011. 91
  47. Want to know more? (Part 2) • Lasp: A language

    for distributed, coordina7on-free programming, - Meiklejohn & Van Roy, 2015. • Swarm.js+React — real-7me, offline-ready Holy Grail web apps, - Victor Grishchenko. 92