$30 off During Our Annual Pro Sale. View Details »

CAP Theorem: 
not what we thought it was, not what we are looking for

Shlomi Noach
December 18, 2019

CAP Theorem: 
not what we thought it was, not what we are looking for

The CAP theorem is often used to classify distributed systems, and the "two out of three" rule is often quoted. But the CAP theorem is widely misunderstood. What are the exact terms of the CAP Theorem? How does it differ from Brewer's original CAP Conjecture? Where does CAP fall short of meeting practical engineering expectations?

Shlomi Noach

December 18, 2019
Tweet

More Decks by Shlomi Noach

Other Decks in Technology

Transcript

  1. CAP Theorem: 

    not what we thought it was,
    not what we are looking for
    Shlomi Noach
    GitHub
    DevOpsDays TLV 2019

    View Slide

  2. About me
    @github/database-infrastructure
    Author of orchestrator, gh-ost, freno, ccql
    and others.
    Blog at http://openark.org

    github.com/shlomi-noach

    @ShlomiNoach

    View Slide

  3. GitHub

    Built for developers
    Largest open source hosting
    40M+ developers

    2.9M+ organizations

    100M+ repositories,
    Supplier of octocat T-Shirts and stickers

    View Slide

  4. Incentive
    Important part of our work is to keep the service available, so
    that users/customers have access to their data and can operate
    on their data.

    View Slide

  5. CAP Conjecture
    Suggested by Eric Brewer, 1998-1999
    Terms:
    • [strong] Consistency
    • [high] Availability
    • Partition tolerance
    Strong Consistency, High Availability, Partition-resilience: Pick
    at most 2.
    C
    A P

    View Slide

  6. CAP Theorem
    Brewer’s Conjecture and the Feasibility of Consistent,
    Available, Partition-Tolerant Web Services
    Seth Gilbert, Nancy Lynch

    View Slide

  7. CAP Theorem
    A mathematical proof
    Uses different definition of Availability than CAP Conjecture’s
    definition
    Neither of the Availability definitions contain one another (or,
    are stronger than the other)

    View Slide

  8. CAP Theorem
    In Math, a theorem only holds as long as its terms & conditions
    are met.
    CAP Theorem does not prove the CAP Conjecture.

    View Slide

  9. CAP Theorem
    Terms, according to Gilbert & Lynch:
    - Consistency
    - Availability
    - Partition tolerance

    View Slide

  10. CAP Theorem: Consistency
    Once a write is successful on a node, any read on any node must
    reflect that write or any later write.
    aka Atomic Consistency, aka Strong consistency, aka Linear
    consistency, aka Linearizability.
    C

    View Slide

  11. CAP Theorem: Availability
    …every request received by a non-failing node in the system
    must result in a response.
    There is no constraint on the actual amount of time.
    Though it is not specified, it is implied that response must be
    valid, non-error.
    Contrast with Brewer’s definition of High availability: data is
    considered highly available if a given consumer of the data can
    always reach some replica.
    A

    View Slide

  12. CAP Theorem: Partition Tolerance
    The system is able to operate on network partitioning.
    Partition tolerance is considered as a given condition, since
    network partitioning can and does take place regardless of a
    system’s design.
    P

    View Slide

  13. CAP Theorem
    A distributed data store [web service] cannot provide more than
    two out of the three properties. Better illustrated as:
    • If the network is good, you may achieve both Availability and
    Consistency.
    • If the network is partitioned, you must choose between
    Availability and Consistency.

    View Slide

  14. CAP Theorem
    The discussion is mostly about CP vs AP systems
    C
    A P

    View Slide

  15. Proof of CAP theorem
    Simplified, not by much.
    Given two nodes replicating from each other
    ! !
    n1 n2

    View Slide

  16. Proof of CAP theorem
    We partition the network between the two nodes to an infinite
    amount of time.
    ! !
    n1 n2

    View Slide

  17. Proof of CAP theorem
    We write data to one node. If the system is Available, the write
    completes in a finite amount of time.
    ! !
    n1 n2

    View Slide

  18. Proof of CAP theorem
    We read data from the other node. If the system is Available
    that read completes in a finite mount of time.
    ! !
    n1 n2

    View Slide

  19. Proof of CAP theorem
    During that finite time the network was partitioned, hence our
    read could not reflect changes made by the write.
    QED
    ! !
    n1 n2

    View Slide

  20. The proof is mathematical
    Following solid Mathematical principles.

    View Slide

  21. The proof is mathematical
    Reading data after writing is only possible when the write time is
    finite.
    Proves impossibility by counter example: we’ve shown a (single)
    scenario where we can’t have both Availability and Consistency.

    View Slide

  22. CAP Theorem
    CAP does not say “you cannot have both A and C at the same
    time” *
    It says: “you cannot design a system where you will have both A
    and C together at all times” *
    * Assuming P

    View Slide

  23. All purple dragons can fly

    View Slide

  24. Vacuous truth
    A statement that asserts that all members of the empty set
    have a certain property.
    [wikipedia]

    View Slide

  25. Vacuous truth
    All purple dragons can fly.

    View Slide

  26. Vacuous truth
    All purple dragons can fly faster than the speed of light.

    View Slide

  27. Vacuous truth
    All purple dragons can answer questions.
    All purple dragons cannot answer questions.

    View Slide

  28. CAP Availability
    “…every request received by a non-failing node in the system
    must result in a response.”

    View Slide

  29. CAP Availability
    It is vacuous truth, that if all the nodes in my system are
    crashed, then my system is Available.

    View Slide

  30. Proposal to making my service
    available
    ! !
    !
    !
    !
    ! Shut down all database servers?

    View Slide

  31. Proposal to making my service
    available
    ! !
    !
    !
    !
    ! Shut down all database servers?
    The CAP Theorem Availability definition goes against practical
    definition of availability.

    View Slide

  32. Infinite network partitioning

    View Slide

  33. Infinite network partitioning
    Illustrated

    View Slide

  34. Meet Anna and Ben,
    SRE’s at example.com

    View Slide

  35. View Slide

  36. One day, Anna's
    attention is drawn
    to what seems to be
    an unfolding crisis.
    oh, dear!

    View Slide

  37. Ben, are you seeing
    what I’m seeing?

    View Slide

  38. The two investigate
    It seems like our
    Virginia router is
    malfunctioning

    View Slide

  39. Confirmed with the
    datacenter; they saw
    some smoke, and
    then it burned. It's
    literally melted.

    View Slide

  40. So our Virginia DC is
    now network
    isolated.

    View Slide

  41. You know what
    that means?

    View Slide

  42. oh, wow!
    oh, wow!

    View Slide

  43. The router is never
    coming back. This is
    an infinite network
    partitioning!

    View Slide

  44. We will never have
    consistency ever
    again!

    View Slide

  45. There's just no point 

    in anything. CAP 

    Theorem proves this.

    View Slide

  46. For eternity!

    View Slide

  47. We may as well enjoy our
    time! We can live A life of
    adventure!

    View Slide

  48. We can go to

    Paris!

    View Slide

  49. We can go to

    Rome!

    View Slide

  50. We can go to

    DevOpsDays TLV!

    View Slide

  51. Woohoo!

    View Slide

  52. Enters Carmen, CTO
    for example.com

    View Slide

  53. Hey there! What’s

    going on? Seems like

    we have an outage?

    View Slide

  54. Our Virginia router

    burst up in flames.

    It is gone.

    View Slide

  55. For eternity. We now

    have an infinite 

    network partitioning, 

    and will never again 

    have consistency.

    View Slide

  56. There's just no point 

    in anything. CAP 

    Theorem proves this.

    View Slide

  57. Can we expense 

    travel to 

    DevOpsDays TLV?

    View Slide


  58. View Slide


  59. View Slide


  60. View Slide

  61. ben?

    View Slide

  62. Yeah?

    View Slide

  63. Here's my corporate
    credit card. 

    I'd like you to go to
    the store downtown,
    buy a new router, take
    it to our Virginia
    DataCenter and replace
    the old router.

    View Slide

  64. gulp!

    View Slide

  65. Yeah, 

    that also works!

    View Slide

  66. Infinite network partitioning
    Is not a practical situation.
    Is not something we plan for.
    Is not something that concerns us as we engineer our systems.

    View Slide

  67. Can we trade “infinite” with 

    “long enough”?
    This would break the proof. If a network partition is capped (pun
    intended) by t seconds we can stall any read by t+α,
    conveniently delaying beyond the network partitioning, allowing
    for consistency.
    Remember: there is no cap (pun again) to query response time.
    ! !
    n1 n2

    View Slide

  68. Can we rewrite CAP Theorem in
    terms of finite timeouts?
    Yes:
    • in a system where a network partition can last more than t
    • and where we require queries to respond within time t/2
    • We can illustrate a proof similar to CAP Theorem’s where
    given such partitioning we cannot achieve both availability
    and consistency.
    • Let’s name it “Time Limited CAP”
    ! !
    n1 n2

    View Slide

  69. Questioning “time limited CAP”
    Is t a given constraint? Why would we necessarily require
    queries to complete in t/2?
    Is there system logic / customer logic to the above, or have we
    tailored this to fit a theorem we had?

    View Slide

  70. Questioning “time limited CAP”
    Opinion: this chase is a fallacy. We seem to be trying to prove a
    theorem while we disagree with its terms, namely Availability.

    View Slide

  71. CAP Conjecture
    Availability: data is considered highly available if a given
    consumer of the data can always reach some replica.
    Can we work with that?

    View Slide

  72. n == 2?

    View Slide

  73. n > 2
    Availability: consumer always has access to data via at least
    one replica.
    Some node will have our data (we will likely need to detect
    which one)
    ! !
    !
    !
    !

    View Slide

  74. CAP is short of CAPacity

    View Slide

  75. CAP is short of CAPacity™

    View Slide

  76. CAP is short of CAPacity™®

    View Slide

  77. CAP is short of CAPacity™®
    PATENTED

    View Slide

  78. CAP Conjecture, capacity
    What if we agreed to quorum?
    q-Availability: assuming number of replicas is n, consumer
    always has access to data via at least ⌊n/2 + 1⌋ replicas.
    e.g.
    - in a 5-replicas cluster, quorum would be 3

    - in a 7-replicas cluster, quorum would be 4
    ! !
    !
    !
    !

    View Slide

  79. Consensus algorithms: 

    Paxos, Raft

    View Slide

  80. Consensus algorithms: 

    Paxos, Raft
    Depending on implementation, can guarantee:
    • A write is readily Available to read on quorum nodes
    • A write is made durable on quorum nodes
    ! !
    !
    !
    !

    View Slide

  81. ! !
    !
    !
    !
    Do consensus algorithms
    contradict CAP?

    View Slide

  82. ! !
    !
    !
    !
    CAP network partitioning, n > 2

    View Slide

  83. ! !
    !
    !
    !
    CAP network partitioning, n > 2

    View Slide

  84. ! !
    !
    !
    !
    CAP network partitioning, n > 2

    View Slide

  85. ! !
    !
    !
    !
    Reasonable confidence

    View Slide

  86. CAP: recap
    Assuming P, you cannot design a system where you will have
    both A and C together at all times.
    Proven by giving a [single] counter example.

    View Slide

  87. Absolutism

    View Slide

  88. High Availability
    Commonly measured in 9’s.
    We agree that absolute availability is unrealistic.

    View Slide

  89. CAP 9’s?
    “In practice, many applications are best described in terms of
    reduced consistency or availability…
    … there is a Weak CAP Principle which we have yet to
    characterize precisely…"

    View Slide

  90. CAP is a subset
    Availability and Consistency are but two (important) properties
    of modern distributed systems. Geo distributions, transactions,
    latency, durability, consistent distributed snapshots, DR,
    failovers, backup options, mean time to recover, observability,
    operability…(some properties correlate)
    Which are important to your service?

    View Slide

  91. The tradeoff exists
    We understand this as engineers. Multiple other tradeoffs exist.
    I believe CAP is not the model we should be looking for.

    View Slide

  92. Harvest, Yield, and Scalable Tolerant Systems, Armando Fox, Eric A. Brewer

    https://pdfs.semanticscholar.org/5015/8bc1a8a67295ab7bce0550886a9859000dc2.pdf
    Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web
    Services, Seth Gilbert, Nancy Lynch

    https://www.glassbeam.com/sites/all/themes/glassbeam/images/blog/10.1.1.67.6951.pdf
    A Critique of the CAP Theorem, Martin Kleppmann

    https://arxiv.org/pdf/1509.05393.pdf
    CAP Twelve Years Later: How the "Rules" Have Changed, Eric Brewer

    https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
    Please stop calling databases CP or AP, Martin Kleppmann

    https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
    Further reading

    View Slide

  93. NewSQL database systems are failing to guarantee consistency, and I blame Spanner, Daniel
    Abadi

    http://dbmsmusings.blogspot.com/2018/09/newsql-database-systems-are-failing-to.html
    Problems with CAP, and Yahoo’s little known NoSQL system, Daniel Abadi

    http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
    PACELC theorem, Wikipedia

    https://en.wikipedia.org/wiki/PACELC_theorem
    "A Critique of the CAP Theorem”, Julia Evans

    https://jvns.ca/blog/2016/11/19/a-critique-of-the-cap-theorem/
    CAP Theorem, FoundationDB

    https://apple.github.io/foundationdb/cap-theorem.html
    Further reading

    View Slide

  94. Questions?
    github.com/shlomi-noach
    @ShlomiNoach
    Thank you!

    View Slide