Distributed consensus is often discussed in terms of algorithms: Paxos, ZAB, Raft, etc. But while the algorithms may be more or less mind-bending, for me the more interesting aspect of distributed consensus is building systems that support it for the general use case. This paper, on Google's Chubby lock service, is the story of what happens when a system stops being a polite theory and starts getting real-world use.
To anyone who has worked in depth as a distributed systems engineer, Chubby is a beautiful paper. It is not a paper about algorithms and their limits, or a toy fringe system built by grad students to test a hypothesis. It describes the real tradeoffs that real systems engineers make when designing something to solve a large set of problems well enough. The paper lays out the authors' key insights into how such a system would be used, and their awareness of what it should do well and what it should not try to do at all. It details how Chubby was designed, then goes further to describe how it ended up being used once released into the wild, and the surprises and consequences that followed from those design decisions.