• Assumes crash-stop fault model • In production not all faults are crash-stop failures • Bugs in code, OS, etc. can corrupt the system state • Unlike crash-stop faults state corruptions are not detectable by clients • Hard to determine if system is functioning reliably until users notice it
but they are expensive • Use fault detection instead as it is cheaper • This paper presents ◦ ZooKeeper's data inconsistency detection mechanisms ◦ Their impact on the overall performance
configuration errors, and corrupted data • Does not cover malicious-byzantine faults • Two types of replica ◦ Correct replica - consistent ◦ Faulty replica - inconsistent
snapshot and transaction log files of all replicas • Deserialize DataTree of every replica • Compute the digest of each DataTree • Compare the digests to find diverged replica
in ZooKeeper service • Every replica maintains a replica digest and a digest log (historical digests) • Replica digest ◦ Represents the state of replica up to a particular transaction id ◦ Updated upon a change to its DataTree ◦ Incremental Hash (AdHASH) is used to compute digests efficiently • Replica digest is added to the digest log after every fixed number of transactions • External auditor is employed to compare historical digests
− Digest(OldNodeData) + Digest(NewNodeData) • When creating a new node, the Digest(OldNodeData) will be 0 • When deleting a node, the Digest(NewNodeData) will be 0 *Mihir Bellare and Daniele Micciancio. 1997. A new paradigm for collision-free hashing: Incrementality at reduced cost. In Advances in Cryptology—EUROCRYPT’97: International Conference on the Theory and Application of Cryptographic Techniques Konstanz, Germany, May 11–15, 1997 Proceedings 16. Springer, 163–192.
it happens, i.e., as the replica state changes • Both digest computation and comparisons are embedded in ZooKeeper • Two types of digest ◦ Replica Digest ◦ Predictive Digest
60B (Stats) • Async repeated read (getData) and write(setData) • At most 100 outstanding request per client • Clients remain connected to the same server • 900 clients - 300 clients per server
function - low penalty and low confidence in detection • Strong hash function - high penalty and high confidence in detection Performance with different hash functions in Online Comparison Performance with different hash functions in Realtime Detection
an acceptable performance penalty ▪ Online Comparison - 2% with 100% write operations and CRC-32 ▪ Realtime Detection - 20% with 100% write operations and CRC-32 • We are able to detect data inconsistencies with no increase in deployment cost and only minor development and performance cost
to download/copy replica data ◦ Auditor helps to catch inconsistency sooner than the external consistency checker ◦ Provides a context around the time when fault occurred, which helps to identify the root cause of faults • Disadvantages ◦ Impacts the overall performance of ZooKeeper ▪ replicas compute digest on every commit which adds an extra CPU load ◦ Doesn't provide complete protection against the propagation of corruption
Has no impact no ZooKeeper's performance • Disadvantages ◦ Inefficient ▪ Need to copy replica data every time consistency checker runs ▪ Can't detect faults unless external consistency checker runs ▪ Doesn't protect against the propagation of corruption
they occur (as data is changing) ◦ This helps in preventing the propagation of corruption ◦ Provides very specific context around the time when fault occurred ▪ this helps in RCA of arbitrary behavior • Disadvantages ◦ Impacts overall performance of the system ◦ Replicas compute digest on every commit which adds an extra CPU load on all replicas ◦ Computation of predictive digests adds an additional load on the CPU of the leader server ▪ This slows down all the state modifying operations