Of interest in consensus protocols
Of new programming languages and paradigms to deal with the complexity of distributed
Of databases!
Thursday, May 16, 13

despite sparse documentation:
Multiple large production deployments (Yahoo, OpenX, StackMob)
Used in a few university systems classes

we prove correctness of existing distributed systems?
Unit tests grossly insufficient for large distributed systems
QuickCheck is an improvement
“testing only shows the presence, not the absence of bugs” - Dijkstra

work fine: unit tests passed, Riak integration tests passed
A day’s worth of QuickCheck testing revealed bugs in every major piece of functionality

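To illustrate how code can pass its unit tests and still fail a property-based test, here is a minimal sketch in Python. It is not QuickCheck and does not use its API: `merge_sorted`, `find_counterexample`, and the generator are hypothetical names invented for this example, and the bug is planted deliberately. Real QuickCheck also shrinks failing cases to a minimal counterexample, which this sketch does not attempt.

```python
import random

def merge_sorted(a, b):
    """Merge two sorted lists. Deliberately buggy: the tail of b is dropped."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # bug: the matching out.extend(b[j:]) is missing
    return out

def gen_two_sorted_lists(rng):
    """Generate a random pair of short sorted integer lists."""
    a = sorted(rng.randint(0, 9) for _ in range(rng.randint(0, 5)))
    b = sorted(rng.randint(0, 9) for _ in range(rng.randint(0, 5)))
    return a, b

def prop_merge_matches_sort(a, b):
    """Property: merging must agree with sorting the concatenation."""
    return merge_sorted(a, b) == sorted(a + b)

def find_counterexample(prop, gen, trials=1000, seed=0):
    """Run prop on random inputs; return the first failing input or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen(rng)
        if not prop(*case):
            return case
    return None

# A hand-picked unit test happens to miss the bug...
assert merge_sorted([1, 3, 5], [2, 4]) == [1, 2, 3, 4, 5]

# ...while random generation over the same property finds a failing input.
counterexample = find_counterexample(prop_merge_matches_sort, gen_two_sorted_lists)
print("counterexample:", counterexample)
```

The unit test passes because its inputs never leave a tail in `b`; the property states what must hold for all inputs, so random generation reaches the case the author did not think to write down.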
evolves separately from tests
Large up-front effort, tests decay over time
See: “Hansei: Property-based Development of Concurrent Systems” by Joe Blomstedt of Basho
Unify model and production code with annotations
McErlang does exhaustive state-space exploration

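As a rough picture of what exhaustive state-space exploration means, the sketch below enumerates every interleaving of a toy model in Python. It is not McErlang and says nothing about how McErlang is implemented: the model (processes that each read a shared counter, then write back the value plus one) and the names `step` and `explore` are assumptions made for this example.

```python
from collections import deque

def step(state, pid):
    """Return the successor state if process pid can move, else None."""
    counter, procs = state
    pc, local = procs[pid]
    if pc == 0:        # step 1: read the shared counter into a local
        new = (1, counter)
    elif pc == 1:      # step 2: write local + 1 back to the counter
        counter = local + 1
        new = (2, local)
    else:              # process has finished
        return None
    new_procs = tuple(new if i == pid else p for i, p in enumerate(procs))
    return (counter, new_procs)

def explore(n_procs=2):
    """Breadth-first search over all reachable states of the model.

    Returns the set of visited states and the set of counter values
    observed in terminal states (where no process can move).
    """
    init = (0, tuple((0, 0) for _ in range(n_procs)))
    seen = {init}
    queue = deque([init])
    finals = set()
    while queue:
        state = queue.popleft()
        succs = [s for s in (step(state, p) for p in range(n_procs)) if s is not None]
        if not succs:
            finals.add(state[0])
        for s in succs:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return seen, finals

seen, finals = explore(2)
print("final counter values:", sorted(finals))  # [1, 2]
```

With two processes the intended result is 2, but the search also reaches a terminal state with counter 1 (both processes read 0 before either writes). A random test might or might not hit that schedule; an exhaustive search cannot miss it, at the cost of a state space that grows quickly with the number of processes.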
now!
Dings in the armor of OO
Explosion of new languages
Let’s use existing tools like compilers and static analysis to verify our programs

sys.ip = processes.ip AND
(rss*100)/sys.memtotal > 75 AND
sys.ip IN (SELECT ip FROM machinerole WHERE role='dns');
Akamai “Query” System
*Keeping Track of 70,000+ Servers: The Akamai Query System

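To make the predicates above concrete, here is a small sketch that runs them against an in-memory SQLite database from Python. Only the three WHERE conditions come from the slide. The SELECT list, the FROM clause, the `name` column, the table schemas, and all the rows are assumptions made up for this example, and SQLite on one machine stands in for what is really a distributed query system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schemas and data; the slide only shows column references.
conn.executescript("""
    CREATE TABLE sys (ip TEXT, memtotal INTEGER);
    CREATE TABLE processes (ip TEXT, name TEXT, rss INTEGER);
    CREATE TABLE machinerole (ip TEXT, role TEXT);

    INSERT INTO sys VALUES
        ('10.0.0.1', 1000), ('10.0.0.2', 1000), ('10.0.0.3', 1000);
    INSERT INTO processes VALUES
        ('10.0.0.1', 'named', 800),
        ('10.0.0.2', 'named', 100),
        ('10.0.0.3', 'httpd', 900);
    INSERT INTO machinerole VALUES
        ('10.0.0.1', 'dns'), ('10.0.0.2', 'dns'), ('10.0.0.3', 'web');
""")

# The SELECT and FROM lines are assumed; the WHERE clause follows the slide.
QUERY = """
    SELECT sys.ip, processes.name
    FROM sys, processes
    WHERE sys.ip = processes.ip
      AND (rss*100)/sys.memtotal > 75
      AND sys.ip IN (SELECT ip FROM machinerole WHERE role='dns');
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('10.0.0.1', 'named')]
```

Read this way, the query asks for processes using more than 75% of total memory on machines whose role is DNS: the web server at 10.0.0.3 is filtered out by the subquery, and the lightly loaded DNS machine by the memory threshold.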
manure into one bucket” - Scott Fritchie’s Grandfather
“microbursts” of traffic sent to one cluster member
Coordinator sends request to three replicas
All respond with large-ish result at roughly the same time
Switch has to either buffer or drop packets
Result: throughput collapse

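The buffer-or-drop step can be sketched with a toy tick-based model in Python. Everything here is an assumption for illustration: the function name `simulate_burst`, the byte counts, and the simplification that every sender and the egress port run at the same line rate. It counts drops only; it does not model TCP retransmissions or timeouts, which is where the actual throughput collapse would come from.

```python
def simulate_burst(n_senders, response_bytes, buffer_bytes, link_bytes_per_tick):
    """Simulate n_senders answering one receiver through a single switch port.

    Each tick, every sender pushes up to one link's worth of bytes into the
    port's buffer; bytes that do not fit are dropped. The port then drains
    one link's worth toward the receiver. Returns (delivered, dropped).
    """
    queued = dropped = delivered = 0
    remaining = [response_bytes] * n_senders
    while any(remaining) or queued:
        for i, left in enumerate(remaining):
            sent = min(left, link_bytes_per_tick)
            remaining[i] -= sent
            room = buffer_bytes - queued
            accepted = min(sent, room)
            queued += accepted
            dropped += sent - accepted
        out = min(queued, link_bytes_per_tick)  # egress drains at line rate
        queued -= out
        delivered += out
    return delivered, dropped

# One replica responding: the port keeps up and nothing is dropped.
print(simulate_burst(1, 100, 50, 10))  # (100, 0)

# Three replicas responding at once: the buffer fills and most bytes are lost.
print(simulate_burst(3, 100, 50, 10))  # (140, 160)
```

With three senders the port receives three times what it can forward per tick, so the 50-byte buffer fills within a few ticks and everything beyond it is dropped, even though each individual response is modest.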
like Paxos?
Or are they just fundamentally complex and we need to deal with it?
Do other disciplines have anything to teach us about new/richer models?