2i, links, pre- and post-commit hooks, pluggable backends, HTTP and binary interfaces • Written in Erlang with C/C++ • Open source under Apache 2 License Riak Thursday, September 13, 12
Deployed in production the same year • Used as the data store for Basho’s SaaS • Open sourced in August 2009; Basho “pivots” • Hit v1.0 in September 2011 • Now being used by 1000s in production • Basho sells commercial extensions to Riak
Theorem and Amazon’s Dynamo Paper • Riak is tuned to offer availability above all else • Developers can tune for consistency (more on this later)
bucket-level setting. Defaults to “3”. • w - number of replica acks required for a successful write. Defaults to “2”. • r - number of replica acks required for a successful read. Request-level setting. Defaults to “2”. • Tweak consistency vs. availability
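The trade-off between these settings can be sketched with the textbook quorum arithmetic. This is a minimal illustration, not Riak's API: the function names are hypothetical, and it assumes the simple rule that a read is guaranteed to touch at least one replica that acknowledged the latest write only when r + w > n. Real clusters add further behavior that this arithmetic does not model.

```python
# Hypothetical helpers illustrating quorum arithmetic; not part of any Riak client.

def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read set of size r must share a replica with every write set of size w."""
    return r + w > n


def tolerated_failures(n: int, quorum: int) -> int:
    """Number of replicas that can be unreachable while a request needing `quorum` acks still succeeds."""
    return n - quorum


# Defaults quoted on the slide: n=3, w=2, r=2.
DEFAULTS = {"n": 3, "r": 2, "w": 2}

if __name__ == "__main__":
    print(quorums_overlap(**DEFAULTS))            # read and write sets overlap
    print(quorums_overlap(n=3, r=1, w=1))         # faster, but reads may miss the latest write
    print(tolerated_failures(DEFAULTS["n"], DEFAULTS["w"]))
```

Lowering r or w favors availability and latency; raising them favors consistency.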
C, Squeak, Smalltalk, Pharo, Clojure, Scala, Haskell, Lisp, Go, .NET, Play, and more (supported by either Basho or the community).
number of evenly-sized partitions • partitions are claimed by nodes in the cluster [Figure: ring of 32 partitions claimed by node 0, node 1, node 2, node 3; ring positions marked 0, 2^160/4, 2^160/2]
number of evenly-sized partitions • partitions are claimed by nodes in the cluster • replicas go to the N partitions following the key [Figure: ring with partitions claimed by node 0, node 1, node 2, node 3]
number of evenly-sized partitions • partitions are claimed by nodes in the cluster • replicas go to the N partitions following the key [Figure: ring with partitions claimed by node 0, node 1, node 2, node 3; hash(“meetups/nycdevops”) placed on the ring with N=3]
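The placement described on these slides can be sketched as follows. This is an illustration under stated assumptions, not Riak's implementation: it assumes a 160-bit SHA-1 ring (suggested by the 2^160 labels in the figure), 32 partitions claimed round-robin by four nodes, and a preference list that starts at the partition containing the key's hash. The way the bucket/key string is encoded before hashing is also an assumption, so the positions will not match a real cluster.

```python
import hashlib

RING_SIZE = 2 ** 160                  # assumed: 160-bit hash space
NUM_PARTITIONS = 32                   # from the slide's figure
NODES = ["node 0", "node 1", "node 2", "node 3"]


def partition_for(key: str) -> int:
    """Map a key to the index of the partition whose range contains its hash."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def owner_of(partition: int) -> str:
    """Assumed claim pattern: partitions are handed out to nodes round-robin."""
    return NODES[partition % len(NODES)]


def preference_list(key: str, n: int = 3) -> list[tuple[int, str]]:
    """Return n consecutive (partition, node) pairs, walking clockwise and wrapping at the end."""
    first = partition_for(key)
    partitions = [(first + i) % NUM_PARTITIONS for i in range(n)]
    return [(p, owner_of(p)) for p in partitions]


if __name__ == "__main__":
    for partition, node in preference_list("meetups/nycdevops", n=3):
        print(partition, node)
```

With a round-robin claim and N no larger than the node count, the N consecutive partitions land on N different nodes.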
of Vnodes • Unit of addressing, concurrency in Riak • Storage not tied to physical assets • Enables dynamic rebalancing of data when cluster topology changes
at the object level • Provides happened-before relationship between events • Each object in Riak has a vector clock* • Trade off space, speed, complexity for safety
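The happened-before relationship can be sketched with a generic vector clock. This is a minimal textbook version, assuming a clock is a mapping from actor id to counter; the function names are hypothetical and this is not Riak's internal encoding.

```python
# Generic vector clock sketch: a clock is a dict of {actor_id: counter}.

def increment(clock: dict[str, int], actor: str) -> dict[str, int]:
    """Return a copy of the clock with the given actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if a has seen every event recorded in b (b happened before a, or they are equal)."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def concurrent(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if neither clock descends from the other, i.e. the updates conflict."""
    return not descends(a, b) and not descends(b, a)


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Combine two clocks by taking the per-actor maximum."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


if __name__ == "__main__":
    base = increment({}, "x")
    left = increment(base, "y")    # one client updates the object
    right = increment(base, "z")   # another client updates the same object independently
    print(descends(left, base), concurrent(left, right), merge(left, right))
```

The space cost mentioned on the slide shows up here as one counter per actor that has ever written the object.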
be rebalanced • Handoff and rebalancing happen in the background; no manual intervention required* • Trade off speed of convergence vs. effects on cluster performance
Bitcask, LevelDB are used the most in production depending on use-case • All writes are appends to a file • This provides crash safety and fast writes • Tradeoff - periodic, background compaction is required
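The append-only idea and its compaction cost can be sketched with a toy store. This is an illustration only, loosely inspired by log-structured designs; the class name, record layout, and methods are hypothetical and do not reflect Bitcask's or LevelDB's actual file formats. Rebuilding the index from an existing file on startup is omitted.

```python
import os
import struct


class AppendOnlyStore:
    """Toy key/value store: every write is an append; an in-memory index points at the latest value."""

    HEADER = struct.Struct(">II")  # key length, value length

    def __init__(self, path: str):
        self.path = path
        self.index: dict[bytes, tuple[int, int]] = {}  # key -> (value offset, value length)
        self._size = os.path.getsize(path) if os.path.exists(path) else 0
        self._file = open(path, "ab")

    def put(self, key: bytes, value: bytes) -> None:
        record = self.HEADER.pack(len(key), len(value)) + key + value
        self._file.write(record)          # never overwrites earlier records
        self._file.flush()
        self.index[key] = (self._size + self.HEADER.size + len(key), len(value))
        self._size += len(record)

    def get(self, key: bytes) -> bytes:
        offset, length = self.index[key]
        with open(self.path, "rb") as reader:
            reader.seek(offset)
            return reader.read(length)

    def close(self) -> None:
        self._file.close()

    def compact(self) -> None:
        """Rewrite only the live values into a fresh file, dropping superseded records."""
        tmp_path = self.path + ".compact"
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        fresh = AppendOnlyStore(tmp_path)
        for key in list(self.index):
            fresh.put(key, self.get(key))
        fresh.close()
        self._file.close()
        os.replace(tmp_path, self.path)
        self.index, self._size = fresh.index, fresh._size
        self._file = open(self.path, "ab")
```

Overwriting a key leaves the old record in the file until `compact()` runs, which is the trade-off the slide refers to.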
to require >1 physical machine (preferably >4) • When availability is more important than consistency (think “critical data” on “big data”) • When your data can be modeled as keys and values; don’t be afraid to denormalize
hit to a Mochi web property results in at least one read, maybe a write, to Riak • Unavailability or high latency = lost ad revenue
off traditional features to better support new and emerging use cases • Knowledge of the underlying system is essential • A lot of NoSQL Marketing is still bullshit Choosing a NoSQL Database