@elubow
Polyglot Persistence
Polyglot persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand.
http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence

Why?
• Read-heavy vs. write-heavy loads
• Data relationships may be less important
• Different aspects of a system have different requirements

Cassandra
• Large data volume ingestion
• Really fast writes to many locations (eventual consistency)
• Query by column groups within rows
• Range queries in Hive (partial column family scans)

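The "query by column groups within rows" idea can be illustrated without a cluster. The following is a minimal sketch, assuming a wide row whose columns are kept sorted by name so that a contiguous group can be sliced out. `WideRow`, its methods, and the hour-bucket column names are hypothetical and are not Cassandra's API.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Hypothetical stand-in for one wide row: columns are kept sorted by name."""

    def __init__(self):
        self._names = []    # sorted column names
        self._values = {}   # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted so slices are contiguous.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, end):
        # Return every column whose name falls in [start, end].
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


row = WideRow()
for hour, views in [
    ("2013-01-01T02", 7),
    ("2013-01-01T00", 3),
    ("2013-01-02T00", 9),
    ("2013-01-01T01", 5),
]:
    row.insert(hour, views)

# Only the columns for the first day are read; the rest of the row is skipped.
first_day = row.slice("2013-01-01T00", "2013-01-01T23")
```

Because the columns are ordered, a slice touches only the requested range rather than the whole row, which is the property the bullet relies on.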
MongoDB
• Fast atomic increments (JSON is native to Node.js)
• Sharding for faster distributed increments
• Solid ODM for Rails (Mongoid)
• Fast access for pub/sub of durable/persisted documents

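The general idea behind spreading increments across shards can be sketched in plain Python. This is an illustration under assumptions, not MongoDB's API (in MongoDB itself a per-document atomic increment is done with the `$inc` update operator). `ShardedCounter` and `hammer` are hypothetical names: each increment lands on one randomly chosen shard, and a read sums all shards.

```python
import random
import threading


class ShardedCounter:
    """Hypothetical counter split into independent shards to reduce contention."""

    def __init__(self, shards=4):
        self._counts = [0] * shards
        self._locks = [threading.Lock() for _ in range(shards)]

    def increment(self, amount=1):
        # Pick one shard; only that shard's lock is held during the update.
        i = random.randrange(len(self._counts))
        with self._locks[i]:
            self._counts[i] += amount

    def value(self):
        # A read has to visit every shard and add the partial counts.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._counts[i]
        return total


def hammer(counter, n):
    for _ in range(n):
        counter.increment()


counter = ShardedCounter(shards=4)
threads = [threading.Thread(target=hammer, args=(counter, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = counter.value()  # 8 threads x 1000 increments
```

The trade-off is that writes get cheaper while reads get more expensive, since a read must combine every shard.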
Redis
• Supports hundreds of thousands of transactions per second
• Great caching engine
• Supports useful data types such as sorted sets
• Pay SerDe price on each access

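One reading of the "SerDe price" bullet is that a structured object stored as a string value must be serialized on every write and deserialized on every read. The sketch below assumes that reading and uses an in-memory dictionary in place of a real server; `StringStore` and the `page:42` key are hypothetical.

```python
import json


class StringStore:
    """Hypothetical stand-in for a key/value store that only holds strings."""

    def __init__(self):
        self._data = {}
        self.serde_calls = 0  # counts every serialize/deserialize step

    def set(self, key, obj):
        self.serde_calls += 1
        self._data[key] = json.dumps(obj)      # serialize on write

    def get(self, key):
        self.serde_calls += 1
        return json.loads(self._data[key])     # deserialize on read


store = StringStore()
store.set("page:42", {"views": 10, "tags": ["a", "b"]})

# Changing one field means: read and decode, modify, encode and write back.
doc = store.get("page:42")
doc["views"] += 1
store.set("page:42", doc)
```

A single-field update costs a full decode and a full encode here, which is why the cost scales with object size rather than with the size of the change.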
InfiniDB and Infobright
• Column stores for ad hoc analytics queries in SQL
• Databases built for business intelligence
• Heavy compression of data
• Pre-aggregated data (Extents/Knowledge Grid)

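The benefit of pre-aggregated block metadata can be shown with a small sketch. It assumes each block of a column keeps its minimum, maximum, and sum, so a range query can skip irrelevant blocks and answer fully covered ones from metadata alone. `build_blocks` and `sum_in_range` are hypothetical helpers, not either product's implementation.

```python
def build_blocks(values, block_size):
    """Split a column into blocks and pre-aggregate min, max, and sum per block."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append(
            {"min": min(chunk), "max": max(chunk), "sum": sum(chunk), "rows": chunk}
        )
    return blocks


def sum_in_range(blocks, lo, hi):
    """Sum values in [lo, hi]; return (total, number of blocks actually scanned)."""
    total, scanned = 0, 0
    for block in blocks:
        if block["max"] < lo or block["min"] > hi:
            continue                      # irrelevant block: skipped entirely
        if lo <= block["min"] and block["max"] <= hi:
            total += block["sum"]         # fully covered: answered from metadata
            continue
        scanned += 1                      # partially covered: rows must be read
        total += sum(v for v in block["rows"] if lo <= v <= hi)
    return total, scanned


values = list(range(1, 101))              # one column holding 1..100
blocks = build_blocks(values, 10)         # ten blocks of ten rows
total, scanned = sum_in_range(blocks, 25, 60)
```

In this example only the block holding 21..30 has to be read row by row; the other nine blocks are either skipped or answered from their stored sums.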
Ruby, Node.js, Python
• Polyglot thinking doesn’t only apply to data stores
• Each language has its own benefits for each data storage layer
• Each language has its own individual benefits
• JSON, APIs, performance

Cons
• Redis - Can only utilize a single core
• MySQL Column Store - DELETEs/UPDATEs are VERY expensive
• Cassandra - No B-tree indexes
• Mongo - Queries slow down when shard count increases. Indexes must fit in memory
• Python - Whitespace. Community
• Ruby - Not high performance enough for our standards
• JavaScript (Node.js) - Bad for CPU or IO intensive workloads

Tying It Together
• Built in the cloud
• Service-oriented architecture (internal API)
• Built Helenus (Cassandra Node.js driver)
• Data accuracy checks: visual and programmatic
• Built framework for testing out storage engines

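A framework for trying out storage engines could be as small as a shared interface plus one workload runner. The sketch below is an assumption about what such a harness might look like, not the framework described on the slide. `DictEngine`, `ListEngine`, and `run_workload` are hypothetical, and both engines are in-memory stand-ins.

```python
import time


class DictEngine:
    """Hash-based stand-in engine."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class ListEngine:
    """Deliberately naive stand-in engine: every lookup is a linear scan."""

    def __init__(self):
        self._items = []

    def put(self, key, value):
        for i, (existing, _) in enumerate(self._items):
            if existing == key:
                self._items[i] = (key, value)
                return
        self._items.append((key, value))

    def get(self, key):
        for existing, value in self._items:
            if existing == key:
                return value
        return None


def run_workload(engine, n):
    """Run the same write-then-read workload against any engine with put/get."""
    start = time.perf_counter()
    for i in range(n):
        engine.put(f"key:{i}", i)
    hits = sum(1 for i in range(n) if engine.get(f"key:{i}") == i)
    return {"seconds": time.perf_counter() - start, "hits": hits}


results = {
    type(engine).__name__: run_workload(engine, 500)
    for engine in (DictEngine(), ListEngine())
}
```

Keeping the workload identical across engines means the timing differences come from the engine, and the `hits` count doubles as a basic correctness check.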
Helenus
• Built Node.js driver for Cassandra
• https://github.com/simplereach/helenus
• CQL 2/3, Composite Column, Thrift Interface
• More about Node.js and Cassandra

Points To Consider
• Data consistency - Same in all data stores
• How important is data durability?
• Managing many servers (Chef, AWS, CSSH)
• Learning, managing, and tuning many different applications
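Checking that the data is the same in all data stores can be done programmatically by comparing the same metric pulled from two stores. The following is a minimal sketch, assuming each store's counts have already been exported into a dictionary. `diff_counts`, the sample keys, and the sample numbers are hypothetical.

```python
def diff_counts(reference, candidate):
    """Return {key: (reference_value, candidate_value)} for every disagreement."""
    mismatches = {}
    for key in sorted(set(reference) | set(candidate)):
        ref_value = reference.get(key)    # None means the key is missing
        cand_value = candidate.get(key)
        if ref_value != cand_value:
            mismatches[key] = (ref_value, cand_value)
    return mismatches


# Made-up sample data standing in for exports from two different stores.
cassandra_counts = {"page:1": 120, "page:2": 45, "page:3": 9}
redis_counts = {"page:1": 120, "page:2": 44, "page:4": 1}

report = diff_counts(cassandra_counts, redis_counts)
```

An empty report means the two stores agree on every key; anything else lists the value each side holds, including keys that exist in only one store.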