
Cassandra. Bass. Databass.

αλεx π
April 16, 2013


Transcript

  1. Consistent
    Better than available? Also, what's your slave replication lag? Are your caches, search index, and all denormalizations consistent at all times? Is EVERYTHING you do happening with the user watching it? Y U NO?
  2. JOIN / UNION
    Yeah, across a multi-terabyte cluster. Also: availability? Cascading updates? Read performance? Y U NO?
  3. Transactions
    Sure, just wait a couple of minutes until I roll back the previous one. Or go buy Oracle. NOW! Y U NO?
  4. World expectations
    • Amount of data is growing
    • Frequency/rate of incoming data is growing
    • Demand for scalable services is growing
    • Application backends get larger
    • Data loss and corruption are unacceptable
    • Variety of data is growing
  5. World expectations
    • Building scalable backends is hard, no matter which tech you use
    • Lack of tool knowledge will bite you
    • Not knowing your numbers makes your choices blind
    • Eventual consistency is eventual
    • Conflict resolution strategies matter
    • Picking a NoSQL data store will require you to write more backend code
  6. Challenges
    • Hundreds, thousands of requests per second
    • Write (and read) heavy applications
    • How to make sense of data
    • Eliminating single points of failure
    • Predictable latency
  7. Challenges
    • Predictable latency
    • Horizontal scaling
    • Continuous availability
    • Reducing system complexity
    • Avoiding data loss (accepting writes)
    • Storing data in a meaningful way
    • Preparing data for reading
  8. Things to ponder before joining the NoSQL camp
    • Application data model
    • Read and write patterns
    • What can be shown immediately, what can wait (think: dashboards vs shopping carts)
    • How to build a backend that'll make sure it all keeps ticking
    • Learn how your clustered client works
    • Know your potential problems with consistency
  9. Possibilities
    Will vary depending on:
    • the size of your DB
    • load on the database
    • frequency of written data
  10. Possibilities
    • keep “ultra-hot”, rather small datasets in memory
    • make it possible to reconstruct your data
    • write increments (changes)
    • reconstruct final views from history
    • for data that doesn't tolerate inconsistency, make sure to use appropriate data stores
    • avoid single points of failure at all costs
  11. Use Cases
    • Session store
    • Time series, logging
    • Business intelligence backend
    • Caching/speed layer (denormalized, read-ready)
    • Round-robin/capped collections
  12. Cassandra, under the hood
    • Consistent hashing
    • Virtual nodes
    • SSTables
    • Partitioners
    • Hinted handoff
    • Node/data balancing
    • Gossip protocol
    • Conflict resolution
  13. Under The Hood / Partitioners
    • determine how data is distributed across nodes
    • as the number of nodes grows, only a minimal amount of data is affected
    • Murmur3, Random (consistent hashing)
    • Ordered Partitioner
      • allows ordered scans by partition key
      • difficult load balancing
      • hotspots because of sequential writes
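    A small aside, not on the slide: with the hashed partitioners, cross-partition scans come back in token order, and the token() function is how you page through them. A minimal CQL 3 sketch, reusing the posts table defined later in this deck:

        -- rows are ordered by token(partition key), not by the key itself;
        -- paging across partitions compares tokens:
        SELECT * FROM posts WHERE token(userid) > token('user1');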
  14. Under The Hood / VNodes
    • many token ranges per node
    • each node owns many small ranges, which spreads the load
    • does not require token generation and assignment
    • helps to avoid rebalancing
    • rebuilding a dead node is faster, since data is copied from more nodes
    • a major improvement for heterogeneous clusters
  15. Under The Hood / CommitLog
    • crash-recovery log for data
    • by default, sync mode is `periodic`
    • move the commitlog to a different drive (to reduce contention with SSTables)
    • configure the commitlog size
      • a smaller size will improve replay time
      • flushes infrequently-updated CFs to disk
    • (slight) chance of losing written data if all replicas go down during the sync timeout period
    • check `batch` instead, to sync before ack'ing the write
  16. Under The Hood / Memtable
    • in-memory cache with a key/column structure
    • one per ColumnFamily
    • sorted by key and written sequentially
    • flushed to an SSTable when full, in the background
  17. Under The Hood / SSTable
    • Sorted String Table
    • immutable
    • has a Bloom filter, for read optimization
    • reads may have to combine row fragments from multiple SSTables and unflushed memtables
  18. Under The Hood / SSTable
    • has an index (data location by key)
    • holds the data itself
    • multiple per CF
    • merged after a threshold is reached
  19. Under The Hood / SSTable / Compaction
    • merges SSTables
    • merges changes for the same key across different SSTables
    • discards tombstones, reclaims space
    • refreshes the SSTable index (changed addresses)
  20. Under The Hood / Hinted Handoff
    • by default, sloppy quorum is off
    • toggled by the coordinator when the target node is unavailable
    • the hint is replayed to the node when it comes back
  21. Under The Hood / Read Path / On the Coordinator
    • the client connects to a coordinator node
    • technically, any node you connect to may be a coordinator
    • the coordinator determines the nodes responsible for the key
    • responsible nodes are sorted by proximity
  22. Under The Hood / Read Path / On the node
    • the node checks the memtable for the data
    • if not found, the data is read from SSTables
  23. Under The Hood / Read Path / On the node / Bloom filters
    • uses Bloom filters
      • false positives are possible, but not false negatives
      • determine whether an SSTable contains the key
      • avoids additional disk seeks
    • uses the index to locate the data fast
    • data is returned to the client
  24. Under The Hood / Write Path
    • pretty much the same as the read path
    • data is first written to the CommitLog
    • then, data is written to the Memtable
    • if there aren't enough nodes to receive a write, the coordinator takes a hinted write
  25. Under The Hood / Conflict Resolution
    • Riak guys like to mention the lack of vector clocks
    • Yeah, it's timestamp-based
    • Grab yourself some ntp
    • or a TAAS (timestamp as a service)
    • but keep in mind that...
    • writes are column-based, not row-based
  26. Under The Hood / Conflict Resolution / Column-based writes

        CREATE KEYSPACE new_cql_keyspace ...

        CREATE TABLE posts (
          content     text,
          entry_title text,
          userid      text,
          PRIMARY KEY (userid)
        );

        INSERT INTO posts (userid, entry_title, content)
        VALUES ('user1', 'title1', 'content1')
        USING TIMESTAMP 1365885294994;
  27. Under The Hood / Conflict Resolution / Column-based writes

         userid | content  | entry_title
        --------+----------+-------------
         user1  | content1 | title1
  28. Under The Hood / Conflict Resolution / Column-based writes

        UPDATE posts USING TIMESTAMP 1365885294995
        SET entry_title = 'new title'
        WHERE userid = 'user1';
  29. Under The Hood / Conflict Resolution / Column-based writes

        get posts[user1];
        => (column=content,     value=636f6e74656e7431,   timestamp=1365885294994) ;; OLD
        => (column=entry_title, value=6e6577207469746c65, timestamp=1365885294995) ;; NEW
  30. My favorite C* features / Data Types
    • There are actual data types, yes!
    • ASCII, String
    • Boolean
    • all kinds of numbers
    • Binary type
    • Date type
    • Counters (sketch below)
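    Counters get their own column type, for instance; a minimal CQL 3 sketch (the page_views table is made up for illustration):

        CREATE TABLE page_views (
          page  text PRIMARY KEY,
          views counter
        );

        -- counters are only ever incremented/decremented, never set:
        UPDATE page_views SET views = views + 1 WHERE page = '/home';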
  31. My favorite C* features / Data Types
    • CompositeType (Int32Type, DateType)
    • Sets #{1, 2, 3}
    • Lists [1, 2, 3]
    • Maps {:a "value1" :b "value2"} (sketch of the collection types below)
    • Downside: no indexes on collections
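    A minimal CQL 3 sketch of the collection types (the users table and its columns are made up for illustration):

        CREATE TABLE users (
          userid text PRIMARY KEY,
          emails set<text>,
          tasks  list<text>,
          attrs  map<text, text>
        );

        -- collection updates are expressed as set/list/map operations:
        UPDATE users SET emails = emails + {'alex@example.com'} WHERE userid = 'user1';
        UPDATE users SET attrs['lang'] = 'clojure'              WHERE userid = 'user1';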
  32. My favorite C* features / Keys
    • Compound primary keys
      • consist of more than one column
      • allow a wide range of queries
    • Composite partition keys
      • the key itself consists of 2 values
    • Clustering order
      • allows on-disk sorting of columns
  33. My favorite C* features / Keys
    • Partition key and clustering columns
      • the first key is the partition key
        • determines node placement
      • the rest are clustering columns
    • insert/update/delete ops for the same partition key are atomic and isolated
    • values that share a partition key are stored on the same node(s) (sketch below)
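    A minimal CQL 3 sketch of a compound primary key with a clustering order (the timeline table is made up for illustration):

        CREATE TABLE timeline (
          userid      text,
          posted_at   timestamp,
          entry_title text,
          content     text,
          -- userid is the partition key; posted_at is a clustering column:
          PRIMARY KEY (userid, posted_at)
        ) WITH CLUSTERING ORDER BY (posted_at DESC);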
  34. My favorite C* features / Querying
    • Works best when you index data properly
    • Range queries
    • "IN" queries (sketches below)
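    Against the hypothetical timeline table above, such queries might look like this (a sketch, not from the deck):

        -- range query over a clustering column, within one partition:
        SELECT * FROM timeline
        WHERE userid = 'user1'
          AND posted_at >= '2013-04-01 00:00:00+0000';

        -- "IN" query across several partition keys:
        SELECT * FROM timeline WHERE userid IN ('user1', 'user2');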
  35. My favorite C* features / MapReduce
    • Not quite built-in
    • But Cassandra provides nice Hadoop integration
    • We use Cascading/Cascalog & Hadoop
  36. My favorite C* features / TTL
    • Useful for caches
    • Specify how long your data lives (sketch below)
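    A one-line sketch, reusing the posts table from the conflict-resolution slides:

        INSERT INTO posts (userid, entry_title, content)
        VALUES ('user1', 'title1', 'content1')
        USING TTL 86400;  -- the written columns expire after one day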
  37. LIMIT
    Remember those "How do I paginate in Cassandra" StackOverflow questions in the early days?
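    With CQL 3 it's a single clause; a sketch against the hypothetical timeline table above:

        SELECT * FROM timeline WHERE userid = 'user1' LIMIT 10;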
  38. My favorite C* features / Batch Operations
    • Saves network round-trips
    • Updates in a BATCH on the same partition key are performed atomically and in isolation
    • Supports only UPDATE, INSERT and DELETE (sketch below)
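    A minimal sketch of such a batch, reusing the posts table (one partition key, so the updates land atomically and in isolation):

        BEGIN BATCH
          INSERT INTO posts (userid, entry_title, content)
          VALUES ('user1', 'title2', 'content2');
          UPDATE posts SET entry_title = 'edited' WHERE userid = 'user1';
          DELETE content FROM posts WHERE userid = 'user1';
        APPLY BATCH;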
  39. My favorite C* features / Prepared Statements
    • Parse once, execute many times
    • Major speed-up
    • Very simple to work with / serialize complex data structures and binary data (statement sketch below)
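    What gets prepared is just a CQL string with bind markers; the prepare/execute calls themselves live in your driver, so this is only the statement side of the sketch:

        -- parsed once on the server, then executed many times
        -- with different bound values:
        INSERT INTO posts (userid, entry_title, content) VALUES (?, ?, ?);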
  40. My favorite C* features / Prepared Statements
    • Reduce overhead of communication between teams
    • Easy to see what the table structure is
    • Easy to see what / how to query
  41. My favorite C* features / Conclusions
    • Beware of:
      • schema changes
      • network failures
      • storage (or hardware/software) upgrades
    • Know how to handle them with your store
  42. My favorite C* features / Conclusions
    • Good drivers (FINALLY), after years of Hector/Thrift
    • Binary CQL protocol, new possibilities
    • Internals became way more powerful with 1.2
    • Ease of use/development
    • Still some work to do (Hadoop integration is still on Thrift)
  43. My favorite C* features / Conclusions
    • Awesome database
    • Predictable
    • Know your data before using anything
    • Know your tool before using it
    • Make conscious choices
  44. My favorite C* features / Conclusions
    • Forget about normalized data
    • Plan against data usage
    • Have a powerful backend that conforms to the store
    • Determine the best caching strategy
    • Use windowed operations (EEP) for pre-aggregation, instead of believing the realtime ad-hoc myth