A 0x0000000000..1 0x4000000000..0 B 0x4000000000..1 0x8000000000..0 C 0x8000000000..1 0xc000000000..0 D 0xc000000000..1 0x0000000000..0 Friday, May 16, 14
A 0x0000000000..1 0x4000000000..0 B 0x4000000000..1 0x8000000000..0 C 0x8000000000..1 0xc000000000..0 D 0xc000000000..1 0x0000000000..0 Friday, May 16, 14
A 0x0000000000..1 0x4000000000..0 B 0x4000000000..1 0x8000000000..0 C 0x8000000000..1 0xc000000000..0 D 0xc000000000..1 0x0000000000..0 Friday, May 16, 14
A 0x0000000000..1 0x4000000000..0 B 0x4000000000..1 0x8000000000..0 C 0x8000000000..1 0xc000000000..0 D 0xc000000000..1 0x0000000000..0 Friday, May 16, 14
A 0x0000000000..1 0x4000000000..0 B 0x4000000000..1 0x8000000000..0 C 0x8000000000..1 0xc000000000..0 D 0xc000000000..1 0x0000000000..0 Friday, May 16, 14
Determine what specific queries to support 2. Try to create a table where you can satisfy your query by reading (roughly) one partition Friday, May 16, 14
Determine what specific queries to support 2. Try to create a table where you can satisfy your query by reading (roughly) one partition Use roughly one table per query pattern. Friday, May 16, 14
Determine what specific queries to support 2. Try to create a table where you can satisfy your query by reading (roughly) one partition Friday, May 16, 14
Determine what specific queries to support 2. Try to create a table where you can satisfy your query by reading (roughly) one partition Friday, May 16, 14
KEY, email text, age int ) CREATE TABLE users_by_email ( email text PRIMARY KEY, username text, age int ) We have users and want to look them up: • by username • by email Friday, May 16, 14
KEY, id uuid ) CREATE TABLE users_by_email ( email text PRIMARY KEY, id uuid ) What if we had done: CREATE TABLE users ( id uuid PRIMARY KEY username text, email text, age int, ) Friday, May 16, 14
users ( id uuid PRIMARY KEY username text, email text, age int, ) We reduced data duplication! Need to read twice as many partitions. Friday, May 16, 14 Same problem with secondary indexes Consider cost/frequency of updates vs reads. Worse for other patterns.
Determine what specific queries to support 2. Try to create a table where you can satisfy your query by reading (roughly) one partition Friday, May 16, 14
Determine what specific queries to support 2. Try to create a table where you can satisfy your query by reading (roughly) one partition Friday, May 16, 14
Determine what specific queries to support 2. Try to create a table where you can satisfy your query by reading (roughly) one partition How do we fit a group in a partition? Friday, May 16, 14
order rows by username. CREATE TABLE groups ( groupname text, username text, email text, age int, PRIMARY KEY (groupname, username) ) Friday, May 16, 14
but reads are great. CREATE TABLE groups ( groupname text, username text, email text, age int, PRIMARY KEY (groupname, username) ) Friday, May 16, 14 How to break up big groups? Split by time, username prefix, or prefix of a hash of the username.
users ( id uuid PRIMARY KEY username text, email text, age int ) CREATE TABLE groups ( groupname text, user_id uuid, PRIMARY KEY (groupname, user_id) ) Read wayyyy too many partitions. Friday, May 16, 14
timeuuid, username text, email text, age int, PRIMARY KEY (groupname, joined) ) Use timeuuid instead of timestamp to avoid collisions. Friday, May 16, 14
timeuuid, username text, email text, age int, PRIMARY KEY (groupname, joined) ) WITH CLUSTERING ORDER BY (joined DESC) Reverse clustering order to optimize for N latest. Friday, May 16, 14
timeuuid, username text, email text, age int, PRIMARY KEY (groupname, joined) ) WITH CLUSTERING ORDER BY (joined DESC) SELECT * FROM group_join_dates WHERE groupname=? LIMIT ? Friday, May 16, 14
text, joined timeuuid, username text, email text, age int, PRIMARY KEY ((groupname, join_day), joined) ) WITH CLUSTERING ORDER BY (joined DESC) Use a compound partition key. Friday, May 16, 14
text, joined timeuuid, username text, email text, age int, PRIMARY KEY ((groupname, join_day), joined) ) WITH CLUSTERING ORDER BY (joined DESC) Use a compound partition key. join_day looks like ‘2014-05-15’ Friday, May 16, 14 Can also use second, month, year, etc.
reading from single partition. • List, Set, and Map collections CREATE TABLE songs ( artist text, album text, song text, tags set<text>, PRIMARY KEY ((artist, album, song)) ) Friday, May 16, 14 Alternative model would be song <-> tags, requires more reads.
songs ( artist text, album text, song text, PRIMARY KEY ((artist, album, song)) ) CREATE TABLE tags ( artist text, album text, song text, tag text, PRIMARY KEY ((artist, album, song), tag) ) Friday, May 16, 14 Requires more partition reads
reading from single partition. • List, Set, and Map collections • User defined types CREATE TYPE address ( street text, city text, zip int ) CREATE TABLE users ( username text PRIMARY KEY, email text, addresses map<text, address> ) Friday, May 16, 14 Support “residential”, “business”, and other addresses. Alternative is separate table for user addresses
reading from single partition. • List, Set, and Map collections • User defined types • Static Columns CREATE TABLE post_and_comments ( post_id text, post_body text static, comment_time timeuuid, commenter text, comment_body text, PRIMARY KEY (post_id, comment_time) ) Friday, May 16, 14 Always get post body with comments