Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Hash-Based Sharding in MongoDB 2.4 (MongoDB SF ...

Hash-Based Sharding in MongoDB 2.4 (MongoDB SF 2013)

Presentation on the hash-based sharding features of MongoDB given at MongoDB SF 2013.

Avatar for Brandon Black

Brandon Black

May 10, 2013
Tweet

More Decks by Brandon Black

Other Decks in Programming

Transcript

  1. Agenda •  Mechanics of Sharding –  Key space –  Chunks

    –  Balancing •  Request Routing •  Hashed Shard Keys –  Why use hashed shard keys –  How to enable hashed shard keys –  Limitations
  2. Sharded Cluster Node 1 Secondary Config Server Node 1 Secondary

    Config Server Node 1 Secondary Config Server Shard Shard Shard Mongos App Server Mongos App Server Mongos App Server
  3. What Is A Shard Key? •  Shard key is used

    to partition your collection •  Shard key must exist in every document •  Shard key is immutable •  Shard key values are immutable •  Shard key must be indexed •  Shard key is used to route requests to shards
  4. The Key Space {x: 10}! -∞ +∞ Key Space {x:

    -5}! {x: -9}!{x: 7}!{x: 6}!{x: 0}!
  5. -∞ +∞ Key Space Inserting Data {x: 0}! {x: 6}!

    {x: 7}! {x: -5}! {x: 10}! {x: -9}!
  6. -∞ +∞ Key Space Inserting Data {x: 0}! {x: 6}!

    {x: 7}! {x: -5}! {x: 10}! {x: -9}!
  7. Chunk Range and Size minKey maxKey minKey maxKey minKey maxKey

    {x: -20} {x: 13} {x: 25} {x: 100,000} 64MB -∞ +∞ Key Space {x: 0}! {x: 6}! {x: 7}! {x: -5}! {x: 10}! {x: -9}!
  8. Inserting Further Data -∞ +∞ Key Space {x: 0}! {x:

    6}! {x: 7}! {x: -5}! {x: 10}! {x: -9}! {x: 9}! {x: -7}! {x: 3}!
  9. Chunk Splitting minKey maxKey minKey 13 14 maxKey -∞ +∞

    Key Space {x: 0}! {x: 6}! {x: 7}! {x: -5}! {x: 10}! {x: -9}! 0 0 •  A chunk is split once it exceeds the maximum size •  There is no split point if all documents have the same shard key •  Chunk split is a logical operation (no data is moved) •  If split creates too large of a discrepancy of chunk count across cluster a balancing round starts
  10. Data Distribution Node 1 Secondary Config Server Shard 1 Mongos

    Mongos Mongos Shard 2 Mongod •  MinKey to 0 lives on Shard1 •  0 to MaxKey lives on Shard2 •  Mongos routes queries appropriately
  11. Mongos Routes Data Node 1 Secondary Config Server Shard 1

    Mongos Mongos Mongos Shard 2 Mongod minKey à 0 0 à maxKey db.test.insert({ x: -1000 })!
  12. Mongos Routes Data Node 1 Secondary Config Server Shard 1

    Mongos Mongos Mongos Shard 2 Mongod minKey à 0 0 à maxKey db.test.insert({ x: -1000 })!
  13. Unbalanced Shards Node 1 Secondary Config Server Shard 1 Mongos

    Mongos Mongos Shard 2 Mongod minKey à 0 0 à maxKey
  14. Balancing Node 1 Secondary Config Server Shard 1 Mongos Mongos

    Mongos Shard 2 Mongod •  Migration threshold •  Number of chunks less than 20, migration threshold of 2 •  21-80, migration threshold 4 •  >80, migration threshold 8
  15. Moving the chunk Node 1 Secondary Config Server Mongos Shard

    1 Shard 2 Mongod •  One chunk of data is copied from Shard 1 to Shard 2
  16. Committing Migration •  Once everyone agrees the data has moved,

    that chunk gets deleted from Shard 1. Node 1 Secondary Config Server Mongos Shard 1 Shard 2 Mongod
  17. Cleanup •  Other mongos' have to find out about new

    configuration Node 1 Secondary Config Server Shard 1 Shard 2 Mongod Mongos Mongos Mongos
  18. Effects of Migrations •  Expensive •  Can take a long

    time •  Competes for limited resources Node 1 Secondary Config Server Shard 1 Shard 2 Mongod Mongos Mongos Mongos
  19. Picking A Shard Key •  Cardinality •  Optimize routing • 

    Minimize (unnecessary) traffic •  Allow best scaling
  20. What About ObjectId? ObjectId("51597ca8e28587b86528edfd”)! •  Used for _id •  12

    byte value •  Generated by the driver if not specified •  Theoretically globally unique
  21. // enabling sharding on test database mongos> sh.enableSharding("test") { "ok"

    : 1 } // sharding the test collection mongos> sh.shardCollection("test.test",{_id:1}) { "collectionsharded" : "test.test", "ok" : 1 } // create a loop inserting data mongos> for (x=0; x<10000; x++) { ... db.test.insert({value:x}) ... } Sharding on ObjectId
  22. shards: { "_id" : "shard0000", "host" : "localhost:30000" } {

    "_id" : "shard0001", "host" : "localhost:30001" } databases: { "_id" : "test", "partitioned" : true, "primary" : "shard0001" } test.test shard key: { "_id" : 1 } chunks: shard0001 3 { "_id" : { "$minKey" : 1 } } -->> { "_id" : ObjectId(”...") } on : shard0001 { "t" : 1000, "i" : 1 } { "_id" : ObjectId(”...”) } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 1000, "i" : 2 } ObjectId Chunk Distribution
  23. ObjectId Results In A “Hot Shard” Node 1 Secondary Config

    Server Shard 1 Mongos Mongos Mongos Shard 2 Mongod minKey à 0 0 à maxKey
  24. Hashed Shard Key Eliminates “Hot Shard” Node 1 Secondary Config

    Server Shard 1 Mongos Mongos Mongos Shard 2 Mongod minKey à 0 0 à maxKey
  25. Under the Hood •  Create a hashed index used for

    sharding •  Uses the first 64-bits of md5 hash of field •  Hash both data and BSON type •  Represented as a NumberLong in the shell
  26. // hash on 1 as an integer > db.runCommand({_hashBSONElement:1}) {

    "key" : 1, "seed" : 0, "out" : NumberLong("5902408780260971510"), "ok" : 1 } // hash on “1” as a string > db.runCommand({_hashBSONElement:"1"}) { "key" : "1", "seed" : 0, "out" : NumberLong("-2448670538483119681"), "ok" : 1 } Hash on both data and BSON type
  27. // enabling sharding on test database mongos> sh.enableSharding("test") { "ok"

    : 1 } // shard by hashed _id field mongos> sh.shardCollection("test.hash”,{_id:"hashed"}) { "collectionsharded" : "test.hash", "ok" : 1 } Sharding on Hashed ObjectId
  28. databases: { "_id" : "test", "partitioned" : true, "primary" :

    "shard0001" } test.hash shard key: { "_id" : "hashed" } chunks: shard0000 2 shard0001 2 { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 2 } { "_id" : NumberLong("-4611686018427387902") } --> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 3 } { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 { "t" : 2000, "i" : 4 } { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 2000, "i" : 5 } Pre-Splitting the Data
  29. // create a loop inserting data mongos> for (x=0; x<10000;

    x++) { ... db.hash.insert({value:x}) ... } Inserting Into Hashed Shard Key Collection
  30. test.hash shard key: { "_id" : "hashed" } chunks: shard0000

    4 shard0001 4 {"_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-7374407069602479355") } on : shard0000 { "t" : 2000, "i" : 8} {"_id" : NumberLong("-7374407069602479355") } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 9} {"_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong("-2456929743513174890") } on : shard0000 { "t" : 2000, "i" : 6} {"_id" : NumberLong("-2456929743513174890") } -->> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 7} { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("1483539935376971743") } on : shard0001 { "t" : 2000, "i" : 12} Even Distribution of Chunks
  31. Hash Keys Are Great for Equality Queries •  Equality queries

    directed to a specific shard •  Will use the index •  Most efficient query possible
  32. mongos> db.hash.find({x:1}).explain() { "cursor" : "BtreeCursor x_hashed", "n" : 1,

    "nscanned" : 1, "nscannedObjects" : 1, "millisShardTotal" : 0, "numQueries" : 1, "numShards" : 1, "indexBounds" : { "x" : [ [ NumberLong("5902408780260971510"), NumberLong("5902408780260971510") ] ] }, "millis" : 0 } Explain Plan of an Equality Query
  33. Not So Good for a Range Query •  Range queries

    scatter gather •  Don’t use the index •  Inefficient query
  34. mongos> db.hash.find({x:{$gt:1, $lt:99}}).explain() { "cursor" : "BasicCursor", "n" : 97,

    "nChunkSkips" : 0, "nYields" : 0, "nscanned" : 1000, "nscannedAllPlans" : 1000, "nscannedObjects" : 1000, "nscannedObjectsAllPlans" : 1000, "millisShardTotal" : 0, "millisShardAvg" : 0, "numQueries" : 2, "numShards" : 2, "millis" : 3 } Explain Plan of a Range Query
  35. Limitations •  Cannot use a compound key •  Key cannot

    have an array value •  Incompatible with tag aware sharding –  Tags would be assigned the value of the hash, not the value of the underlying key •  Key with poor cardinality is going to give a hash with poor cardinality –  Floating point numbers are squashed. E.g. 100.4 will be hashed as 100
  36. Summary •  There are 3 different approaches for sharding • 

    Hash shard keys give great distribution •  Hash shard keys are good for equality •  Pick the right shard key for your application