DIBI workshop Replication

rozza
October 07, 2013

Transcript

  1. Agenda
     •  Replica Sets Lifecycle
     •  Developing with Replica Sets
     •  Operational Considerations
     •  How Replication works

  2. Why Replication?
     •  How many have faced node failures?
     •  How many have been woken up in the middle of the night to perform a failover?
     •  How many have experienced issues due to network latency?
     •  Different uses for data
        –  Normal processing
        –  Simple analytics

  3. Replica Set – Initialize
     (diagram: Node 1 and Node 2 are secondaries, Node 3 is the primary; replication flows from the primary and heartbeats run between all members)

  4. Replica Set – Recovery
     (diagram: Node 3 is down and recovering; Node 2 has become the primary and Node 1 replicates from it)

  5. Replica Set – Recovered
     (diagram: Node 3 rejoins as a secondary and replication resumes from Node 2, the primary)

  6. Configuration Options
     > conf = {
         _id : "mySet",
         members : [
           {_id : 0, host : "A", priority : 3},
           {_id : 1, host : "B", priority : 2},
           {_id : 2, host : "C"},
           {_id : 3, host : "D", hidden : true},
           {_id : 4, host : "E", hidden : true, slaveDelay : 3600}
         ]
       }
     > rs.initiate(conf)

  7. Configuration Options – Primary DC
     (same configuration as above, highlighting the primary data center members, hosts A and B with priorities 3 and 2)

  8. Configuration Options – Secondary DC
     (same configuration, highlighting the secondary data center; members without an explicit priority get the default priority of 1)

  9. Configuration Options – Analytics node
     (same configuration, highlighting a hidden member used for analytics)

  10. Configuration Options – Backup node
      (same configuration, highlighting the hidden member with slaveDelay : 3600 that serves as a delayed backup)

  11. Write Concern
      •  Network acknowledgement
      •  Wait for error
      •  Wait for journal sync
      •  Wait for replication (sketch below)

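      (Sketch, not from the deck: how these levels map onto the getLastError command in the mongo shell; the collection name "orders" is made up.)
      > db.orders.insert({ item : "abc", qty : 1 })
      > db.runCommand({ getLastError : 1, w : 1 })                           // network acknowledgement from the primary
      > db.runCommand({ getLastError : 1, j : true })                        // wait for journal sync
      > db.runCommand({ getLastError : 1, w : "majority", wtimeout : 5000 }) // wait for replication to a majority
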
  12. Tagging
      •  New in 2.0.0
      •  Control where data is written to, and read from
      •  Each member can have one or more tags
         –  tags: {dc: "ny"}
         –  tags: {dc: "ny", subnet: "192.168", rack: "row3rk7"}
      •  Replica set defines rules for write concerns
      •  Rules can change without changing app code

  13. Tagging Example
      {
        _id : "mySet",
        members : [
          {_id : 0, host : "A", tags : {"dc" : "ny"}},
          {_id : 1, host : "B", tags : {"dc" : "ny"}},
          {_id : 2, host : "C", tags : {"dc" : "sf"}},
          {_id : 3, host : "D", tags : {"dc" : "sf"}},
          {_id : 4, host : "E", tags : {"dc" : "cloud"}}
        ],
        settings : {
          getLastErrorModes : {
            allDCs : {"dc" : 3},
            someDCs : {"dc" : 2}
          }
        }
      }
      > db.blogs.insert({...})
      > db.runCommand({getLastError : 1, w : "someDCs"})

  14. Wait for Replication (Tagging)
      (diagram: the driver sends a write followed by getLastError with w: "allDCs" to the primary in SF; the primary applies the write in memory and the secondaries in NY and the cloud replicate it before the write concern is satisfied)

  15. Read Preference Modes
      •  5 modes (new in 2.2)
         –  primary (only) – the default
         –  primaryPreferred
         –  secondary
         –  secondaryPreferred
         –  nearest
      •  When more than one node is eligible, the closest node is used for reads (all modes but primary; sketch below)

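      (Sketch, not from the deck: setting a read preference in the 2.2+ mongo shell; the collection name "users" is made up.)
      > db.getMongo().setReadPref("secondaryPreferred")   // default for this shell connection
      > db.users.find().readPref("nearest")               // per-query read preference
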
  16. Tagged Read Preference
      •  Custom read preferences
      •  Control where you read from by (node) tags
         –  E.g. { "disk": "ssd", "use": "reporting" }
      •  Use in conjunction with standard read preferences (sketch below)
         –  Except primary

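      (Sketch, not from the deck: combining a mode with tag sets in the shell; the tag values come from the slide, the collection name "events" is made up, and the trailing {} means "fall back to any eligible secondary".)
      > db.getMongo().setReadPref("secondary", [ { "disk" : "ssd", "use" : "reporting" }, { } ])
      > db.events.find().readPref("secondaryPreferred", [ { "use" : "reporting" } ])
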
  17. Replica Set – 1 Data Center
      •  Single datacenter
      •  Single switch & power
      •  Points of failure:
         –  Power
         –  Network
         –  Data center
         –  Two node failure
      •  Automatic recovery of single node crash
      (diagram: Members 1, 2 and 3 in a single datacenter)

  18. Replica Set – 2 Data Centers
      •  Multi data center
      •  DR node for safety
      •  Can't do a durable multi data center write safely, since there is only 1 node in the distant DC
      (diagram: Members 1 and 2 in Datacenter 1, Member 3 in Datacenter 2)

  19. Replica Set – 3 Data Centers
      •  Three data centers
      •  Can survive full data center loss
      •  Can do w : { dc : 2 } to guarantee a write in 2 data centers (with tags; sketch below)
      (diagram: Members 1 and 2 in Datacenter 1, Members 3 and 4 in Datacenter 2, Member 5 in Datacenter 3)

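      (Sketch, not from the deck: with "dc" tags on the members, a getLastErrorModes rule – the name "multiDC" is made up – requires acknowledgement from 2 distinct data centers.)
      settings : { getLastErrorModes : { multiDC : { "dc" : 2 } } }
      > db.runCommand({ getLastError : 1, w : "multiDC" })
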
  20. Implementation details
      •  Heartbeat every 2 seconds
         –  Times out in 10 seconds
      •  Local DB (not replicated; inspection example below)
         –  system.replset
         –  oplog.rs
            •  Capped collection
            •  Idempotent version of operation stored

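      (Not from the deck: both collections can be inspected from the mongo shell, for example:)
      > use local
      > db.system.replset.findOne()                                    // current replica set configuration
      > db.oplog.rs.find().sort({ $natural : -1 }).limit(1).pretty()   // most recent oplog entry
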
  21. Op(erations) Log is idempotent
      > db.replsettest.insert({_id : 1, value : 1})
      { "ts" : Timestamp(1350539727000, 1), "h" : NumberLong("6375186941486301201"), "op" : "i",
        "ns" : "test.replsettest", "o" : { "_id" : 1, "value" : 1 } }
      > db.replsettest.update({_id : 1}, {$inc : {value : 10}})
      { "ts" : Timestamp(1350539786000, 1), "h" : NumberLong("5484673652472424968"), "op" : "u",
        "ns" : "test.replsettest", "o2" : { "_id" : 1 }, "o" : { "$set" : { "value" : 11 } } }

  22. Single operation can have many entries
      > db.replsettest.update({}, {$set : {name : "foo"}}, false, true)
      { "ts" : Timestamp(1350540395000, 1), "h" : NumberLong("-4727576249368135876"), "op" : "u",
        "ns" : "test.replsettest", "o2" : { "_id" : 2 }, "o" : { "$set" : { "name" : "foo" } } }
      { "ts" : Timestamp(1350540395000, 2), "h" : NumberLong("-7292949613259260138"), "op" : "u",
        "ns" : "test.replsettest", "o2" : { "_id" : 3 }, "o" : { "$set" : { "name" : "foo" } } }
      { "ts" : Timestamp(1350540395000, 3), "h" : NumberLong("-1888768148831990635"), "op" : "u",
        "ns" : "test.replsettest", "o2" : { "_id" : 1 }, "o" : { "$set" : { "name" : "foo" } } }

  23. Replica sets
      •  Use replica sets
      •  Easy to set up
         –  Try it on a single machine (a quick sketch follows)
      •  Check the doc page for replica set tutorials
         –  http://docs.mongodb.org/manual/replication/#tutorials

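      (Sketch, not from the deck: a three-member test set on one machine; the ports, data paths and set name are made up.)
      mongod --replSet mySet --port 27017 --dbpath /data/rs0 --fork --logpath /data/rs0.log
      mongod --replSet mySet --port 27018 --dbpath /data/rs1 --fork --logpath /data/rs1.log
      mongod --replSet mySet --port 27019 --dbpath /data/rs2 --fork --logpath /data/rs2.log
      > rs.initiate({ _id : "mySet", members : [
          { _id : 0, host : "localhost:27017" },
          { _id : 1, host : "localhost:27018" },
          { _id : 2, host : "localhost:27019" } ] })
      > rs.status()
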
  24. Agenda
      •  Why shard
      •  MongoDB's approach
      •  Architecture
      •  Configuration
      •  Mechanics
      •  Solutions

  25. Partition data based on ranges
      •  User defines shard key
      •  Shard key defines range of data
      •  Key space is like points on a line
      •  Range is a segment of that line
      (diagram: key space as a line from -∞ to +∞)

  26. Distribute data in chunks across nodes
      •  Initially 1 chunk
      •  Default max chunk size: 64 MB
      •  MongoDB automatically splits & migrates chunks when the max is reached

  27. MongoDB manages data
      •  Queries routed to specific shards
      •  MongoDB balances cluster
      •  MongoDB migrates data to new nodes

  28. MongoDB Auto-Sharding
      •  Minimal effort required
         –  Same interface as single mongod
      •  Two steps
         –  Enable sharding for a database
         –  Shard a collection within the database

  29. Data stored in shard
      •  Shard is a node of the cluster
      •  Shard can be a single mongod or a replica set

  30. Config server stores meta data
      •  Config Server
         –  Stores cluster chunk ranges and locations
         –  Can have only 1 or 3 (production must have 3)
         –  Two phase commit (not a replica set)

  31. MongoS manages the data
      •  Mongos
         –  Acts as a router / balancer
         –  No local data (persists to config database)
         –  Can have 1 or many

  32. Sharding infrastructure
      (diagram: three config servers, three shards, and app servers each paired with a mongos)

  33. Example cluster setup
      •  Don't use this setup in production!
         –  Only one Config server (No Fault Tolerance)
         –  Shard not in a replica set (Low Availability)
         –  Only one Mongos and shard (No Performance Improvement)
         –  Useful for development or demonstrating configuration mechanics

  34. Start the config server
      •  "mongod --configsvr"
      •  Starts a config server on the default port (27019)

  35. Start the mongos router
      •  "mongos --configdb <hostname>:27019"
      •  For 3 config servers: "mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>"
      •  This is always how to start a new mongos, even if the cluster is already running

  36. Start the shard database
      •  "mongod --shardsvr"
      •  Starts a mongod with the default shard port (27018)
      •  Shard is not yet connected to the rest of the cluster
      •  Shard may have already been running in production

  37. Add the shard
      •  On mongos: sh.addShard("<host>:27018")
      •  Adding a replica set: sh.addShard("<rsname>/<seedlist>")
      •  In 2.2 and later you can use sh.addShard("<host>:<port>")

  38. Verify that the shard was added
      •  db.runCommand({ listshards : 1 })
      •  { "shards" : [ { "_id" : "shard0000", "host" : "<hostname>:27018" } ], "ok" : 1 }

  39. Enabling Sharding
      •  Enable sharding on a database
         –  sh.enableSharding("<dbname>")
      •  Shard a collection with the given key
         –  sh.shardCollection("<dbname>.people", {"country" : 1})
      •  Use a compound shard key to prevent duplicates
         –  sh.shardCollection("<dbname>.cars", {"year" : 1, "uniqueid" : 1})

  40. Tag Aware Sharding
      •  Tag aware sharding allows you to control the distribution of your data (sketch below)
      •  Tag a range of shard keys
         –  sh.addTagRange(<collection>, <min>, <max>, <tag>)
      •  Tag a shard
         –  sh.addShardTag(<shard>, <tag>)

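      (Sketch, not from the deck: the shard names, namespace, shard key and ranges are made up.)
      > sh.addShardTag("shard0000", "NYC")
      > sh.addShardTag("shard0001", "SFO")
      > sh.addTagRange("records.users", { zipcode : "10001" }, { zipcode : "10281" }, "NYC")
      > sh.addTagRange("records.users", { zipcode : "94102" }, { zipcode : "94135" }, "SFO")
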
  41. Chunk is a section of the entire range
      (diagram: the key space from minKey to maxKey split into 64 MB chunks at points such as {x: -20}, {x: 13}, {x: 25}, {x: 100,000})

  42. Chunk splitting
      •  A chunk is split once it exceeds the maximum size
      •  There is no split point if all documents have the same shard key
      •  Chunk split is a logical operation (no data is moved; chunk metadata can be inspected as shown below)
      •  If the split creates too large a discrepancy in chunk count across the cluster, a balancing round starts

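      (Not from the deck: chunk metadata lives in the config database and can be viewed from a mongos; the namespace "records.users" is made up.)
      > sh.status()                                                    // shards, databases and chunk distribution
      > use config
      > db.chunks.find({ ns : "records.users" }, { min : 1, max : 1, shard : 1 })
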
  43. Balancing
      •  Balancer is running on mongos (status helpers below)
      •  Once the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts

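      (Not from the deck: the balancer can be checked and paused from the shell.)
      > sh.getBalancerState()        // is the balancer enabled?
      > sh.isBalancerRunning()       // is a balancing round in progress right now?
      > sh.setBalancerState(false)   // pause balancing, e.g. during a maintenance window
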
  44. Acquiring the Balancer Lock
      •  The balancer on mongos takes out a "balancer lock"
      •  To see the status of these locks:
         –  use config
         –  db.locks.find({ _id : "balancer" })

  45. Moving the chunk
      •  The mongos sends a "moveChunk" command to the source shard
      •  The source shard then notifies the destination shard
      •  The destination claims the chunk's shard-key range
      •  The destination shard starts pulling documents from the source shard

  46. Committing Migration
      •  When complete, the destination shard updates the config server
         –  Provides the new locations of the chunks

  47. Cleanup
      •  Source shard deletes moved data
         –  Must wait for open cursors to either close or time out
         –  NoTimeout cursors may prevent the release of the lock
      •  Mongos releases the balancer lock after old chunks are deleted

  48. Shards return results to mongos (diagram)

  49. Mongos returns results to client (diagram)

  50. Query and sort performed locally (diagram)

  51. Shards return results to mongos (diagram)

  52. Mongos merges sorted results (diagram)

  53. Mongos returns results to client (diagram)

  54. Shard Key
      •  Choose a field commonly used in queries
      •  Shard key is immutable
      •  Shard key values are immutable
      •  Shard key requires an index on the fields contained in the key
      •  Uniqueness of the `_id` field is only guaranteed within an individual shard
      •  Shard key limited to 512 bytes in size