
rick446
January 10, 2013

Rapid and Scalable Development with MongoDB, Python, and Ming

This intermediate-level talk will teach you techniques using the popular NoSQL database MongoDB and the Python library Ming to write maintainable, high-performance, and scalable applications. We will cover everything you need to become an effective Ming/MongoDB developer from basic PyMongo queries to high-level object-document mapping setups in Ming.


Transcript

  1. Rapid and Scalable Development with MongoDB, PyMongo, and Ming

    Rick Copeland, @rick446. Thursday, January 10, 13
  2. Roadmap

    • Brief overview of MongoDB
    • Getting started with PyMongo
    • Sprinkle in some Ming schemas
    • Object-Document Mapping: When a dict just won’t do
  3. MongoDB Terminology

    • MongoDB databases contain collections
    • MongoDB collections contain documents

    Relational    MongoDB
    ----------    ----------
    Database      Database
    Table         Collection
    Index         Index
    Row           Document
    Column        Field
  4. JSON and BSON

    • JSON: JavaScript Object Notation
    • BSON: Binary JSON
      • Extra types: ObjectId, datetime, UUID, Binary, etc.
      • Restrictions on keys: stay away from “.” and “$”
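A minimal sketch of a client-side check for the key restrictions above, assuming the classic server rules (no "." anywhere in a field name, no leading "$"). `check_keys` is a hypothetical helper written for illustration, not part of PyMongo; it walks sub-documents and arrays because nested keys are restricted too.

```python
def check_keys(doc, path=""):
    """Recursively reject field names containing '.' or starting with '$'.

    Hypothetical helper for illustration: returns None or raises ValueError.
    """
    if isinstance(doc, dict):
        for key, value in doc.items():
            full = f"{path}.{key}" if path else key
            if not isinstance(key, str):
                raise ValueError(f"non-string key: {full!r}")
            if "." in key:
                raise ValueError(f"key contains '.': {full!r}")
            if key.startswith("$"):
                raise ValueError(f"key starts with '$': {full!r}")
            check_keys(value, full)
    elif isinstance(doc, list):
        # Arrays can hold sub-documents, whose keys need checking as well
        for i, item in enumerate(doc):
            check_keys(item, f"{path}.{i}")
```

For example, `check_keys({"instructor": {"first": "Rick"}})` passes silently, while `check_keys({"a": {"b.c": 1}})` raises `ValueError`.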
  5-9. BSON Example

    {
      "_id": ObjectId(...),
      "title": "MongoDB for Developers",
      "date": ISODate("2012-09-17..."),
      "instructor": { "first": "Rick", "last": "Copeland" },
      "topics": [ "MongoDB", "Python" ]
    }

    • _id: the document “primary key”
    • date: a datetime, stored as a 64-bit signed integer count of milliseconds since the Unix epoch
    • instructor: a compound sub-document
    • topics: an array, used for embedding “1:N” relations
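The "64-bit signed # of ms" encoding of the date field can be reproduced in a few lines of standard-library Python. This is a sketch of the conversion only, assuming timezone-aware UTC datetimes; the function names are hypothetical, and PyMongo performs this conversion for you.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_bson_millis(dt):
    """Aware datetime -> signed milliseconds since the Unix epoch.

    Integer arithmetic on the timedelta avoids float rounding; anything
    finer than a millisecond is floored away, as BSON cannot store it.
    """
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


def from_bson_millis(ms):
    """Signed milliseconds since the epoch -> aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)
```

Dates before 1970 simply become negative counts, which is why the field is signed.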
  10. MongoDB Queries

    • BSON-based query language
    • Query by example
      • db.foo.find({'name': 'Rick'})
    • Various query operators
      • db.foo.find({'rating': {'$gt': 4}})
    • Query “into” arrays/subdocuments
      • db.foo.find({'comments.author': 'Rick'})
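To make the three query forms above concrete, here is a toy pure-Python matcher. It is an illustration of the semantics under simplifying assumptions (top-level `$gt`/`$gte`/`$lt`/`$lte` only, no numeric array indexes in paths, no matching of missing fields against `None`), not how the server evaluates queries; `matches` is a hypothetical name.

```python
import operator

OPS = {"$gt": operator.gt, "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


def _values_at(value, parts):
    """Yield every value reachable via a dotted path, fanning out over arrays."""
    if not parts:
        yield value
        if isinstance(value, list):
            # Querying an array field also tests each of its elements
            yield from value
        return
    if isinstance(value, dict):
        if parts[0] in value:
            yield from _values_at(value[parts[0]], parts[1:])
    elif isinstance(value, list):
        # "Into" arrays: apply the remaining path to every element
        for item in value:
            yield from _values_at(item, parts)


def _test(candidate, cond):
    """Compare one candidate value with either a literal or an operator dict."""
    if isinstance(cond, dict) and cond and all(k.startswith("$") for k in cond):
        try:
            return all(OPS[op](candidate, arg) for op, arg in cond.items())
        except TypeError:
            return False  # e.g. comparing a string with a number
    return candidate == cond


def matches(doc, query):
    """True if every field condition in `query` is met by some value in `doc`."""
    return all(
        any(_test(v, cond) for v in _values_at(doc, path.split(".")))
        for path, cond in query.items()
    )
```

With `posts` as a list of dicts, `[p for p in posts if matches(p, {'comments.author': 'Rick'})]` mirrors the third `find()` on the slide.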
  11. MongoDB Updates

    • db.foo.update({spec}, {update})
    • Default is replacement
      • db.foo.update({'_id': ...}, {k0: v0, k1: v1, ...})
    • Can also do partial updates with operators
      • db.posts.update({'_id': ObjectId(...)}, {'$push': {'comments': 'This is cool'}})
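The difference between a replacement and a partial update can be modelled on plain dicts. This sketch is hypothetical and deliberately small: it handles only top-level `$set` and `$push`, and keeps the original `_id` on replacement, as the server does.

```python
import copy


def apply_update(doc, update):
    """Return a new document after applying `update` to `doc`.

    A body without $-operators replaces the whole document (the _id is kept);
    otherwise only the named fields change. Only $set/$push are modelled.
    """
    if not any(k.startswith("$") for k in update):
        new_doc = copy.deepcopy(update)
        if "_id" in doc:
            new_doc["_id"] = doc["_id"]
        return new_doc
    new_doc = copy.deepcopy(doc)
    for op, fields in update.items():
        for key, value in fields.items():
            if op == "$set":
                new_doc[key] = copy.deepcopy(value)
            elif op == "$push":
                # Create the array if it is missing, then append
                new_doc.setdefault(key, []).append(copy.deepcopy(value))
            else:
                raise NotImplementedError(op)
    return new_doc
```

Note how `apply_update(post, {'title': 'New'})` silently drops every other field, which is the usual surprise with replacement updates.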
  12. MongoDB Indexing

    • At most one index is used for any given query/update
    • Most indexes are B-tree based
    • Geospatial indexes and queries
    • Brand-new experimental full-text search (http://blog.serverdensity.com/full-text-search-in-mongodb/)
  13. Scaling MongoDB (diagram)

    Four shards partitioned by shard-key range: Shard 1 (0..10), Shard 2 (10..20), Shard 3 (20..30), Shard 4 (30..40). Each shard is a replica set of one primary and two secondaries. Clients connect through MongoS routers, which rely on a configuration cluster of three config servers (Config 1, Config 2, Config 3).
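Routing by shard-key range, as in the diagram, amounts to a sorted lookup. The sketch below assumes the diagram's ranges are half-open (`[0, 10)`, `[10, 20)`, ...) and hard-codes them; a real MongoS caches this chunk metadata from the config servers, and a query that does not constrain the shard key is broadcast to all shards.

```python
import bisect

# Hypothetical chunk table mirroring the diagram: [0,10) -> shard1, [10,20) -> shard2, ...
LOWER_BOUNDS = [0, 10, 20, 30]
SHARDS = ["shard1", "shard2", "shard3", "shard4"]
UPPER_LIMIT = 40


def route(shard_key):
    """Return the shard that owns a single shard-key value."""
    if not LOWER_BOUNDS[0] <= shard_key < UPPER_LIMIT:
        raise KeyError(f"no chunk covers {shard_key!r}")
    # bisect_right - 1 is the last range whose lower bound is <= shard_key
    return SHARDS[bisect.bisect_right(LOWER_BOUNDS, shard_key) - 1]


def targets(lo, hi):
    """Shards a range query lo <= key < hi has to be sent to."""
    uppers = LOWER_BOUNDS[1:] + [UPPER_LIMIT]
    # [a, b) overlaps [lo, hi) exactly when a < hi and lo < b
    return [s for s, a, b in zip(SHARDS, LOWER_BOUNDS, uppers) if a < hi and lo < b]
```

A targeted range query such as `targets(20, 30)` touches one shard, which is what keeps range queries on the shard key cheap.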
  14. Roadmap

    • Brief overview of MongoDB
    • Getting started with PyMongo
    • Sprinkle in some Ming schemas
    • Object-Document Mapping: When a dict just won’t do
  15-20. PyMongo: Connections and Databases

    >>> import pymongo
    >>> cli = pymongo.MongoClient()             # get a connection
    >>> cli
    MongoClient('localhost', 27017)
    >>> cli.test                                # get a database
    Database(MongoClient('localhost', 27017), u'test')
    >>> cli.test.foo                            # get a collection
    Collection(Database(MongoClient('localhost', 27017), u'test'), u'foo')
    >>> cli['test-db']                          # using invalid Python names
    Database(MongoClient('localhost', 27017), u'test-db')
    >>> cli['test-db']['foo-collection']
    Collection(Database(MongoClient('localhost', 27017), u'test-db'), u'foo-collection')
    >>> cli.test.foo.bar.baz                    # collections with '.' embedded
    Collection(Database(MongoClient('localhost', 27017), u'test'), u'foo.bar.baz')
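The last line above works because MongoDB names a collection by its namespace, `<database>.<collection>`, and database names may not contain ".". The first dot is therefore the separator and any later dots belong to the collection name. A small sketch with a hypothetical helper:

```python
def split_namespace(ns):
    """Split a full namespace such as 'test.foo.bar.baz' into (database, collection).

    Only the first '.' separates the two parts, because database names
    cannot contain dots while collection names can.
    """
    db, sep, coll = ns.partition(".")
    if not sep or not db or not coll:
        raise ValueError(f"not a full namespace: {ns!r}")
    return db, coll
```

So `split_namespace('test.foo.bar.baz')` gives `('test', 'foo.bar.baz')`, matching the `Collection(...)` repr on the slide.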
  21-23. PyMongo: Insert/Update/Delete

    >>> id = db.foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
    >>> id                                      # auto-generated _id
    ObjectId('...')
    >>> db.foo.find()                           # a Cursor is a lazy iterator, much like a Python generator
    <pymongo.cursor.Cursor object at 0x1010d5e10>
    >>> list(db.foo.find())
    [{u'bar': 1, u'_id': ObjectId('...'), u'baz': [1, 2, {u'k': 5}]}]
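The auto-generated `_id` is a 12-byte ObjectId whose first 4 bytes are a big-endian creation time in seconds, which is why ids sort roughly by insertion time. The sketch below follows the current layout (timestamp, 5-byte per-process random value, 3-byte counter); drivers of this talk's era used a machine id and process id in place of the random bytes. It is an illustration only; use `bson.ObjectId` in real code.

```python
import itertools
import os
import struct
import time

_random = os.urandom(5)  # per-process random value
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def new_object_id(timestamp=None):
    """Return 24 hex chars laid out like an ObjectId: time | random | counter."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    count = next(_counter) & 0xFFFFFF  # the counter wraps at 3 bytes
    raw = struct.pack(">I", ts) + _random + count.to_bytes(3, "big")
    return raw.hex()


def generation_time(oid_hex):
    """Recover the creation time (seconds since the epoch) from the first 4 bytes."""
    return struct.unpack(">I", bytes.fromhex(oid_hex)[:4])[0]
```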
  24-26. PyMongo: Insert/Update/Delete

    >>> db.foo.update({'_id': id}, {'$set': {'bar': 2}})    # partial update
    {u'updatedExisting': True, u'connectionId': 24, u'ok': 1.0, u'err': None, u'n': 1}
    >>> db.foo.find_one()
    {u'bar': 2, u'_id': ObjectId('...'), u'baz': [1, 2, {u'k': 5}]}
    >>> db.foo.remove({'_id': id})              # remove: same query language as find()
    {u'connectionId': 24, u'ok': 1.0, u'err': None, u'n': 1}
    >>> list(db.foo.find())
    []
  27-31. PyMongo: Queries

    >>> db.foo.insert([dict(x=i) for i in range(4)])        # batching inserts
    [ObjectId('...'), ObjectId('...'), ObjectId('...'), ObjectId('...')]
    >>> list(db.foo.find())                                  # find all documents
    [{u'x': 0, u'_id': ObjectId('...')}, {u'x': 1, u'_id': ObjectId('...')}, {u'x': 2, u'_id': ObjectId('...')}, {u'x': 3, u'_id': ObjectId('...')}]
    >>> list(db.foo.find({'x': {'$gte': 2}}))                # restrict by range (>=)
    [{u'x': 2, u'_id': ObjectId('...')}, {u'x': 3, u'_id': ObjectId('...')}]
    >>> list(db.foo.find({'x': {'$gte': 2}}, {'_id': 0}))    # retrieve partial results
    [{u'x': 2}, {u'x': 3}]
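The second argument to `find()` is a projection. A hypothetical pure-Python version of the top-level rules: `_id` is returned unless excluded explicitly, and other fields are either all included or all excluded, never mixed.

```python
def project(doc, projection):
    """Apply a top-level inclusion or exclusion projection to one document."""
    include_id = projection.get("_id", 1)
    fields = {k: v for k, v in projection.items() if k != "_id"}
    if any(fields.values()) and not all(fields.values()):
        raise ValueError("cannot mix inclusion and exclusion")
    if fields and all(fields.values()):
        # Inclusion mode: keep only the listed fields
        out = {k: doc[k] for k in fields if k in doc}
    else:
        # Exclusion mode (or no fields listed): keep everything else
        out = {k: v for k, v in doc.items() if k not in fields and k != "_id"}
    if include_id and "_id" in doc:
        out["_id"] = doc["_id"]
    return out
```

Filtering with `x >= 2` and projecting with `{'_id': 0}` reproduces the last result on the slide, `[{'x': 2}, {'x': 3}]`.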
  32-35. PyMongo: Indexes

    >>> db.foo.find({'x': {'$gte': 2}}).explain()
    {..., u'n': 2, u'cursor': u'BasicCursor', ..., u'nscannedObjects': 4, ..., u'nscanned': 4}
    >>> db.foo.ensure_index('x')
    u'x_1'
    >>> db.foo.find({'x': {'$gte': 2}}).explain()
    {..., u'n': 2, u'cursor': u'BtreeCursor x_1', ..., u'nscannedObjects': 2, ..., u'nscanned': 2}
    >>> db.foo.find({'x': {'$gte': 2}},
    ...             {'x': 1, '_id': 0}).explain()
    {..., u'indexOnly': True, ...}

    • No index (BasicCursor): scan all the documents
    • With an index (BtreeCursor): skip straight to the returned documents
    • indexOnly: answered from the index alone; the documents are never loaded
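The `nscanned` numbers above (4 without an index, 2 with one) follow from how each plan walks the data. In this sketch a sorted list of `(key, position)` pairs stands in for the B-tree, an assumption for illustration rather than the server's structure.

```python
import bisect


def scan(docs, lo):
    """Full collection scan (BasicCursor): every document is examined."""
    examined = 0
    hits = []
    for doc in docs:
        examined += 1
        if doc["x"] >= lo:
            hits.append(doc)
    return hits, examined


def build_index(docs, field):
    """Sorted (key, position) pairs standing in for a B-tree on `field`."""
    return sorted((doc[field], pos) for pos, doc in enumerate(docs))


def index_range(index, docs, lo):
    """Seek to the first key >= lo, then walk forward (BtreeCursor)."""
    # (lo,) sorts before every (lo, pos) pair, so this finds the first key >= lo
    start = bisect.bisect_left(index, (lo,))
    entries = index[start:]
    return [docs[pos] for _, pos in entries], len(entries)
```

A covered (`indexOnly`) query would return the keys from `entries` directly, without the `docs[pos]` lookups.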
  36-37. And if you really must...

    >>> list(db.foo.find({'$where': 'this.bar >= 1'}))    # JavaScript expression, evaluated in document context
    [{u'bar': 2, u'_id': ObjectId('...'), u'baz': [1, 2, {u'k': 5}]}]

    • ...but please don’t, if you want performance:
      • JavaScript global interpreter lock
      • BSON/JS translation
      • Forget about indexes (for the $where clause, at least)
  38-42. PyMongo: Aggregation

    >>> db.foo.insert([dict(x=i, y=i%4) for i in range(10)])
    [ObjectId('...'), ...]
    >>> db.foo.count()
    10
    >>> db.foo.find({'x': {'$lt': 4}}).count()
    4
    >>> db.foo.find({'x': {'$lt': 4}}).distinct('y')
    [0, 1, 2, 3]
    >>> db.foo.group(
    ...     ['y'], {'x': {'$gt': 4}},
    ...     {'count': 0},
    ...     'function(cur, acc) { acc.count += 1; }')
    [{u'y': 1.0, u'count': 2.0}, {u'y': 2.0, u'count': 1.0}, {u'y': 3.0, u'count': 1.0}, {u'y': 0.0, u'count': 1.0}]

    • Always fast
    • Sometimes slow
    • Limited result size
    • group(): at most 20k results, uses JS, can’t shard
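The same three questions can be answered client-side in plain Python, which is a handy way to sanity-check what `count()`, `distinct()` and `group()` return. This pulls every document into the client, so it is a checking aid, not a substitute at scale; Python also keeps `y` as an int where the JS-based `group()` returns floats.

```python
from collections import Counter

# Same data as the slide: x = 0..9, y = x % 4
docs = [{"x": i, "y": i % 4} for i in range(10)]

# count(): number of documents
total = len(docs)

# find({'x': {'$lt': 4}}).distinct('y'), sorted here for a stable display
distinct_y = sorted({d["y"] for d in docs if d["x"] < 4})

# group() keyed on y with condition {'x': {'$gt': 4}}: documents per y value
counts = Counter(d["y"] for d in docs if d["x"] > 4)
```

`counts` comes out as `{1: 2, 2: 1, 3: 1, 0: 1}`, matching the `group()` output above.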
  43. PyMongo: MapReduce (diagram)

    { doc } → map() → (key, {doc}) pairs → group by key → (key, [{docs}]) pairs → reduce(key, values) → finalize(key, value) → { value } → write back / return
  44. PyMongo: MapReduce

    >>> db.cities.find_one(
    ...     {'country_code': 'US'},
    ...     {'_id': 0, 'name': 1, 'country_code': 1, 'admin1_code': 1, 'population': 1})
    {u'admin1_code': u'VA', u'name': u'Fort Hunt', u'country_code': u'US', u'population': 16045L}
    >>> mapf = '''function() {
    ...     emit(this.admin1_code,
    ...          { count: 1, pop: this.population } );
    ... }'''
  45. PyMongo: MapReduce

    >>> reducef = '''function(key, docs) {
    ...     var result = { count: 0, pop: 0 };
    ...     docs.forEach(function(doc) {
    ...         result.count += doc.count;
    ...         result.pop += doc.pop;
    ...     });
    ...     return result;
    ... }'''
    >>> finalizef = '''function(key, doc) {
    ...     return {
    ...         count: doc.count,
    ...         pop: doc.pop,
    ...         mean_pop: doc.pop / doc.count};
    ... }'''
  46. PyMongo: MapReduce

    >>> db.cities.map_reduce(
    ...     map=mapf,
    ...     reduce=reducef,
    ...     out='state_pop',
    ...     query={'country_code': 'US'},
    ...     finalize=finalizef)
    Collection(Database(MongoClient('localhost', 27017), u'tutorial'), u'state_pop')
    >>> db.state_pop.find_one()
    {u'_id': u'AK', u'value': {u'count': 4.0, u'mean_pop': 93529.5, u'pop': 374118.0}}
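The map → group by key → reduce → finalize flow from the diagram can be traced in-process with Python functions mirroring `mapf`, `reducef` and `finalizef`. The `cities` records below are made-up sample data, and `map_reduce` is a hypothetical helper, not PyMongo's method. As in MongoDB, reduce is skipped for a key with a single emitted value, and reduce returns the same shape it receives so it can safely be re-applied to partial results.

```python
from collections import defaultdict


def map_reduce(docs, mapf, reducef, finalizef=None):
    """Toy in-process map -> group by key -> reduce -> finalize."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapf(doc):
            groups[key].append(value)
    out = {}
    for key, values in groups.items():
        value = reducef(key, values) if len(values) > 1 else values[0]
        out[key] = finalizef(key, value) if finalizef else value
    return out


def mapf(city):
    yield city["admin1_code"], {"count": 1, "pop": city["population"]}


def reducef(key, values):
    # Same shape in and out, so partial results can be reduced again
    return {"count": sum(v["count"] for v in values), "pop": sum(v["pop"] for v in values)}


def finalizef(key, v):
    return dict(v, mean_pop=v["pop"] / v["count"])


# Hypothetical sample records, not real data
cities = [
    {"admin1_code": "AA", "population": 100},
    {"admin1_code": "AA", "population": 300},
    {"admin1_code": "BB", "population": 50},
]
```

`map_reduce(cities, mapf, reducef, finalizef)["AA"]` gives `{'count': 2, 'pop': 400, 'mean_pop': 200.0}`.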
  47. PyMongo: MapReduce

    • Still uses JS, but can parallelize across shards
    • Can write results back to a collection (suitable for large batch processes)
  48. PyMongo: Aggregation Framework

    • Pipeline of operators
      • $match
      • $project
      • $skip, $limit, $sort
      • $unwind
      • $group
  49. PyMongo: Aggregation Framework Pipeline (diagram)

    All docs in collection → $match → matched docs → $project → reshaped docs → $unwind → unwound docs → $group → grouped docs → $sort → sorted docs
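Each pipeline stage consumes the previous stage's output. A toy interpreter for a small subset of stage documents shows the flow; `run_pipeline` is hypothetical, `$match` is equality-only, `$group` supports only `$sum`, and the `posts` data in the usage note is made up.

```python
def run_pipeline(docs, pipeline):
    """Interpret $match / $unwind / $group ($sum only) / $sort over a list of dicts."""
    for stage in pipeline:
        [(op, arg)] = stage.items()  # each stage is a one-key document
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in arg.items())]
        elif op == "$unwind":
            field = arg[1:]  # "$tags" -> "tags"
            # One output document per array element
            docs = [{**d, field: item} for d in docs
                    if isinstance(d.get(field), list) for item in d[field]]
        elif op == "$group":
            key_field = arg["_id"][1:]
            rows = {}
            for d in docs:
                k = d.get(key_field)
                row = rows.setdefault(k, {"_id": k})
                for out, spec in arg.items():
                    if out == "_id":
                        continue
                    term = spec["$sum"]
                    # {'$sum': 1} counts; {'$sum': '$field'} adds that field
                    amount = d.get(term[1:], 0) if isinstance(term, str) else term
                    row[out] = row.get(out, 0) + amount
            docs = list(rows.values())
        elif op == "$sort":
            # Stable sorts applied last key first, so the first key dominates
            for field, direction in reversed(list(arg.items())):
                docs = sorted(docs, key=lambda d: d[field], reverse=direction < 0)
        else:
            raise NotImplementedError(op)
    return docs
```

For instance, `run_pipeline(posts, [{'$match': {'author': 'rick'}}, {'$unwind': '$tags'}, {'$group': {'_id': '$tags', 'n': {'$sum': 1}}}, {'$sort': {'n': -1}}])` counts one author's tags, most frequent first.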
  50. PyMongo: Aggregation Framework Example

    >>> db.cities.aggregate([
    ...     {'$match': {'name': 'Atlanta'}},
    ...     {'$project': {
    ...         'name': 1,
    ...         'country_code': 1,
    ...         'position': {'lon': '$longitude',
    ...                      'lat': '$latitude'}
    ...     }}
    ... ])
    {u'ok': 1.0, u'result': [{u'position': {u'lat': 33.749, u'lon': -84.38798}, u'_id': 4180439, u'name': u'Atlanta', u'country_code': u'US'}]}
  51. PyMongo: Aggregation Framework

    • No JavaScript, so no JavaScript global interpreter lock
    • Sharding supported
    • Results limited to a single document (at most 16MB)
    • A “super-find”
  52. PyMongo: GridFS

    >>> import gridfs
    >>> fs = gridfs.GridFS(db)
    >>> with fs.new_file() as fp:
    ...     fp.write('The file')
    ...
    >>> fp
    <gridfs.grid_file.GridIn object at 0x2cae910>
    >>> fp._id
    ObjectId('...')
    >>> fs.get(fp._id).read()
    'The file'

    • File-like abstraction for data larger than the 16MB document limit
    • Files are open for read or write, not both
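GridFS gets past the 16MB document limit by splitting a file into numbered chunk documents (fields `files_id`, `n`, `data`) in a chunks collection, alongside one metadata document per file. A sketch of the splitting and reassembly only; the 255 KiB chunk size is an assumption (it is the current driver default, and early drivers used 256 KiB).

```python
CHUNK_SIZE = 255 * 1024  # assumed default; configurable per file


def to_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Split bytes into GridFS-style chunk documents {files_id, n, data}."""
    return [
        {"files_id": file_id, "n": i, "data": data[off:off + chunk_size]}
        for i, off in enumerate(range(0, len(data), chunk_size))
    ]


def from_chunks(chunks):
    """Reassemble a file by concatenating its chunks in order of n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
```

Because every chunk carries its sequence number `n`, a reader can fetch them in any order (or stream a byte range) and still rebuild the file.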
  53. Roadmap

    • Brief overview of MongoDB
    • Getting started with PyMongo
    • Sprinkle in some Ming schemas
    • Object-Document Mapping: When a dict just won’t do
  54. Why Ming?

    • Your data has a schema (even if the DB doesn’t enforce it)
    • Sometimes you need migrations
    • “Unit of work”: sometimes it’s nice to queue up your updates
  55. Ming: Datastore and Session

    >>> import ming
    >>> ds = ming.create_datastore('test')
    >>> ds.db
    Database(MongoClient('localhost', 27017), u'test')
    >>> sess = ming.Session(ds)
    >>> sess.db
    Database(MongoClient('localhost', 27017), u'test')
    >>> import ming.config
    >>> ming.config.configure_from_nested_dict(
    ...     {'main': {'uri': 'mongodb://localhost:27017/test'}})
    >>> sess = ming.Session.by_name('main')
    >>> sess.db
    Database(MongoClient(u'localhost', 27017), u'test')
  56-58. Ming: Define your Schema

    from ming import collection, Field, schema

    WikiDoc = collection(
        'wiki_page', session,
        Field('_id', schema.ObjectId()),
        Field('title', str, index=True),   # index ensured on session configuration
        Field('text', str))                # str is shorthand for schema.String()

    CommentDoc = collection(
        'comment', session,
        Field('_id', schema.ObjectId()),
        Field('page_id', schema.ObjectId(), index=True),
        Field('text', str))
  59. Ming Schema for the Classically Inclined

    class WikiDoc(Document):
        class __mongometa__:
            session = Session.by_name('main')
            name = 'wiki_page'
            indexes = [('title')]
        title = Field(str)
        text = Field(str)
  60-62. Using Ming Models

    >>> from wiki import WikiDoc
    >>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))   # a Document is a dict subclass
    >>> doc.m.save()
    >>> WikiDoc.m.find()
    <ming.base.Cursor object at 0x101500b10>
    >>> WikiDoc.m.find().all()
    [{'text': u'I can haz cheezburger?', '_id': ObjectId('50eddf6bfb72f03b78a3823c'), 'title': u'Cats'}]
    >>> WikiDoc.m.find().one().text
    u'I can haz cheezburger?'
    >>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
    >>> doc.m.save()                                                        # data is validated
    Traceback (most recent call last):
      ...
    ming.schema.Invalid: Extra keys: set(['tietul'])
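The `Extra keys` failure above is schema validation in action. The sketch below is a deliberately tiny, hypothetical validator over a `{field: type}` mapping that reproduces the two basic checks (unknown keys, wrong types); it is not Ming's implementation, and Ming's schema layer does considerably more.

```python
class Invalid(Exception):
    """Raised when a document does not fit its schema (not ming.schema.Invalid)."""


def validate(doc, schema):
    """Reject keys the schema does not declare and values of the wrong type."""
    extra = set(doc) - set(schema)
    if extra:
        raise Invalid(f"Extra keys: {sorted(extra)}")
    for field, typ in schema.items():
        if field in doc and not isinstance(doc[field], typ):
            raise Invalid(f"{field}: expected {typ.__name__}, got {type(doc[field]).__name__}")
    return doc


# Hypothetical schema loosely mirroring WikiDoc
WIKI_SCHEMA = {"_id": object, "title": str, "text": str}
```

`validate({'tietul': 'LOL', 'text': 'Invisible bicycle'}, WIKI_SCHEMA)` raises `Invalid` naming the misspelled key, the same typo the slide catches.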
  63. Ming Bonus: MIM

    • In-memory partial PyMongo implementation
    • Useful for unit tests
    • Does not scale well (SmallData)

    >>> ming.create_datastore('mim:///test').db
    mim.Database(test)
  64. Roadmap

    • Brief overview of MongoDB
    • Getting started with PyMongo
    • Sprinkle in some Ming schemas
    • Object-Document Mapping: When a dict just won’t do
  65-68. Ming ODM: Classes and Collections

    odmsession = ODMSession(session)

    class WikiPage(object):     # plain old Python classes
        pass

    class Comment(object):
        pass

    # map each class to a collection + session, with "relations"
    odmsession.mapper(WikiPage, WikiDoc, properties=dict(
        comments=RelationProperty('Comment')))
    odmsession.mapper(Comment, CommentDoc, properties=dict(
        page_id=ForeignIdProperty('WikiPage'),
        page=RelationProperty('WikiPage')))
  69. And again, if you like classes...

    class WikiPage(MappedClass):
        class __mongometa__:
            session = main_odm_session
            name = 'wiki_page'
            indexes = ['title']
        _id = FieldProperty(S.ObjectId)
        title = FieldProperty(str)
        text = FieldProperty(str)
        comments = RelationProperty('Comment')
  70. Ming ODM: Sessions and Queries

    • Session ==> ODMSession
    • collection.m.... ==> MappedClass.query...
    • The session actually does stuff:
      • Tracks object identity
      • Tracks object modifications
      • Unit of work to save everything at once
  71. Ming ODM: Sessions and Queries

    >>> pg = WikiPage(title='MyPage', text='is here')
    >>> session.db.wiki_page.count()
    0
    >>> odmsession
    <session>
      <UnitOfWork>
        <new>
          <WikiPage text='is here'...>
        <clean>
        <dirty>
        <deleted>
      <imap (1)>
        WikiPage : ... => <WikiPage ...>
    >>> odmsession.flush()
    >>> session.db.wiki_page.count()
    1
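The `<new>`, `<dirty>`, `<deleted>` and `<imap>` sections above are the session's unit of work and identity map. A toy version over a plain dict (an assumption standing in for a collection) shows the idea; unlike Ming, which detects modifications itself, this sketch has the caller mark objects dirty explicitly.

```python
class UnitOfWork:
    """Toy unit of work: queue changes in memory and write them all in flush()."""

    def __init__(self, store):
        self.store = store        # dict standing in for a collection
        self.identity_map = {}    # _id -> the single in-memory object for that document
        self.new, self.dirty, self.deleted = set(), set(), set()

    def add(self, _id, obj):
        """Register a brand-new object; nothing is written until flush()."""
        self.identity_map[_id] = obj
        self.new.add(_id)

    def get(self, _id):
        """Load a document once; later lookups return the same object."""
        if _id not in self.identity_map:
            self.identity_map[_id] = dict(self.store[_id])
        return self.identity_map[_id]

    def mark_dirty(self, _id):
        if _id not in self.new:
            self.dirty.add(_id)

    def delete(self, _id):
        self.new.discard(_id)
        self.dirty.discard(_id)
        self.deleted.add(_id)

    def flush(self):
        """Write queued inserts and updates, then deletes, and reset the queues."""
        for _id in self.new | self.dirty:
            self.store[_id] = dict(self.identity_map[_id])
        for _id in self.deleted:
            self.store.pop(_id, None)
            self.identity_map.pop(_id, None)
        self.new.clear()
        self.dirty.clear()
        self.deleted.clear()
```

As on the slide, the store stays empty after `add()` and only gains the page once `flush()` runs.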
  72. Integration with Python Web Frameworks

    • ThreadLocalODMSession
    • ming.odm.middleware.MingMiddleware
      • flush all sessions on success
      • clear all sessions on exception
      • when you don’t have real transactions, fake ’em
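The flush-on-success / clear-on-exception policy fits naturally in WSGI middleware. This is a hypothetical sketch of the pattern, not `MingMiddleware` itself: `sessions` is assumed to be any iterable of objects with `flush()` and `clear()` methods.

```python
class FlushOrClearMiddleware:
    """WSGI middleware: flush sessions if the app succeeds, clear them if it raises."""

    def __init__(self, app, sessions):
        self.app = app
        self.sessions = list(sessions)

    def __call__(self, environ, start_response):
        try:
            result = self.app(environ, start_response)
            try:
                # Run the app to completion so errors surface before flushing
                body = list(result)
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            for s in self.sessions:
                s.clear()  # discard queued changes: the "fake rollback"
            raise
        for s in self.sessions:
            s.flush()
        return body
```

Buffering the body trades streaming for the guarantee that nothing is flushed after a failed request; with per-thread sessions (as `ThreadLocalODMSession` provides) each request only touches its own queued changes.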
  73. Wrapping Up

    • MongoDB: scalable document store
      • http://mongodb.org
    • PyMongo: Python API mapping dicts to docs
      • http://api.mongodb.org/python/current/
    • Ming: schema validation and ODM
      • http://sf.net/p/merciless
  74. Questions?

    • MongoDB Applied Design Patterns: coming out Real Soon Now
    • MongoDB with Python and Ming ebook: http://arborian.com/book
    • Need MongoDB or Python help? http://arborian.com

    Rick Copeland, @rick446