rick446
April 20, 2012

Rapid and Scalable Development with MongoDB, PyMongo, and Ming

This intermediate-level talk teaches techniques for using the popular NoSQL database MongoDB and the Python library Ming to write maintainable, high-performance, and scalable applications. We will cover everything you need to become an effective Ming/MongoDB developer, from basic PyMongo queries to high-level object-document mapping setups in Ming.

Transcript

  1. - Get started with PyMongo
     - Sprinkle in some Ming schemas
     - ODM: When a dict just won’t do
  2. >>> import pymongo
     >>> conn = pymongo.Connection()
     >>> conn
     Connection('localhost', 27017)
     >>> conn.test
     Database(Connection('localhost', 27017), u'test')
     >>> conn.test.foo
     Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
     >>> conn['test-db']
     Database(Connection('localhost', 27017), u'test-db')
     >>> conn['test-db']['foo-collection']
     Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
     >>> conn.test.foo.bar.baz
     Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')
  3. >>> db = conn.test
     >>> id = db.foo.insert({'bar':1, 'baz':[ 1, 2, {'k':5} ] })
     >>> id
     ObjectId('4e712e21eb033009fa000000')
     >>> db.foo.find()
     <pymongo.cursor.Cursor object at 0x29c7d50>
     >>> list(db.foo.find())
     [{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {'k': 5}]}]
     >>> db.foo.update({'_id':id}, {'$set': { 'bar':2}})
     >>> db.foo.find().next()
     {u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {'k': 5}]}
     >>> db.foo.remove({'_id':id})
     >>> list(db.foo.find())
     [ ]
     Slide callouts:
     - Auto-generated _id
     - Cursors are Python iterators
     - Remove uses the same query language as find()
  4. >>> db.foo.insert([ dict(x=x) for x in range(10) ])
     [ObjectId('4e71313aeb033009fa00000b'), … ]
     >>> list(db.foo.find({ 'x': {'$gt': 3} }))
     [{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
      {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
      {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, …]
     >>> list(db.foo.find({ 'x': {'$gt': 3} }, { '_id':0 } ))
     [{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8}, {u'x': 9}]
     >>> list(db.foo.find({ 'x': {'$gt': 3} }, { '_id':0 } )
     ...      .skip(1).limit(2))
     [{u'x': 5}, {u'x': 6}]
     >>> db.foo.ensure_index([
     ...     ('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
     u'x_1_y_-1'
     Slide callouts:
     - Range query
     - Partial retrieval
     - Compound indexes
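The sessions in slides 2 to 4 use the PyMongo API as it stood in 2012. In PyMongo 3 and later, `Connection` was replaced by `MongoClient`, and `insert`, `update`, `remove`, and `ensure_index` gave way to `insert_one`/`insert_many`, `update_one`, `delete_one`, and `create_index`. The following is a minimal sketch of the same steps under that assumption. The function name `modern_equivalents` is hypothetical, and `coll` is assumed to be a collection such as `pymongo.MongoClient().test.foo`, which needs a running server.

```python
def modern_equivalents(coll):
    """Repeat the slide 3-4 operations with PyMongo 3+ method names.

    `coll` is assumed to be a pymongo Collection, for example
    pymongo.MongoClient().test.foo (this needs a running mongod).
    """
    # insert() -> insert_one(); the generated _id is on the result object
    doc_id = coll.insert_one({'bar': 1, 'baz': [1, 2, {'k': 5}]}).inserted_id
    # update() -> update_one(), same filter and $set syntax
    coll.update_one({'_id': doc_id}, {'$set': {'bar': 2}})
    # remove() -> delete_one(), still using the find() query language
    coll.delete_one({'_id': doc_id})

    # Bulk insert: insert([...]) -> insert_many([...])
    coll.insert_many([{'x': x} for x in range(10)])
    # Range query, projection, skip and limit are unchanged
    page = list(coll.find({'x': {'$gt': 3}}, {'_id': 0}).skip(1).limit(2))
    # ensure_index() -> create_index(); 1 and -1 are the values of
    # pymongo.ASCENDING and pymongo.DESCENDING
    index_name = coll.create_index([('x', 1), ('y', -1)])
    return page, index_name
```

The query documents themselves (`$gt`, `$set`, the `{'_id': 0}` projection) are the same in both API generations; only the method names and return values differ.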
  5. - You gotta write JavaScript (for now)
     - It’s pretty slow (single-threaded JS engine)
     - JavaScript is used by
       - $where in a query
       - .group(key, condition, initial, reduce, finalize=None)
       - .map_reduce(map, reduce, out, finalize=None, …)
     - Sharding gives some parallelism with .map_reduce() (and possibly ‘$where’). Otherwise you’re single threaded.
     - MongoDB 2.2 with New Aggregation Framework Coming Real Soon Now ™
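The `.map_reduce(map, reduce, out, …)` call listed above takes JavaScript functions, but the data flow is easy to see in plain Python. The following is only an illustrative sketch of the map/reduce contract (map emits key/value pairs, reduce folds all the values for one key), not how MongoDB executes it. The names `map_reduce`, `map_fn`, and `reduce_fn` are hypothetical.

```python
from collections import defaultdict

def map_reduce(docs, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

def map_fn(doc):
    # Emit one (tag, 1) pair per tag on the document
    for tag in doc.get('tags', []):
        yield tag, 1

def reduce_fn(key, values):
    # Fold all counts for one tag into a single total
    return sum(values)

docs = [
    {'title': 'Cats', 'tags': ['pets', 'lol']},
    {'title': 'Dogs', 'tags': ['pets']},
]
counts = map_reduce(docs, map_fn, reduce_fn)
print(counts)
```

MongoDB may call the reduce function more than once for the same key, feeding earlier results back in, so a real reduce function has to return the same shape of value that it receives. Summing counts, as here, satisfies that.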
  6. >>> import gridfs
     >>> fs = gridfs.GridFS(db)
     >>> with fs.new_file() as fp:
     ...     fp.write('The file')
     ...
     >>> fp
     <gridfs.grid_file.GridIn object at 0x2cae910>
     >>> fp._id
     ObjectId('4e727f64eb03300c0b000003')
     >>> fs.get(fp._id).read()
     'The file'
     Slide callouts:
     - Python context manager
     - Retrieve file by _id
     - Arbitrary data (MIME type, links to other docs, etc.) can be stored in the ‘fp’ object; it’s just a document (but please put it in ‘fp.metadata’)
  7. >>> file_id = fs.put('Moar data!', filename='foo.txt')
     >>> fs.get_last_version('foo.txt').read()
     'Moar data!'
     >>> file_id = fs.put('Even moar data!', filename='foo.txt')
     >>> fs.get_last_version('foo.txt').read()
     'Even moar data!'
     >>> fs.get_version('foo.txt', -2).read()
     'Moar data!'
     >>> fs.list()
     [u'foo.txt']
     >>> fs.delete(fs.get_last_version('foo.txt')._id)
     >>> fs.list()
     [u'foo.txt']
     >>> fs.delete(fs.get_last_version('foo.txt')._id)
     >>> fs.list()
     []
     Slide callouts:
     - Create file by filename
     - “2nd from the last” (the -2 passed to get_version)
  8. - Get started with PyMongo
     - Sprinkle in some Ming schemas
     - ODM: When a dict just won’t do
  9. - Your data has a schema
       - Your database can define and enforce it
       - It can live in your application (as with MongoDB)
       - Nice to have the schema defined in one place in the code
     - Sometimes you need a “migration”
       - Changing the structure/meaning of fields
       - Adding indexes, particularly unique indexes
       - Sometimes lazy, sometimes eager
     - “Unit of work”: queuing up all your updates can be handy
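A lazy migration upgrades each document the first time the application reads it, while an eager migration rewrites the whole collection up front. The sketch below shows both in plain Python. It assumes documents carry a hypothetical `schema_version` field and that the change being made is a rename of `body` to `text`; both names are invented for illustration, and a dict stands in for the collection.

```python
CURRENT_VERSION = 2

def migrate(doc):
    """Return doc upgraded to CURRENT_VERSION (mutates and returns doc)."""
    version = doc.get('schema_version', 1)
    if version < 2:
        # v1 -> v2: the field 'body' was renamed to 'text'
        doc['text'] = doc.pop('body', '')
        version = 2
    doc['schema_version'] = version
    return doc

def load_lazy(store, key):
    """Lazy: migrate one document when it is read, then write it back."""
    doc = migrate(store[key])
    store[key] = doc
    return doc

def migrate_eager(store):
    """Eager: walk every document once, before the new code is deployed."""
    for key in list(store):
        store[key] = migrate(store[key])

# A dict stands in for a collection keyed by _id
store = {1: {'title': 'Cats', 'body': 'meow'},
         2: {'title': 'Dogs', 'body': 'woof'}}
page = load_lazy(store, 1)   # only document 1 is upgraded at this point
```

The lazy variant spreads the cost over normal traffic but means old and new shapes coexist in the collection, so `migrate` has to be safe to run on documents that are already current.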
  10. >>> import ming.datastore
      >>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
      >>> ds.db
      Database(Connection('localhost', 27017), u'test')
      >>> session = ming.Session(ds)
      >>> session.db
      Database(Connection('localhost', 27017), u'test')
      >>> ming.configure(**{
      ...     'ming.main.master':'mongodb://localhost:27017',
      ...     'ming.main.database':'test'})
      >>> ming.Session.by_name('main').db
      Database(Connection(u'localhost', 27017), u'test')
      Slide callouts:
      - DataStore: connection + database
      - ming.configure: optimized for config files
  11. from ming import collection, schema, Field
      WikiDoc = collection('wiki_page', session,
          Field('_id', schema.ObjectId()),
          Field('title', str, index=True),
          Field('text', str))
      CommentDoc = collection('comment', session,
          Field('_id', schema.ObjectId()),
          Field('page_id', schema.ObjectId(), index=True),
          Field('text', str))
      Slide callouts:
      - index=True: index created on import
      - str: shorthand for schema.String
  12. from ming import Document, Session, Field
      class WikiDoc(Document):
          class __mongometa__:
              session = Session.by_name('main')
              name = 'wiki_page'
              indexes = [ ('title') ]
          title = Field(str)
          text = Field(str)
      Slide callouts:
      - Old declarative syntax continues to exist and be supported, but it’s not being actively improved
      - Sometimes nice when you want additional methods/attrs on your document class
  13. >>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
      >>> doc.m.save()
      >>> WikiDoc.m.find()
      <ming.base.Cursor object at 0x2c2cd90>
      >>> WikiDoc.m.find().all()
      [{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
      >>> WikiDoc.m.find().one().text
      u'I can haz cheezburger?'
      >>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
      >>> doc.m.save()
      Traceback (most recent call last):
        File "<stdin>", line 1,
        …
      ming.schema.Invalid: <class 'ming.metadata.Document<wiki_page>'>:
          Extra keys: set(['tietul'])
      Slide callouts:
      - Documents are dict subclasses
      - Exception pinpoints problem
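The `Invalid` error in the session above is raised when the document is saved with a key the schema does not declare. The idea can be sketched in a few lines of plain Python. This illustrates the technique only and is not Ming's implementation; `validate`, `SchemaError`, and `WIKI_SCHEMA` are hypothetical names.

```python
class SchemaError(Exception):
    """Raised when a document does not match its schema."""

# Field name -> required Python type
WIKI_SCHEMA = {'title': str, 'text': str}

def validate(doc, schema):
    """Reject unknown keys and wrongly typed values; return doc unchanged."""
    extra = set(doc) - set(schema)
    if extra:
        raise SchemaError('Extra keys: %s' % sorted(extra))
    for key, expected in schema.items():
        if key in doc and not isinstance(doc[key], expected):
            raise SchemaError('%s: expected %s' % (key, expected.__name__))
    return doc

# A well-formed document passes through untouched
validate({'title': 'Cats', 'text': 'I can haz cheezburger?'}, WIKI_SCHEMA)

# A misspelled key is caught before anything reaches the database
try:
    validate({'tietul': 'LOL', 'text': 'Invisible bicycle'}, WIKI_SCHEMA)
except SchemaError as exc:
    message = str(exc)
```

Checking at save time, in the application, is what lets a schemaless store still catch typos like `tietul` early.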
  14. >>> ming.datastore.DataStore('mim://', database='test').db
      mim.Database(test)
      - MongoDB is (generally) fast
        - … except when creating databases
        - … particularly when you preallocate
      - Unit tests like things to be isolated
      - MIM gives you isolation at the expense of speed & scaling
  15. - Get started with PyMongo
      - Sprinkle in some Ming schemas
      - ODM: When a dict just won’t do
  16. from ming import schema, Field
      from ming.odm import (mapper, Mapper, RelationProperty,
                            ForeignIdProperty)
      WikiDoc = collection('wiki_page', session, … )
      CommentDoc = collection('comment', session, … )
      class WikiPage(object): pass
      class Comment(object): pass
      odmsession.mapper(WikiPage, WikiDoc, properties=dict(
          comments=RelationProperty('Comment')))
      odmsession.mapper(Comment, CommentDoc, properties=dict(
          page_id=ForeignIdProperty('WikiPage'),
          page=RelationProperty('WikiPage')))
      Slide callouts:
      - Plain Old Python Classes
      - Map classes to collection + session
      - “Relations”
  17. class WikiPage(MappedClass):
          class __mongometa__:
              session = main_odm_session
              name = 'wiki_page'
              indexes = [ 'title' ]
          _id = FieldProperty(S.ObjectId)
          title = FieldProperty(str)
          text = FieldProperty(str)
          comments = RelationProperty('Comment')
  18. - Session → ODMSession
      - My_collection.m… → My_mapped_class.query…
      - ODMSession actually does stuff
        - Track object identity
        - Track object modifications
        - Unit of work flushing all changes at once
      >>> pg = WikiPage(title='MyPage', text='is here')
      >>> session.db.wiki_page.count()
      0
      >>> main_odm_session.flush()
      >>> session.db.wiki_page.count()
      1
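The three jobs listed for `ODMSession` (tracking identity, tracking modifications, and flushing as a unit of work) can be sketched without a database. This is a toy model with hypothetical names (`ToySession`, `Page`) and a dict standing in for the collection. It is not Ming's actual session, only an illustration of the pattern.

```python
import copy

class Page(object):
    def __init__(self, _id, title):
        self._id = _id
        self.title = title

class ToySession(object):
    def __init__(self, store):
        self.store = store        # dict standing in for a collection
        self._identity = {}       # _id -> live object (identity map)
        self._snapshots = {}      # _id -> state as of load or last flush

    def add(self, obj):
        """Register a new object; nothing is written until flush()."""
        self._identity[obj._id] = obj
        self._snapshots[obj._id] = None   # None marks "never saved"

    def get(self, _id):
        """Return the same object every time for a given _id."""
        if _id not in self._identity:
            state = copy.deepcopy(self.store[_id])
            self._identity[_id] = Page(**state)
            self._snapshots[_id] = state
        return self._identity[_id]

    def flush(self):
        """Write every new or modified object in one pass."""
        written = 0
        for _id, obj in self._identity.items():
            state = copy.deepcopy(vars(obj))
            if state != self._snapshots[_id]:
                self.store[_id] = state
                self._snapshots[_id] = copy.deepcopy(state)
                written += 1
        return written

store = {}
session = ToySession(store)
session.add(Page(1, 'MyPage'))
count_before = len(store)      # nothing saved yet
session.flush()
count_after = len(store)       # the queued insert has been written
```

As in the slide's session, the count only changes after `flush()`. Comparing against a snapshot is one simple way to detect modifications; it trades memory for not having to intercept every attribute write.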
  19. - Various plug points in the session
        - before_flush
        - after_flush
      - Some uses
        - Logging changes to sensitive data or for analytics
        - Full-text search indexing
        - “last modified” fields
        - Performance instrumentation
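A “last modified” field is a typical use of these hooks. The sketch below shows the general shape of a before_flush/after_flush extension point. `HookedSession`, `Extension`, `Timestamper`, and `FlushLogger` are hypothetical names, a list stands in for the collection, and the way Ming wires extensions into a real session may differ.

```python
import datetime

class Extension(object):
    """Base class: override either hook."""
    def before_flush(self, pending):
        pass
    def after_flush(self, pending):
        pass

class Timestamper(Extension):
    def before_flush(self, pending):
        # Stamp every document that is about to be written
        now = datetime.datetime.now(datetime.timezone.utc)
        for doc in pending:
            doc['last_modified'] = now

class FlushLogger(Extension):
    def __init__(self):
        self.log = []
    def after_flush(self, pending):
        self.log.append('flushed %d document(s)' % len(pending))

class HookedSession(object):
    def __init__(self, store, extensions):
        self.store = store            # list standing in for a collection
        self.extensions = extensions
        self.pending = []

    def save(self, doc):
        self.pending.append(doc)      # queued, not yet written

    def flush(self):
        pending, self.pending = self.pending, []
        for ext in self.extensions:
            ext.before_flush(pending)
        self.store.extend(pending)
        for ext in self.extensions:
            ext.after_flush(pending)

store = []
logger = FlushLogger()
session = HookedSession(store, [Timestamper(), logger])
session.save({'title': 'Cats'})
session.flush()
```

Because the hooks run once per flush rather than once per save, cross-cutting concerns such as timestamps, search indexing, or timing stay out of the model code.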
  20. - Various plug points in the mapper
        - before_/after_:
          - Insert
          - Update
          - Delete
          - Remove
      - Some uses
        - Collection/model-specific logging (user creation, etc.)
        - Anything you might want a SessionExtension for but would rather do per-model