
MongoDB to Cassandra

MetaBroadcast's presentation at the Cassandra Meetup, by Fred van den Driessche, Tom McAdam and Adam Horwich.
Dec. 10th 2012

MetaBroadcast

Transcript

  1. MongoDB to Cassandra: The Atlas Odyssey
    Fred van den Driessche, Engineer, @fredvdd
    Tom McAdam, CTO, @tfm
    Adam Horwich, Systems Engineer, @Mmmkayness
  2. Our platform, late 2012 (diagram): MetaBroadcast platform
    • video and audio metadata from 20+ sources
    • profiles and activity from video and audio products and social networks
    • analytic requests and groupings
    • two further components marked "tbc"
  3. ?

  4. What is Atlas? (diagram)
    • sources: BBC, PA, C4, etc.
    • ATLAS DB
    • API: /content, /schedules, /topics
    • uses: sitemaps, radioplayer, interlinking
  5. Where MongoDB falls short
    • too simple
    • lack of control
    • sharding
    • embedding
  6. Atlas API
    • content
      • http://atlas.metabroadcast.com/3.0/content.json?uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations&apiKey=6ed2a984627daff816198acde82
      • http://atlas.metabroadcast.com/3.0/content.json?apiKey=aaaa&uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations
    • schedules
      • http://atlas.metabroadcast.com/3.0/schedule.json?from=now&to=now.plus.3h&channel=bbcone&publisher=bbc.co.uk
      • http://atlas.metabroadcast.com/3.0/schedule.json?from=1948-12-24&to=1948-12-25&channel=radio4&publisher=bbc.co.uk
    • API explorer: http://atlas.metabroadcast.com/#apiExplorer
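Both API calls above are plain HTTP GETs with query parameters. The minimal Java sketch below builds them. It assumes only the parameter names visible in the slide's URLs. `AtlasUrls` and the `YOUR_API_KEY` placeholder are hypothetical, and the 2012-era 3.0 endpoints may no longer respond. The content `uri` parameter is itself a URL, so each value is percent-encoded here rather than pasted in raw as on the slide.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; parameter names come from the slide's example URLs.
final class AtlasUrls {
    static final String BASE = "http://atlas.metabroadcast.com/3.0/";

    private AtlasUrls() {}

    // Joins key=value pairs in insertion order, percent-encoding each value.
    static String build(String endpoint, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return BASE + endpoint + "?" + query;
    }

    // content.json lookup by source URI, with a comma-separated annotation list.
    static String content(String uri, String annotations, String apiKey) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("uri", uri);
        p.put("annotations", annotations);
        p.put("apiKey", apiKey);
        return build("content.json", p);
    }

    // schedule.json for one channel and publisher over a time window.
    static String schedule(String from, String to, String channel, String publisher) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("from", from);
        p.put("to", to);
        p.put("channel", channel);
        p.put("publisher", publisher);
        return build("schedule.json", p);
    }
}
```

For example, `AtlasUrls.content("http://www.bbc.co.uk/programmes/b0074g7p", "description,brand_summary,locations", "YOUR_API_KEY")` yields the first content URL with `uri` and `annotations` encoded.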
  8. Data model
    • columns to model annotations
    • secondary indexes, e.g.:
      index.direct(keyspace, SEGMENT_URI_INDEX_CF, ConsistencyLevel.CL_QUORUM)
          .from(segment.getCanonicalUri())
          .to(segment.getIdentifier())
          .index()
          .execute(requestTimeout, TimeUnit.MILLISECONDS);
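The sketch below shows one possible reading of "columns to model annotations" plus a URI index, held in memory. Each content row is keyed by its Atlas ID and holds one column per annotation, so a request for `annotations=description,locations` becomes a slice over those columns. A separate index maps canonical URI to ID, which is the role of the `index.direct(...)` write above. `TreeMap`s stand in for column families here. `ContentStore` and its column layout are assumptions, not Atlas's actual schema.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory stand-in for two hypothetical column families:
//   content:  row key = Atlas ID, column name = annotation, value = serialized data
//   uriIndex: row key = canonical URI, value = Atlas ID
final class ContentStore {
    private final Map<String, SortedMap<String, String>> content = new TreeMap<>();
    private final Map<String, String> uriIndex = new TreeMap<>();

    void write(String id, String canonicalUri, Map<String, String> annotations) {
        content.computeIfAbsent(id, k -> new TreeMap<>()).putAll(annotations);
        // Mirrors the slide's index write: from(canonicalUri).to(id)
        uriIndex.put(canonicalUri, id);
    }

    // Reads only the requested annotation columns of one row.
    Map<String, String> read(String id, List<String> annotations) {
        SortedMap<String, String> row = content.getOrDefault(id, new TreeMap<>());
        Map<String, String> result = new LinkedHashMap<>();
        for (String a : annotations) {
            if (row.containsKey(a)) {
                result.put(a, row.get(a));
            }
        }
        return result;
    }

    // Resolves a canonical URI through the index, then reads the row.
    Map<String, String> readByUri(String uri, List<String> annotations) {
        String id = uriIndex.get(uri);
        return id == null ? Map.of() : read(id, annotations);
    }
}
```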
  9. ID generation
    • give external data our own ID on ingest
    • needs to be user-friendly: http://www.radiotimes.com/programme/cf2/eastenders
    • mongo: findAndModify()
    • solution: the Astyanax client, with its distributed locking
    • more details: http://metabroadcast.com/blog/let-cassandra-identify-your-data
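The findAndModify() pattern is an atomic read-increment-write of a counter. Cassandra had no equivalent operation, hence the lock: take it, read the current value, write value + 1, then release it. The sketch below assumes a hypothetical `CounterStore` for the stored value. It uses a local `ReentrantLock` purely as a stand-in for Astyanax's distributed lock. Rendering the number in base 36 to get short IDs like `cf2` is also an assumption, not Atlas's documented encoding.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical store for the "last issued ID" value (in Cassandra, a column
// read and written at QUORUM; here, a field).
interface CounterStore {
    long read();
    void write(long value);
}

final class InMemoryCounterStore implements CounterStore {
    private long value;

    @Override
    public long read() { return value; }

    @Override
    public void write(long value) { this.value = value; }
}

// Read-increment-write under a lock, matching the effect of MongoDB's
// findAndModify with $inc. The Lock passed in is a stand-in: a local
// ReentrantLock only serialises callers within one JVM, whereas a
// distributed lock serialises them across the cluster.
final class IdGenerator {
    private final CounterStore store;
    private final Lock lock;

    IdGenerator(CounterStore store, Lock lock) {
        this.store = store;
        this.lock = lock;
    }

    long nextNumericId() {
        lock.lock();
        try {
            long next = store.read() + 1;
            store.write(next);
            return next;
        } finally {
            lock.unlock();
        }
    }

    // Short, URL-friendly form; base 36 is an assumption.
    String nextId() {
        return Long.toString(nextNumericId(), 36);
    }
}
```

Usage: `new IdGenerator(new InMemoryCounterStore(), new ReentrantLock()).nextId()` returns `"1"` on a fresh store.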
  10. Where we’re at
    • already live with some data
    • alpha release of the schedule endpoint coming soon
    • later: roll out across the other endpoints
  11. Ops

  12. Ops in Cassandra
    • we love Puppet
    • it’s great for automation and deployment
    • MongoDB: 1 file
    • Cassandra: 2 files!
    • oh... tokens
  13. Cassandra Tokens
    • tokens define where in the cluster data is written
    • therefore balanced tokens = balanced cluster
    • tokens should be rack-aware
    • tools are available to generate appropriate tokens for you
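A balanced ring gives each node an equal slice of the partitioner's token range. This sketch assumes RandomPartitioner, the default at the time, whose tokens run from 0 to 2^127, so node i of N gets i * 2^127 / N. For rack awareness, it interleaves nodes round-robin across racks before assigning tokens, so neighbouring ring positions sit in different racks. That is one simple approach, and it assumes racks of equal size. The `Tokens` class is hypothetical. Murmur3Partitioner uses a different range, so the formula does not carry over unchanged.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Tokens {
    // RandomPartitioner tokens lie in [0, 2^127).
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127);

    private Tokens() {}

    // Evenly spaced initial tokens: token_i = i * 2^127 / n.
    static List<BigInteger> balanced(int n) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            tokens.add(RANGE.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
        }
        return tokens;
    }

    // Orders nodes round-robin across racks, e.g. {1a: [n1, n2], 1b: [n3, n4]}
    // becomes n1, n3, n2, n4, then pairs each node with a balanced token.
    // Rack iteration order follows the map passed in (use a LinkedHashMap).
    static Map<String, BigInteger> rackAware(Map<String, List<String>> nodesByRack) {
        int longest = 0;
        for (List<String> nodes : nodesByRack.values()) {
            longest = Math.max(longest, nodes.size());
        }
        List<String> ordered = new ArrayList<>();
        for (int i = 0; i < longest; i++) {
            for (List<String> nodes : nodesByRack.values()) {
                if (i < nodes.size()) {
                    ordered.add(nodes.get(i));
                }
            }
        }
        List<BigInteger> tokens = balanced(ordered.size());
        Map<String, BigInteger> assignment = new LinkedHashMap<>();
        for (int i = 0; i < ordered.size(); i++) {
            assignment.put(ordered.get(i), tokens.get(i));
        }
        return assignment;
    }
}
```

For a 4-node ring, `Tokens.balanced(4)` gives 0, 2^125, 2^126 and 3 * 2^125.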
  14. Cassandra plays nicely with AWS
    • datacentre- and rack-aware
    • AWS Region = datacentre
    • AWS Availability Zone = rack
    • only recently introduced in MongoDB, but simple to implement in Cassandra
    • horizontally (and vertically) scalable
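With zones treated as racks, a rack-aware replication strategy spreads each key's replicas across availability zones. Losing one zone therefore does not take out every copy. The sketch below is a simplified model of that placement within one datacentre: walk the ring clockwise from the owning node, prefer racks not yet used, then fall back to skipped nodes. It is not Cassandra's actual NetworkTopologyStrategy code, and `Node`, `RackAwarePlacement` and the zone names are hypothetical. Deriving the region by dropping the zone letter follows standard EC2 naming such as `eu-west-1a`; Ec2Snitch applies its own naming to the datacentre and rack strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node description: rack = EC2 availability zone.
final class Node {
    private final String name;
    private final String availabilityZone;

    Node(String name, String availabilityZone) {
        this.name = name;
        this.availabilityZone = availabilityZone;
    }

    String name() { return name; }

    String availabilityZone() { return availabilityZone; }

    // Region = AZ name minus its zone letter, e.g. eu-west-1a -> eu-west-1.
    String region() {
        return availabilityZone.substring(0, availabilityZone.length() - 1);
    }
}

final class RackAwarePlacement {
    private RackAwarePlacement() {}

    // Simplified rack-aware placement: walk the ring from the owner, taking
    // nodes in racks not yet used; if racks run out, fill from skipped nodes.
    static List<Node> replicas(List<Node> ring, int ownerIndex, int replicationFactor) {
        List<Node> chosen = new ArrayList<>();
        Set<String> racksUsed = new HashSet<>();
        List<Node> skipped = new ArrayList<>();
        for (int step = 0; step < ring.size() && chosen.size() < replicationFactor; step++) {
            Node node = ring.get((ownerIndex + step) % ring.size());
            if (racksUsed.add(node.availabilityZone())) {
                chosen.add(node);
            } else {
                skipped.add(node);
            }
        }
        for (Node node : skipped) {
            if (chosen.size() >= replicationFactor) {
                break;
            }
            chosen.add(node);
        }
        return chosen;
    }
}
```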
  15. Monitoring
    • Nagios is a little threadbare for Cassandra
    • basic TCP service check
    • stats from the API not very helpful
    • nodetool and CLI tools are useful
    • manual effort to integrate them
    • if only there were some useful service...
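The "basic TCP service check" amounts to opening a socket to a Cassandra port and reporting the result with Nagios plugin exit codes (0 OK, 2 CRITICAL). Here is a minimal Java sketch. The host, the port and the 1-second timeout are assumptions; 9160 was the default Thrift client port in Cassandra 1.x. A check like this only proves that the port accepts connections, which is why the slide calls Nagios threadbare here.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class TcpCheck {
    // Nagios plugin exit codes.
    static final int OK = 0;
    static final int CRITICAL = 2;

    private TcpCheck() {}

    // OK if a TCP connection to host:port succeeds within timeoutMillis.
    static int check(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return OK;
        } catch (IOException e) {
            return CRITICAL;
        }
    }

    // One-line status text; Nagios shows this alongside the exit code.
    static String message(int status, String host, int port) {
        return (status == OK ? "TCP OK" : "TCP CRITICAL") + " - " + host + ":" + port;
    }

    // Usage: TcpCheck [host] [port]; defaults are assumptions.
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9160;
        int status = check(host, port, 1000);
        System.out.println(message(status, host, port));
        System.exit(status);
    }
}
```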
  16. OpsCenter
    • wonderful for an overview
    • not so much for alerting ;)
    • ohai API
    • can integrate metrics into Nagios
  17. Disaster Recovery
    • we currently operate a 4-node cluster
    • replication factor of 3 with quorum reads/writes
    • DR complicated by tokens
    • cluster should be balanced
    • snapshot + S3 backups
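The numbers on this slide fit together through Cassandra's quorum arithmetic. QUORUM is floor(RF / 2) + 1 replicas, so with RF 3 each read and write waits for 2 replicas. Because 2 + 2 > 3, every quorum read overlaps the latest quorum write, and a quorum can still be formed with one replica of a key down. The small Java sketch below spells that out; class and method names are illustrative. It assumes balanced tokens for the per-node share of data.

```java
final class Quorum {
    private Quorum() {}

    // QUORUM = floor(RF / 2) + 1 replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Reads see the latest write when read and write replica sets overlap: R + W > RF.
    static boolean stronglyConsistent(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    // Replicas of one key that can be down while QUORUM still succeeds.
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    // With balanced tokens, each node stores RF / N of the data set.
    static double shareOfDataPerNode(int replicationFactor, int nodes) {
        return (double) replicationFactor / nodes;
    }
}
```

For the 4-node, RF 3 cluster, each node holds about 75% of the data. Losing one node leaves every key with 2 live replicas. Losing two nodes can push some keys below quorum.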
  18. Cluster Happiness and Headaches
    • little maintenance overhead
    • cluster rebalancing
    • uncommon maintenance procedure
    • schema changes are cumbersome
    • little scope for rollback; can put the cluster in an unrecoverable state
  19. Summary
    • Mongo is good, but Atlas has outgrown it
    • Cassandra isn’t a drop-in replacement
    • Ops more complex, but so far so good