An Approach to Design Large Scale Data Centric Architecture Using MongoDB

MongoDB is part of the NoSQL family and has a schema-less architecture; because it imposes no schema constraints, storing and retrieving data can be faster and more efficient. MongoDB helps keep data simple by breaking up workloads, which in turn makes each workload easier to tune and improves performance and availability. In general, massive amounts of information are generated by interactive processing and visualization. Data-centric architecture has recently been proposed to efficiently support information transmission over the Internet, providing communication services for big data. This presentation focuses on large-scale data and on the scaling techniques needed for massive adoption.

This paper was presented at the 5th National Conference on Emerging Trends in IT on 26th February 2014 at Christ University, Hosur Road, Bangalore - 560 029, India.

Clarence J M Tauro (Couchbase)

February 26, 2014

Transcript

  1. AN APPROACH TO DESIGN LARGE SCALE DATA CENTRIC ARCHITECTURE USING MONGODB
    By SUSHMITHA DIWAKAR, ARULJOTHI ANNAMALAI, CLARENCE J M TAURO
    Department of Computer Science, Christ University, Hosur Road, Bangalore
    @ 5TH NATIONAL CONFERENCE ON EMERGING TRENDS IN IT ON 26TH FEBRUARY, 2014
  2. Objectives
    • What Scale Is?
    • How is/was Scale achieved? Traditional Way
    • Scaling Today
    • Introduction to NoSQL
    • MongoDB
    • Scaling with MongoDB
    • Replication in MongoDB
    • Search Design
  3. What Scale Is?
    • How well a solution to some problem will work when the size of the problem increases
    • Massive adoption/usage
  4. How is/was Scale achieved? Traditional Way
    • Fewer joins; fewer triggers
    • Denormalize as much as possible
    • Horizontal/vertical replication
    • Increase hardware
    • Traditional RDBMS; use ORMs like Hibernate
    • Manual process: the developer's job
  5. Scaling Today
    • Many more persistence options
    • Cloud-based architectures that completely abstract the underlying hardware from the developer
    • Use PaaS, e.g. CloudFoundry from Pivotal
    • Fewer developers
  6. Introduction to NoSQL
    • NoSQL stands for
      – “NoSQL” = “No SQL” = not using a traditional relational DBMS
      – “No SQL” = don’t use the SQL language
      – No joins
    • NoSQL databases usually do not require a fixed table schema
    • All NoSQL offerings relax one or more of the ACID properties
  7. MongoDB
    • MongoDB (from “humongous”)
    • Cross-platform, schemaless, document-oriented NoSQL database
    • MongoDB uses BSON (a JSON-like structure)
    • Features include:
      – File storage
      – Indexing
      – Scaling
      – Replication
  8. Sample MongoDB Document
    {
      _id : ObjectId("4e77bb3b8a3e000000004f7a"),
      when : Date("2014-02-26T02:10:11.3Z"),
      author : "arul",
      title : "MongoDB",
      text : "This is the text of the post",
      tags : [ "JSON", "BSON" ],
      votes : 5,
      voters : [ "sushmita", "clarence", "jothi" ]
    }
  9. Scaling – Larger Level
    • Prefer simpler architectures
    • Break down the workload completely
    • Fine-tune your workload
    • Do NOT use an ORM unless you really want to
      – Use simpler standards, e.g. Spring’s JdbcTemplate
    • Use smaller, fine-grained components to deploy your application
    • Shard
    • Replicate
  10. Scaling – Micro Level
    • Multiple documents vs. nested documents
    • Indexing
      – Need to have the right number of indexes
      – Too many indexes make the DB slow, especially MongoDB
    • Transactions vs. compensating transactions
      – JTA transactions are highly discouraged
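The compensating-transaction idea on the slide above can be sketched in plain JavaScript. This is a minimal illustration and not MongoDB API: `runWithCompensation`, `placeOrder`, and the in-memory `accounts` and `orders` stores are hypothetical names standing in for two separate document writes. Each step carries its own undo action, and if a later step fails, the steps that already completed are undone in reverse order.

```javascript
// Hypothetical in-memory stand-ins for two collections (assumption, not MongoDB API).
const accounts = { arul: { balance: 100 } };
const orders = [];

// Run steps in order; on failure, undo the completed steps in reverse order.
function runWithCompensation(steps) {
  const done = [];
  try {
    for (const step of steps) {
      step.apply();
      done.push(step);
    }
    return true;
  } catch (err) {
    for (const step of done.reverse()) {
      step.compensate();
    }
    return false;
  }
}

// Two independent writes: debit the account, then record the order.
function placeOrder(user, amount, failOnInsert) {
  return runWithCompensation([
    {
      apply: () => { accounts[user].balance -= amount; },
      compensate: () => { accounts[user].balance += amount; },
    },
    {
      apply: () => {
        if (failOnInsert) throw new Error("order insert failed");
        orders.push({ user, amount });
      },
      compensate: () => { orders.pop(); },
    },
  ]);
}

const ok = placeOrder("arul", 30, false);    // both steps succeed
const failed = placeOrder("arul", 50, true); // second step fails, so the debit is undone
```

Unlike a JTA-style transaction, nothing is locked across the two writes; consistency is restored after the fact by the undo actions, which is why each step needs a well-defined inverse.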
  11. Scaling – Shard Keys
    • Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth
      – MongoDB does range-based sharding
      – Sharding can increase the number of queries
    • Figure out the most common use case and then decide on sharding
      – Do this at design time
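How range-based sharding can increase the number of queries may be easier to see in a toy router. The sketch below is plain JavaScript and not MongoDB's implementation: the `chunks` table, the shard names, the `userId` shard key, and the `shardsFor` helper are all assumptions made up for illustration. A query that includes the shard key is routed to a single shard, while a query that omits it has to be sent to every shard.

```javascript
// Hypothetical chunk table: each shard owns a half-open range [min, max) of the shard key.
const chunks = [
  { shard: "shardA", min: 0, max: 1000 },
  { shard: "shardB", min: 1000, max: 2000 },
  { shard: "shardC", min: 2000, max: Infinity },
];

// Return the shards that must be contacted for an equality query.
// shardKey is the field the collection is sharded on.
function shardsFor(query, shardKey) {
  if (!(shardKey in query)) {
    // No shard key in the query: scatter the query to every shard.
    return chunks.map((c) => c.shard);
  }
  const value = query[shardKey];
  return chunks
    .filter((c) => value >= c.min && value < c.max)
    .map((c) => c.shard);
}

const targeted = shardsFor({ userId: 1500 }, "userId");      // routed to one shard
const scattered = shardsFor({ title: "MongoDB" }, "userId"); // sent to all shards
```

This is why the slide recommends choosing the shard key at design time around the most common use case: the queries that run most often should be the ones that can be routed to a single shard.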
  12. Replication in MongoDB
    • MongoDB uses replica sets to achieve replication
    • A replica set is a group of MongoDB instances that host the same data set
    • A replica set has one primary node, which receives all write operations; all other instances are secondaries, which apply operations from the primary so that they hold the same data set
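The primary/secondary flow described above can be sketched with a small in-memory model. This is not how MongoDB is implemented; the `Member` class, its `oplog` array, and the `syncFrom` method are hypothetical names used only to show the idea that writes go to the primary and secondaries replay the primary's operations to arrive at the same data set.

```javascript
// Toy model of a replica set member holding a key-value data set.
class Member {
  constructor(name) {
    this.name = name;
    this.data = {};
    this.oplog = []; // ordered list of operations this member has applied
  }

  // In this sketch, write() is only ever called on the primary.
  write(key, value) {
    const op = { key, value };
    this.applyOp(op);
    return op;
  }

  applyOp(op) {
    this.data[op.key] = op.value;
    this.oplog.push(op);
  }

  // A secondary copies and applies the operations it has not yet seen.
  syncFrom(primary) {
    const missing = primary.oplog.slice(this.oplog.length);
    for (const op of missing) {
      this.applyOp(op);
    }
    return missing.length;
  }
}

const primary = new Member("primary");
const secondary1 = new Member("secondary1");
const secondary2 = new Member("secondary2");

primary.write("title", "MongoDB");
primary.write("votes", 5);

const applied1 = secondary1.syncFrom(primary); // replays the two operations so far
primary.write("votes", 6);
secondary1.syncFrom(primary); // catches up on the one new operation
secondary2.syncFrom(primary); // replays all three operations from the start
```

Because each secondary applies the same operations in the same order, both end up with the same data as the primary even though they synced at different times.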
  13. Search Functionality
    • NoSQL is distinctive in its support for “multi-attribute querying”
    • Multi-attribute querying
      – Using the $and operator:
        db.inventory.find( { $and: [ { price: 1.99 }, { qty: { $lt: 20 } }, { sale: true } ] } )
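To make the semantics of the `$and` query concrete, here is a minimal matcher in plain JavaScript. It is a sketch and not MongoDB's query engine: `matches` is a hypothetical helper that supports only equality, `$lt`, and `$and`, and the `inventory` array is made-up sample data. A document matches only when every clause in the `$and` array matches.

```javascript
// Made-up sample data standing in for the inventory collection.
const inventory = [
  { item: "pen", price: 1.99, qty: 10, sale: true },
  { item: "pad", price: 1.99, qty: 50, sale: true },
  { item: "ink", price: 1.99, qty: 5, sale: false },
  { item: "cap", price: 4.5, qty: 3, sale: true },
];

// Hypothetical matcher supporting equality, $lt, and $and only.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (field === "$and") {
      // Every clause in the array must match the document.
      return cond.every((clause) => matches(doc, clause));
    }
    if (cond !== null && typeof cond === "object" && "$lt" in cond) {
      return doc[field] < cond.$lt;
    }
    return doc[field] === cond;
  });
}

const query = { $and: [{ price: 1.99 }, { qty: { $lt: 20 } }, { sale: true }] };
const found = inventory
  .filter((doc) => matches(doc, query))
  .map((doc) => doc.item);
```

In MongoDB itself, conditions on different fields in one filter document are combined with an implicit AND, so the same filter can also be written as `{ price: 1.99, qty: { $lt: 20 }, sale: true }`.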
  14. Search Design
    • Search is based on the sharding id
    • Horizontal scaling is implemented with the help of indexes
  15. Future Work
    • There is more to consider when designing a scalable architecture:
      – Locking
      – Random partitioning
      – Write concerns