Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Choosing The Right Database For The Right Appli...

Choosing The Right Database For The Right Applications in 2019 [AWS Community Day @ Sofia]

How to choose the right database for your applications. AWS offers the broadest range of purpose-built databases for specific application use cases. This session will provide an overview of AWS relational, key-value, in-memory, and graph databases to help you choose the right database for the right job. We’ll also briefly cover new launches (Timestream, Quantum Ledger, and more) and how to apply them to your existing and new applications – with real customer examples.

Alex Casalboni

May 11, 2019
Tweet

More Decks by Alex Casalboni

Other Decks in Programming

Transcript

  1. Alex Casalboni Technical Evangelist, AWS @alex_casalboni Choosing The Right Database

    For The Right Applications in 2019 @ 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved
  2. About me • Software Engineer & Web Developer • Worked

    in a startup for 4.5 years • ServerlessDays Organizer • AWS Customer since 2013
  3. 1970 1980 1990 2000 Oracle DB2 SQL Server MySQL PostgreSQL

    DynamoDB Redis MongoDB Elasticsearch Neptune Cassandra Access Aurora 2010 Timestream QLDB Amazon DocumentDB
  4. Data categories and common use cases Relational Key value Document

    In-memory Graph Search Time series Ledger Referential integrity, ACID transactions, schema- on-write Low-latency, key lookups with high throughput and fast ingestion of data Indexing and storing documents with support for query on any attribute Microseconds latency, key- based queries, and specialized data structures Creating and navigating data relations easily and quickly Lift and shift, EMR, CRM, finance Real-time bidding, shopping cart, social Content management, personalization, mobile Leaderboards, real-time analytics, caching Fraud detection, social networks, recommendatio n engines Indexing and searching semistructured logs and data Product catalog, help, and FAQs, full text Collect, store, and process data sequenced by time IoT applications, event tracking Complete, immutable, and verifiable history of all changes to application data Systems of record, supply chain, healthcare, registrations, financial
  5. AWS: Purpose-built databases Relational Key value Document In-memory Graph Search

    Amazon DynamoDB Amazon Neptune Amazon RDS Aurora Commercial Community Amazon ElastiCache Amazon Elasticsearch Service Amazon DocumentDB Time series Ledger Amazon Timestream Amazon Quantum Ledger Database Memcached Redis
  6. Two fundamental areas of focus “Lift and shift” existing apps

    to the cloud Quickly build new apps in the cloud
  7. AWS Database Migration Service (AWS DMS) M i g r

    a t i n g d a t a b a s e s t o A W S Migrate between on-premises and AWS Migrate between databases Automated schema conversion Data replication for migration with zero downtime 100,000+ databases migrated
  8. Modern apps create new requirements Users: 1 million+ Data volume:

    TB–PB–EB Locality: Global Performance: Milliseconds–microseconds Request rate: Millions Access: Web, mobile, IoT, devices Scale: Up-down, Out-in Economics: Pay for what you use Developer access: No assembly required Social media Ride hailing Media streaming Dating
  9. User search history: Amazon DynamoDB • Massive data volume •

    Need quick lookups for personalized search Session state: Amazon ElastiCache • In-memory store for sub-millisecond fetch Relational data: Amazon RDS • Referential integrity • Primary transactional database
  10. • DynamoDB: 31B items tracking language exercises • Aurora: Primary

    transactional db (user data) • ElastiCache: Instant access to common words and phrases 300M total users 7B exercises per month
  11. Traditional SQL • TCP based wire protocol • Well Known,

    lots of uses • Common drivers (JDBC) • Frequently used with ORMs • Scale UP individual instances • Scale OUT with read replicas • Sharding at application level • Lots of flavors but very similar language • Joins ! INSERT INTO users (id, first_name, last_name) VALUES (1, ‘Alex’, ‘Casalboni’); SELECT col1, col2, col3 FROM table1 WHERE col4 = 1 AND col5 = 2 GROUP BY col1 HAVING count(*) > 1 ORDER BY col2
  12. Relational model Data model • Data is stored in rows

    and tables • Data is normalized • Strict schema • Relationships established via keys enforced by the system • Data accuracy and consistency • Complex queries to extract and reshape data on-demand Patient * Patient ID First Name Last Name Gender DOB * Doctor ID Visit * Visit ID * Patient ID * Hospital ID Date * Treatment ID Medical Treatment * Treatment ID Procedure How Performed Adverse Outcome Contraindication Doctor * Doctor ID First Name Last Name Medical Specialty * Hospital Affiliation Hospital * Hospital ID Name Address Rating
  13. Patient * Patient ID First Name Last Name Gender DOB

    * Doctor ID Visit * Visit ID * Patient ID * Hospital ID Date * Treatment ID Medical Treatment * Treatment ID Procedure How Performed Adverse Outcome Contraindication Doctor * Doctor ID First Name Last Name Medical Specialty * Hospital Affiliation Hospital * Hospital ID Name Address Rating Query model: SQL SELECT d.first_name, d.last_name, count(*) FROM visit as v, hospital as h, doctor as d WHERE v.hospital_id = h.hospital_id AND h.hospital_id = d.hospital AND v.t_date > date_trunc('week’, CURRENT_TIMESTAMP - interval '1 week') GROUP BY d.first_name, d.last_name; Relational model
  14. Amazon Relational Database Service (RDS) Managed relational database service with

    a choice of six popular database engines Easy to administer Available and durable Highly scalable Fast and secure No need for infrastructure provisioning, installing, and maintaining DB software Automatic Multi-AZ data replication; automated backup, snapshots, failover Scale database compute and storage with a few clicks with no app downtime SSD storage and guaranteed provisioned I/O; data encryption at rest and in transit
  15. Amazon Aurora MySQL and PostgreSQL-compatible relational database built for the

    cloud Performance and availability of commercial-grade databases at 1/10th the cost Performance and scalability Availability and durability Highly secure Fully managed 5x perf of standard MySQL 3x perf of standard PostgreSQL Up to 15 read replicas Self-healing storage 6 copies of data across 3 AZ Continuous backup to S3 Network isolation, encryption at rest/transit No hardware provisioning, software patching, setup, configuration, or backups
  16. SQL vs NoSQL Optimized for storage Optimized for compute Normalized/relational

    Denormalized/hierarchical Ad hoc queries Instantiated views Scale vertically Scale horizontally Good for OLAP Built for OLTP at scale SQL NoSQL
  17. Key-value data • Simple key-value pairs • Partitioned by keys

    • Resilient to failure • High throughput, low- latency reads and writes • Consistent performance at scale Table 1 … … Partitions … Highly partitionable data
  18. Gamers Primary Key Attributes Gamer Tag Type Hammer57 Rank Level

    Points Tier 87 4050 Elite Status Health Progress 90 30 Weapon Class Damage Range Taser 87% 50 FluffyDuffy Rank Level Points Tier 5 1072 Trainee Status Health Progress 37 8 // Status of Hammer57 GET { TableName:"Gamers", Key: { "GamerTag":"Hammer57", "Type":"Status” } } // Return all Hammer57 Gamers GamerTag = :a :a Hammer57 Key-value data
  19. Amazon DynamoDB Fast and flexible key value database service for

    any scale Performance at scale Serverless Comprehensive security Global database for global apps Single-digit millisecond response times at any scale; Virtually unlimited throughput No server provisioning, software patching/upgrades; automatic scaling and continuous data backups Encrypts all data by default; fully integrated with AWS IAM for robust security Build global apps by easily replicating tables across multiple regions
  20. Amazon DynamoDB Data Structure Table Items Attributes Partition Key Sort

    Key Mandatory Key-value access pattern Determines data distribution Optional Model 1:N relationships Enables rich query capabilities All items for key ==, <, >, >=, <= “begins with” “between” “contains” “in” sorted results counts top/bottom N values
  21. DynamoDB Schema and Queries • Connects over HTTP • Global

    Secondary Indexes and Local Secondary Indexes • Speed up queries with DAX • Global tables (multi-region-multi- master) • Transactions across multiple tables • Change Streams • Rich query language with expressions • Provision read and write capacity units separately • Also supports pay-per-request model import boto3 votes = boto3.resource("dynamodb").Table("votes") resp = votes.update_item( Key={"name": editor}, UpdateExpression="ADD votes :incr", ExpressionAttributeValues={":incr": 1}, ReturnValues="ALL_NEW" )
  22. DynamoDB Advancements over the last 24 months VPC endpoints April

    2017 Auto scaling June 2017 DynamoDB Accelerator (DAX) April 2017 Time To Live (TTL) February 2017 Global tables On-demand backup Encryption at rest November 2017 November 2017 November 2017 Point-in-time recovery March 2018 SLA June 2018 99.999% SLA August 2018 Adaptive capacity ACID November 2018 Transactions November 2018 On-demand
  23. Document databases • Data is stored in JSON-like documents •

    Documents map naturally to how humans model data • Flexible schema and indexing • Expressive query language built for documents (ad hoc queries and aggregations) JSON documents are first-class objects of the database { id: 1, name: "sue", age: 26, email: "[email protected]", promotions: ["new user", "5%", "dog lover"], memberDate: 2018-2-22, shoppingCart: [ {product:"abc", quantity:2, cost:19.99}, {product:"edf", quantity:3, cost: 2.99} ] }
  24. != == Evolution of document databases JSON became the de

    facto data interchange format Friction when converting JSON to the relational model Object-relational mappings (ORMs) were created to help with this friction Document databases solved the problem (Client) (App) (Database) JSON Relational JSON
  25. Use cases for document data User profiles { id: 181276,

    username: "sue1942", name: {first: "Susan", last: "Benoit"} } { id: 181276, username: "sue1942", name: {first: "Susan", last: "Benoit"} } { id: 181276, username: "sue1942", name: {first: "Susan", last: "Benoit"}, ExploidingSnails: { hi_score: 3185400, global_rank: 5139, bonus_levels: true }, promotions: ["new user","5%","snail lover"] } { id: 181276, username: "sue1942", name: {first: "Susan", last: "Benoit"}, ExploidingSnails: { hi_score: 3185400, global_rank: 5139, bonus_levels: true } }
  26. Use cases for document data Mobile Retail and marketing User

    profiles Catalog Content management Personalization
  27. Amazon DocumentDB Fast, scalable, and fully managed MongoDB-compatible database service

    Fast Scalable Fully managed MongoDB compatible Millions of RPS with ms latency 2x throughput of MongoDB Separation of compute and storage Up to 15 read replicas Auto patching and quick setup Automatic backups Compatible with MongoDB 3.6 Same SDKs and tools
  28. In-memory • No persistence, in-memory • Microsecond performance • Simple

    commands for manipulating in memory data structures • Strings, hashes, lists, sets, and sorted sets Database Memory (buffer pool) Disk Query processor Get/Put APIs Memory Milliseconds to microseconds (10x faster) Storage engine
  29. In-memory ops set a "hello" // Set key "a" with

    a string value and no expiration OK get a // Get value for key "a" "hello" get b // Get value for key "b" results in miss (nil) set b "Good-bye" EX 5 // Set key "b" with a string value and a 5 second expiration "Good-bye" get b // Get value for key "b" "Good-bye" // wait >= 5 seconds get b (nil) // key has expired, nothing returned
  30. Amazon ElastiCache Redis and Memcached compatible, in-memory data store and

    cache Redis and Memcached compatible Extreme performance Secure and reliable Easily scalable Fully compatible with open source tools Redis and Memcached In-memory data store and cache Microsecond response time Network isolation, encryption, HIPAA, PIC Multi-AZ, automatic failover Scales reads and write Both sharding and replicas
  31. Search The bright blue butterfly hangs on the breeze It’s

    best to forget the great sky and to retire from every wind. Under blue sky, in bright sunlight, one need search around. Document 1 Document 2 Document 3 a and around every for from in is it not on one the to under ID Term Document 1 best 2 2 blue 1, 3 3 bright 1,3 4 breeze 1 5 butterfly 1 6 forget 2 7 great 2 8 hangs 1 9 need 3 10 retire 2 11 search 3 12 sky 2, 3 13 wind 2 Inverted index Stopword list
  32. Search _search?q=house "hits": { "total": 85, "max_score": 6.6137657, "hits": [{

    "_index": "movies", "_type": "movie", "_id": "tt0077975", "_score": 6.6137657, "_source": { "directors": [ "John Landis" ], "release_date": "1978-07-27T00:00:00Z", "rating": 7.5, "genres": [ "Comedy", "Romance" ], "image_url": "http://ia.jpg "plot": "At a 1962 College, Dean Vernon Wormer…", "title": "Animal House", "rank": 527, "running_time_secs": 6540, "actors": [ "John Belushi","Karen Allen","Tom Hulce" ], "year": 1978, "id": "tt0077975" } },
  33. Amazon Elasticsearch Service Fully managed, reliable, and scalable Elasticsearch service

    Easy to use Scalable Highly available Secure Deploy a production- ready Elasticsearch cluster in minutes Resize your cluster with a few clicks or a single API call Replicate across AZs, with monitoring and automated self-healing Deploy into VPC and restrict access using security groups and IAM policies
  34. Graph data • Relationships are first-class objects • Data is

    modeled and queried as a graph • Vertices connected by Edges • Creating and navigating relations between data easily and quickly Purchased Purchased Follows Purchased Knows Product Sport Follows
  35. Different approaches for highly connected data Purpose-built for a business

    process Purpose-built to answer questions about relationships
  36. Graph use cases Social networking Life sciences Network & IT

    operations Fraud detection Recommendation engines Knowledge graphs
  37. Product recommendation to a user PURCHASED PURCHASED PURCHASED PURCHASED PURCHASED

    KNOWS BOOK #1 BOOK #2 PURCHASED BOOK #3 Graph example
  38. Amazon Neptune Fully managed graph database Fast Reliable Easy Open

    Query billions of relationships with millisecond latency 6 replicas of your data across 3 AZs with fully backup and restore Build powerful queries with Gremlin and SPARQL Supports Apache TinkerPop & W3C RDF
  39. Open Source Apache TinkerPop™ Gremlin Traversal Language W3C Standard SPARQL

    Query Language R E S O U R C E D E S C R I P T I O N F R A M E W O R K ( R D F ) P R O P E R T Y G R A P H LEADING GRAPH MODELS AND FRAMEWORKS
  40. Gremlin Traversal Language g.addV('person').property(id, 1).property('name', ‘alex') g.V('1').property(single, 'age’, 30) g.addV('person').property(id,

    2).property('name', ‘julien') g.addE('knows').from(g.V('1')).to(g.V('2')).property('weight', 1.0) g.V().hasLabel('person') g.V().has('name', ‘alex').out('knows').valueMap() tinkerpop.apache.org/docs/current/reference/#graph-traversal-steps
  41. Retail demo application Demo application: 1. Available today 2. On

    GitHub: /aws-samples/aws- bookstore-demo-app 3. One-click AWS CloudFormation deployment Search Indexing and searching semistructured logs and data Product search Amazon Neptune Amazon Elasticsearch Service Key-value High throughput, Low- latency reads and writes, endless scale Product catalog, shopping cart Graph Quickly and easily create and navigate relationships between data Product recommendation In-memory Query by key with microsecond latency Product leaderboard DynamoDB ElastiCache
  42. Time series data What is time series data? What is

    special about a time series database? A sequence of data points recorded over a time interval Time is the single primary axis of the data model t
  43. Time series use cases Application events IoT sensor readings DevOps

    data Humidity % Water vapor 91.0 94.0 86.0 93.0
  44. Existing time-series databases Relational databases Difficult to maintain high availability

    Difficult to scale Limited data lifecycle management Inefficient time series data processing Unnatural for time series data Rigid schema inflexible for fast moving time series data Building with time series data is challenging
  45. Amazon Timestream (sign up for the preview) Fast, scalable, fully

    managed time-series database 1,000x faster and 1/10 the cost of relational databases Collect data at the rate of millions of inserts per second (10M/second) Trillions of daily events Adaptive query processing engine maintains steady, predictable performance Time-series analytics Built-in functions for interpolation, smoothing, and approximation Serverless Automated setup, configuration, server provisioning, software patching
  46. Common customer use cases Ledgers with centralized control Healthcare Verify

    and track hospital equipment inventory Manufacturers Track distribution of a recalled product HR & payroll Track changes to an individual’s profile Government Track vehicle title history
  47. Challenges with building ledgers Adds unnecessary complexity Blockchain RDBMS –

    audit tables Difficult to maintain Hard to use and slow Hard to build Custom audit functionality using triggers or stored procedures Impossible to verify No way to verify changes made to data by sys admins
  48. Ledger database concepts C | H J Journal C |

    H Current | History Current | History Journal Ledger comprises J L Ledger database L Journal determines Current | History
  49. ID Manufacturer Model Year VIN Owner ID Version Start End

    Manufacturer Model Year VIN Owner How it works ID Manufacturer Model Year VIN Owner 1 Tesla Model S 2012 12345678 9 Traci Russell INSERT INTO cars << { 'Manufacturer': 'Tesla', 'Model': 'Model S', 'Year': '2012', 'VIN': '123456789', 'Owner': 'Traci Russel' } >> FROM cars WHERE VIN = '123456789' UPDATE owner = 'Ronnie Nash' FROM cars WHERE VIN = '123456789' UPDATE owner = 'Elmer Hubbard' J ID Version Start End Manufacturer Model Year VIN Owner 1 1 07/16/2012 NULL Tesla Model S 2012 123456789 Traci Russell current.cars C history.cars H ID Version Start End Manufacturer Model Year VIN Owner 1 1 07/16/2012 08/03/2013 Tesla Model S 2012 123456789 Traci Russell 1 2 08/03/2013 NULL Tesla Model S 2012 123456789 Ronnie Nash ID Version Start End Manufacturer Model Year VIN Owner 1 1 07/16/2012 08/03/2013 Tesla Model S 2012 123456789 Traci Russell 1 2 08/03/2013 09/02/2016 Tesla Model S 2012 123456789 Ronnie Nash 1 3 09/02/2016 NULL Tesla Model S 2012 123456789 Elmer Hubbard ID Manufacturer Model Year VIN Owner 1 Tesla Model S 2012 12345678 9 Ronnie Nash ID Manufacturer Model Year VIN Owner 1 Tesla Model S 2012 12345678 9 Elmer Hubbard INSERT cars ID:1 Manufacturer: Tesla Model: Model S Year: 2012 VIN: 123456789 Owner: Traci Russell Metadata: { Date:07/16/2012 } H (x) UPDATE cars ID:1 Owner: Ronnie Nash Metadata: { Date:08/03/2013 } H (x) UPDATE cars ID:1 Owner: Elmer Hubbard Metadata: { Date: 09/02/2016 } H (x)
  50. Amazon Quantum Ledger Database (QLDB) Fully managed ledger database Track

    and verify history of all changes made to your application’s data Immutable Maintains a sequenced record of all changes to your data, which cannot be deleted or modified; you can to query and analyze the full history Cryptographically verifiable Uses cryptography to generate a secure output file of your data’s history Easy to use Easy to use, letting you use familiar database capabilities like SQL APIs for querying the data Highly scalable Executes 2–3x as many transactions than ledgers in common blockchain frameworks
  51. AWS: Purpose-built databases Relational Key value Document In-memory Graph Search

    Amazon DynamoDB Amazon Neptune Amazon RDS Aurora Commercial Community Amazon ElastiCache Amazon Elasticsearch Service Amazon DocumentDB Time series Ledger Amazon Timestream Amazon Quantum Ledger Database Memcached Redis
  52. Additional resources Andy Jassy’s re:Invent 2017/2018 keynotes youtu.be/1IxDLeFQKPk?t=37m47s youtu.be/ZOIkOnW640A?t=3238 Werner

    Vogel’s blog: A one size fits all database doesn't fit anyone allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html aws.amazon.com/products/databases/ aws.amazon.com/blogs/database/
  53. re:Invent videos youtu.be/hwnNbLXN4vA youtu.be/-pb-DkD6cWg AWS re:Invent 2018: Databases on AWS:

    The Right Tool for the Right Job (DAT205-R1) AWS re:Invent 2018: Building with AWS Databases: Match Your Workload to the Right Database (DAT301)
  54. Alex Casalboni Technical Evangelist, AWS @alex_casalboni @ 2019, Amazon Web

    Services, Inc. or its Affiliates. All rights reserved Thank you!