
Building Applications with DynamoDB


Amazon DynamoDB is a managed NoSQL database. These slides introduce DynamoDB and discuss best practices for data modeling and primary key selection.

Matt Wood

May 16, 2012


Transcript

  1. DynamoDB
    Building Applications
    with
    An Online Seminar - 16th May 2012
    Dr Matt Wood, Amazon Web Services


  2. Thank you!


  3. Building Applications with DynamoDB


  4. Building Applications with DynamoDB
    Getting started


  5. Building Applications with DynamoDB
    Getting started
    Data modeling


  6. Building Applications with DynamoDB
    Getting started
    Data modeling
    Partitioning


  7. Building Applications with DynamoDB
    Getting started
    Data modeling
    Partitioning
    Analytics


  8. Getting started with
    DynamoDB
    quick review


  9. DynamoDB is a managed
    NoSQL database service.
    Store and retrieve any amount of data.
    Serve any level of request traffic.


  10. Without the
    operational burden.


  11. Consistent, predictable
    performance.
    Single digit millisecond latencies.
Backed by solid-state drives.


  12. Flexible data model.
    Key/attribute pairs.
    No schema required.
    Easy to create. Easy to adjust.


  13. Seamless scalability.
    No table size limits. Unlimited storage.
    No downtime.


  14. Durable.
    Consistent, disk-only writes.
    Replication across data centres and
    availability zones.


  15. Without the
    operational burden.


  16. Without the
    operational burden.
    FOCUS ON YOUR APP


  17. Two decisions + three clicks
    = ready for use


  18. Two decisions + three clicks
    = ready for use
    Primary keys +
    level of throughput


  19. Provisioned throughput.
    Reserve IOPS for reads and writes.
    Scale up (or down) at any time.


  20. Pay per capacity unit.
    Priced per hour of
    provisioned throughput.


  21. Write throughput.
    $0.01 per hour for 10 write units
    Units = size of item x writes/second

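The write-unit arithmetic above can be sketched in Python. This is a toy illustration of the formula on the slide (item size rounded up to whole kilobytes, times writes per second); unit sizes and pricing have changed since 2012, so check current documentation before relying on the numbers.

```python
import math

def write_capacity_units(item_size_bytes, writes_per_second):
    """Units = size of item (rounded up to whole KB) x writes/second."""
    size_kb = math.ceil(item_size_bytes / 1024)
    return size_kb * writes_per_second

# 1.5 KB items written 10 times per second -> 2 KB x 10 = 20 write units
units = write_capacity_units(1536, 10)
```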

  22. Consistent writes.
    Atomic increment/decrement.
    Optimistic concurrency control.
    aka: “conditional writes”.


  23. Transactions.
    Item level transactions only.
    Puts, updates and deletes are ACID.


  24. Read throughput.
    strongly consistent
    eventually consistent


  25. Read throughput: strongly consistent.
    $0.01 per hour for 50 read units
    Provisioned units =
    size of item x reads/second

  26. Read throughput: eventually consistent.
    $0.01 per hour for 100 read units
    Provisioned units =
    (size of item x reads/second) / 2

  27. Read throughput.
    Mix and match at “read time”.
    Same latency expectations.
    strongly consistent
    eventually consistent

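The read-unit arithmetic can be sketched the same way. A minimal illustration of the slides above, assuming the 1 KB rounding implied by the deck's formula and the half-cost eventually consistent reads; current DynamoDB unit sizes differ.

```python
import math

def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """Units = size of item x reads/second; eventually consistent reads cost half."""
    units = math.ceil(item_size_bytes / 1024) * reads_per_second
    return units if strongly_consistent else math.ceil(units / 2)

# same spend buys twice the reads if eventual consistency is acceptable
strong = read_capacity_units(1024, 50)
eventual = read_capacity_units(1024, 100, strongly_consistent=False)
```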

  28. Two decisions + three clicks
    = ready for use


  29. (image-only slide)

  30. (image-only slide)

  31. (image-only slide)

  32. Two decisions + three clicks
    = ready for use


  33. Two decisions + one API call
    = ready for use


  34. $create_response = $dynamodb->create_table(array(
        'TableName' => 'ProductCatalog',
        'KeySchema' => array(
            'HashKeyElement' => array(
                'AttributeName' => 'Id',
                'AttributeType' => AmazonDynamoDB::TYPE_NUMBER
            )
        ),
        'ProvisionedThroughput' => array(
            'ReadCapacityUnits' => 10,
            'WriteCapacityUnits' => 5
        )
    ));

  35. Two decisions + one API call
    = ready for use


  36. Two decisions + one API call
    = ready for development


  37. Two decisions + one API call
    = ready for production


  38. Two decisions + one API call
    = ready for scale


  39. (image-only slide)

  40. Authentication.
    Session based to minimize latency.
    Uses Amazon Security Token Service.
    Handled by AWS SDKs.
    Integrates with IAM.


  41. Monitoring.
    CloudWatch metrics:
    latency, consumed read and write
    throughput, errors and throttling.


  42. Libraries, mappers & mocks.
    http://j.mp/dynamodb-libs
    ColdFusion, Django, Erlang, Java, .Net,
    Node.js, Perl, PHP, Python, Ruby


  43. DynamoDB data models


  44. DynamoDB semantics.
    Tables, items and attributes.


  45. Tables contain items.
    Unlimited items per table.


  46. Items are a collection of
    attributes.
    Each attribute has a key and a value.
    An item can have any number of
    attributes, up to 64k total.


  47. Two scalar data types.
    String: Unicode, UTF-8 binary encoding.
    Number: 38 digit precision.
    Multi-value strings and numbers.

  48. id = 100 | date = 2012-05-16-09-00-10 | total = 25.00
    id = 101 | date = 2012-05-15-15-00-11 | total = 35.00
    id = 101 | date = 2012-05-16-12-00-10 | total = 100.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 20.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 120.00

  49. id = 100 | date = 2012-05-16-09-00-10 | total = 25.00
    id = 101 | date = 2012-05-15-15-00-11 | total = 35.00
    id = 101 | date = 2012-05-16-12-00-10 | total = 100.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 20.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 120.00
    Table

  50. id = 100 | date = 2012-05-16-09-00-10 | total = 25.00
    id = 101 | date = 2012-05-15-15-00-11 | total = 35.00
    id = 101 | date = 2012-05-16-12-00-10 | total = 100.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 20.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 120.00
    Item

  51. id = 100 | date = 2012-05-16-09-00-10 | total = 25.00
    id = 101 | date = 2012-05-15-15-00-11 | total = 35.00
    id = 101 | date = 2012-05-16-12-00-10 | total = 100.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 20.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 120.00
    Attribute

  52. Where is the schema?
    Tables do not require a formal schema.
    Items are an arbitrary sized hash.
    Just need to specify the primary key.


  53. Items are indexed by
    primary key.
    Single hash keys and composite keys.


  54. id = 100 | date = 2012-05-16-09-00-10 | total = 25.00
    id = 101 | date = 2012-05-15-15-00-11 | total = 35.00
    id = 101 | date = 2012-05-16-12-00-10 | total = 100.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 20.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 120.00
    Hash Key

  55. Range key for queries.
    Querying items by composite key.


  56. id = 100 | date = 2012-05-16-09-00-10 | total = 25.00
    id = 101 | date = 2012-05-15-15-00-11 | total = 35.00
    id = 101 | date = 2012-05-16-12-00-10 | total = 100.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 20.00
    id = 102 | date = 2012-03-20-18-23-10 | total = 120.00
    Hash Key + Range Key

  57. Programming DynamoDB.
    Small but perfectly formed.
    Whole programming interface
    fits on one slide.


  58. CreateTable
    UpdateTable
    DeleteTable
    DescribeTable
    ListTables
    PutItem
    GetItem
    UpdateItem
    DeleteItem
    BatchGetItem
    BatchWriteItem
    Query
    Scan


  59. (same API operation list, repeated)

  60. (same API operation list, repeated)

  61. Conditional updates.
    PutItem, UpdateItem, DeleteItem can
    take optional conditions for operation.
    UpdateItem performs atomic
    increments.

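The conditional-write idea above can be illustrated with a toy in-memory model. This is not the DynamoDB API — just a sketch of the semantics: a put succeeds only if the named attributes currently hold the expected values, which is how optimistic concurrency control works.

```python
class ConditionCheckFailed(Exception):
    pass

class Table:
    """Toy in-memory model of conditional puts (illustration only)."""
    def __init__(self):
        self.items = {}

    def put_item(self, key, item, expected=None):
        # 'expected' maps attribute names to values they must currently
        # hold; a mismatch rejects the write, mimicking a conditional put
        current = self.items.get(key, {})
        if expected:
            for attr, value in expected.items():
                if current.get(attr) != value:
                    raise ConditionCheckFailed(attr)
        self.items[key] = item

t = Table()
t.put_item("mza", {"score": 100})
t.put_item("mza", {"score": 150}, expected={"score": 100})  # succeeds
```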

  62. (same API operation list, repeated)

  63. One API call, multiple items.
    BatchGet returns multiple items by
    primary key.
    BatchWrite performs up to 25 put or
    delete operations.
    Throughput is measured by IO,
    not API calls.

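Because BatchWrite accepts at most 25 operations per call, client code typically chunks a larger list of write requests. A minimal sketch:

```python
MAX_BATCH = 25  # BatchWriteItem accepts up to 25 put/delete requests per call

def batches(requests, size=MAX_BATCH):
    """Split a list of write requests into API-sized chunks."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]

# 60 pending writes -> three calls of 25, 25 and 10 requests
chunks = batches(list(range(60)))
```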

  64. (same API operation list, repeated)

  65. Query vs Scan
    Query for composite key queries.
    Scan for full table scans and exports.
    Both support pages and limits.
    Maximum response is 1 MB in size.

  66. Query patterns.
    Retrieve all items by hash key.
    Range key conditions:
    ==, <, >, >=, <=, begins with, between.
    Counts. Top and bottom n values.
    Paged responses.

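The query patterns above — select by hash key, then filter, sort and limit by range key — can be mimicked in plain Python over a list of dicts. A sketch of the access pattern only, not the real Query API:

```python
def query(items, hash_key, range_condition=lambda r: True,
          limit=None, descending=False):
    """Toy Query: items are dicts with 'hash' and 'range' attributes."""
    matched = sorted(
        (i for i in items if i["hash"] == hash_key),
        key=lambda i: i["range"],
        reverse=descending,          # descending order gives "top n" results
    )
    matched = [i for i in matched if range_condition(i["range"])]
    return matched[:limit] if limit is not None else matched

scores = [
    {"hash": "mza", "range": "angry-birds"},
    {"hash": "mza", "range": "tetris"},
    {"hash": "werner", "range": "bejewelled"},
]
# "begins with"/"between"-style conditions become predicates on the range key
mza_games = query(scores, "mza", range_condition=lambda r: r >= "b")
```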

  67. Modeling patterns


  68. 1. Mapping relationships
    with range keys.
    No cross-table joins in DynamoDB.
    Use composite keys to model
    relationships.
    Patterns


  69. Data model example: online gaming.
    Storing scores and leader boards.
    Players with high scores.
    Leader board for each game.

  70. Data model example: online gaming.
    Storing scores and leader boards.
    Players with high scores.
    Leader board for each game.
    Players: hash key
    user_id = mza      | location = Cambridge | joined = 2011-07-04
    user_id = jeffbarr | location = Seattle   | joined = 2012-01-20
    user_id = werner   | location = Worldwide | joined = 2011-05-15

  71. Data model example: online gaming.
    Storing scores and leader boards.
    Players with high scores.
    Leader board for each game.
    Players: hash key
    user_id = mza      | location = Cambridge | joined = 2011-07-04
    user_id = jeffbarr | location = Seattle   | joined = 2012-01-20
    user_id = werner   | location = Worldwide | joined = 2011-05-15
    Scores: composite key
    user_id = mza    | game = angry-birds | score = 11,000
    user_id = mza    | game = tetris      | score = 1,223,000
    user_id = werner | game = bejewelled  | score = 55,000

  72. Data model example: online gaming.
    Storing scores and leader boards.
    Players with high scores.
    Leader board for each game.
    Players: hash key
    user_id = mza      | location = Cambridge | joined = 2011-07-04
    user_id = jeffbarr | location = Seattle   | joined = 2012-01-20
    user_id = werner   | location = Worldwide | joined = 2011-05-15
    Scores: composite key
    user_id = mza    | game = angry-birds | score = 11,000
    user_id = mza    | game = tetris      | score = 1,223,000
    user_id = werner | game = bejewelled  | score = 55,000
    Leader boards: composite key
    game = angry-birds | score = 11,000    | user_id = mza
    game = tetris      | score = 1,223,000 | user_id = mza
    game = tetris      | score = 9,000,000 | user_id = jeffbarr

  73. Data model example: online gaming.
    Storing scores and leader boards.
    Players: hash key
    user_id = mza      | location = Cambridge | joined = 2011-07-04
    user_id = jeffbarr | location = Seattle   | joined = 2012-01-20
    user_id = werner   | location = Worldwide | joined = 2011-05-15
    Scores: composite key
    user_id = mza    | game = angry-birds | score = 11,000
    user_id = mza    | game = tetris      | score = 1,223,000
    user_id = werner | game = bejewelled  | score = 55,000
    Leader boards: composite key
    game = angry-birds | score = 11,000    | user_id = mza
    game = tetris      | score = 1,223,000 | user_id = mza
    game = tetris      | score = 9,000,000 | user_id = jeffbarr
    Scores by user (and by game)

  74. Data model example: online gaming.
    Storing scores and leader boards.
    Players: hash key
    user_id = mza      | location = Cambridge | joined = 2011-07-04
    user_id = jeffbarr | location = Seattle   | joined = 2012-01-20
    user_id = werner   | location = Worldwide | joined = 2011-05-15
    Scores: composite key
    user_id = mza    | game = angry-birds | score = 11,000
    user_id = mza    | game = tetris      | score = 1,223,000
    user_id = werner | game = bejewelled  | score = 55,000
    Leader boards: composite key
    game = angry-birds | score = 11,000    | user_id = mza
    game = tetris      | score = 1,223,000 | user_id = mza
    game = tetris      | score = 9,000,000 | user_id = jeffbarr
    High scores by game

  75. 2. Handling large items.
    Unlimited attributes per item.
    Unlimited items per table.
    Max 64k per item.
    Patterns


  76. Data model example: large items.
    Storing more than 64k across items.
    Large messages: composite keys
    message_id = 1 | part = 1 | message = …
    message_id = 1 | part = 2 | message = …
    message_id = 1 | part = 3 | message = …
    Split attributes across items.
    Query by message_id and part to retrieve.
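The split-and-reassemble pattern above can be sketched in Python: one item per chunk, keyed by (message_id, part), then a query by message_id glues the parts back together. The 64k cap is the per-item limit described in the deck.

```python
MAX_ITEM_BYTES = 64 * 1024  # per-item cap described in the deck (64k)

def split_message(message_id, text, part_size=MAX_ITEM_BYTES):
    """One item per chunk, composite key (message_id, part)."""
    return [
        {"message_id": message_id, "part": part, "message": text[i:i + part_size]}
        for part, i in enumerate(range(0, len(text), part_size), start=1)
    ]

def join_message(items):
    """Query by message_id, then reassemble in part order."""
    return "".join(item["message"] for item in sorted(items, key=lambda i: i["part"]))
```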

  77. Store a pointer to objects in
    Amazon S3.
    Large data stored in S3.
    Location stored in DynamoDB.
    99.999999999% data durability in S3.
    Patterns


  78. 3. Managing secondary
    indices.
    Not supported by DynamoDB.
    Create your own.
    Patterns


  79. Data model example: secondary indices.
    Creating your own secondary indices.
    Users: hash key
    user_id = mza     | first_name = Matt   | last_name = Wood
    user_id = mattfox | first_name = Matt   | last_name = Fox
    user_id = werner  | first_name = Werner | last_name = Vogels

  80. Data model example: secondary indices.
    Creating your own secondary indices.
    Users: hash key
    user_id = mza     | first_name = Matt   | last_name = Wood
    user_id = mattfox | first_name = Matt   | last_name = Fox
    user_id = werner  | first_name = Werner | last_name = Vogels
    First name index: composite keys
    first_name = Matt   | user_id = mza
    first_name = Matt   | user_id = mattfox
    first_name = Werner | user_id = werner
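Since DynamoDB did not maintain secondary indices for you at the time, the application writes the index item alongside the main item. A toy in-memory sketch of that pattern (table and index names are illustrative):

```python
users = {}             # Users table, hash key user_id
first_name_index = {}  # index table, composite key (first_name, user_id)

def put_user(user_id, first_name, last_name):
    """Every write to the main table also writes the index item."""
    users[user_id] = {"first_name": first_name, "last_name": last_name}
    first_name_index[(first_name, user_id)] = user_id

def user_ids_named(first_name):
    """Equivalent of a Query on the index table's hash key."""
    return sorted(uid for (name, uid) in first_name_index if name == first_name)

put_user("mza", "Matt", "Wood")
put_user("mattfox", "Matt", "Fox")
put_user("werner", "Werner", "Vogels")
```

The cost of this approach is that the two writes are not atomic: the application must handle the case where the index write fails after the main write succeeds.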

  81. Data model example: secondary indices.
    Creating your own secondary indices.
    Users: hash key
    user_id = mza     | first_name = Matt   | last_name = Wood
    user_id = mattfox | first_name = Matt   | last_name = Fox
    user_id = werner  | first_name = Werner | last_name = Vogels
    First name index: composite keys
    first_name = Matt   | user_id = mza
    first_name = Matt   | user_id = mattfox
    first_name = Werner | user_id = werner
    Last name index: composite keys
    last_name = Wood   | user_id = mza
    last_name = Fox    | user_id = mattfox
    last_name = Vogels | user_id = werner

  82. (same Users table and index tables, repeated)

  83. (same Users table and index tables, repeated)

  84. 4. Time series data.
    Logging, click through, ad views,
    game play data, application usage.
    Non-uniform access patterns.
    Newer data is ‘live’.
    Older data is read only.
    Patterns


  85. Data model example: time series data.
    Rolling tables for hot and cold data.
    Events table: composite keys
    event_id = 1000 | timestamp = 2012-05-16-09-59-01 | key = value
    event_id = 1001 | timestamp = 2012-05-16-09-59-02 | key = value
    event_id = 1002 | timestamp = 2012-05-16-09-59-02 | key = value

  86. Data model example: time series data.
    Rolling tables for hot and cold data.
    Events table: composite keys
    event_id = 1000 | timestamp = 2012-05-16-09-59-01 | key = value
    event_id = 1001 | timestamp = 2012-05-16-09-59-02 | key = value
    event_id = 1002 | timestamp = 2012-05-16-09-59-02 | key = value
    Events table for April: composite keys
    event_id = 400 | timestamp = 2012-04-01-00-00-01
    event_id = 401 | timestamp = 2012-04-01-00-00-02
    event_id = 402 | timestamp = 2012-04-01-00-00-03
    Events table for January: composite keys
    event_id = 100 | timestamp = 2012-01-01-00-00-01
    event_id = 101 | timestamp = 2012-01-01-00-00-02
    event_id = 102 | timestamp = 2012-01-01-00-00-03
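Rolling monthly tables come down to routing each write to the table for its calendar month. A minimal sketch (the table naming scheme is illustrative, not prescribed by the deck):

```python
from datetime import datetime

def events_table_for(ts):
    """Route a write to the table for its calendar month."""
    return "events_%04d_%02d" % (ts.year, ts.month)

# writes for May 2012 land in "events_2012_05"; older months stay cold
table = events_table_for(datetime(2012, 5, 16, 9, 59, 1))
```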

  87. Hot and cold tables.
    Dec | Jan | Feb | Mar | April | May
    Patterns

  88. Hot and cold tables.
    Dec | Jan | Feb | Mar | April | May
    higher throughput
    Patterns

  89. Hot and cold tables.
    Dec | Jan | Feb | Mar | April | May
    higher throughput
    lower throughput
    Patterns

  90. Hot and cold tables.
    Dec | Jan | Feb | Mar | April | May
    data to S3, delete cold tables
    Patterns

  91. Hot and cold tables.
    Jan | Feb | Mar | Apr | May | June
    Patterns

  92. Hot and cold tables.
    Feb | Mar | Apr | May | June | July
    Patterns

  93. Hot and cold tables.
    Mar | Apr | May | June | July | Aug
    Patterns

  94. Hot and cold tables.
    Apr | May | June | July | Aug | Sept
    Patterns

  95. Hot and cold tables.
    May | June | July | Aug | Sept | Oct
    Patterns

  96. Not out of mind.
    DynamoDB and S3 data can be
    integrated for analytics.
    Run queries across hot and cold data
    with Elastic MapReduce.
    Patterns


  97. Partitioning best practices


  98. Uniform workloads.
    DynamoDB divides table data into
    multiple partitions.
    Data is distributed primarily by
    hash key.
    Provisioned throughput is divided
    evenly across the partitions.


  99. Uniform workloads.
    To achieve and maintain full
    provisioned throughput for a table,
    spread your workload evenly across
    the hash keys.

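The even division of throughput across partitions is why a hot hash key throttles. A back-of-the-envelope sketch (partition counts are internal to DynamoDB and not visible to you; the numbers here are purely illustrative):

```python
def per_partition_throughput(provisioned_units, partitions):
    """Provisioned throughput is divided evenly across partitions,
    so traffic to a single hash key only ever gets one partition's share."""
    return provisioned_units / partitions

# 1,000 provisioned read units spread over (say) 10 partitions:
# a workload that hammers one hash key is capped near 100 units/second,
# even though the table as a whole has 1,000
hot_key_ceiling = per_partition_throughput(1000, 10)
```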

  100. Non-uniform workloads.
    Some requests might be throttled,
    even at high levels of provisioned
    throughput.
    Some best practices...


  101. 1. Distinct values for hash
    keys.
    Patterns
    Hash key elements should have a
    high number of distinct values.


  102. Data model example: hash key selection.
    Well-distributed workloads.
    Users
    user_id = mza      | first_name = Matt   | last_name = Wood
    user_id = jeffbarr | first_name = Jeff   | last_name = Barr
    user_id = werner   | first_name = Werner | last_name = Vogels
    user_id = mattfox  | first_name = Matt   | last_name = Fox
    ...

  103. Data model example: hash key selection.
    Well-distributed workloads.
    Users
    user_id = mza      | first_name = Matt   | last_name = Wood
    user_id = jeffbarr | first_name = Jeff   | last_name = Barr
    user_id = werner   | first_name = Werner | last_name = Vogels
    user_id = mattfox  | first_name = Matt   | last_name = Fox
    ...
    Lots of users with unique user_id.
    Workload well distributed across partitions.

  104. 2. Avoid limited hash key
    values.
    Patterns
    Hash key elements should have a
    high number of distinct values.


  105. Data model example: small hash value range.
    Non-uniform workload.
    Status responses
    status = 200 | date = 2012-04-01-00-00-01
    status = 404 | date = 2012-04-01-00-00-01
    status = 404 | date = 2012-04-01-00-00-01
    status = 404 | date = 2012-04-01-00-00-01

  106. Data model example: small hash value range.
    Non-uniform workload.
    Status responses
    status = 200 | date = 2012-04-01-00-00-01
    status = 404 | date = 2012-04-01-00-00-01
    status = 404 | date = 2012-04-01-00-00-01
    status = 404 | date = 2012-04-01-00-00-01
    Small number of status codes.
    Uneven, non-uniform workload.

  107. 3. Model for even
    distribution of access.
    Patterns
    Access by hash key value should be
    evenly distributed across the dataset.


  108. Data model example: uneven access pattern by key.
    Non-uniform access workload.
    Devices
    mobile_id = 100 | access_date = 2012-04-01-00-00-01
    mobile_id = 100 | access_date = 2012-04-01-00-00-02
    mobile_id = 100 | access_date = 2012-04-01-00-00-03
    mobile_id = 100 | access_date = 2012-04-01-00-00-04
    ...

  109. Data model example: uneven access pattern by key.
    Non-uniform access workload.
    Devices
    mobile_id = 100 | access_date = 2012-04-01-00-00-01
    mobile_id = 100 | access_date = 2012-04-01-00-00-02
    mobile_id = 100 | access_date = 2012-04-01-00-00-03
    mobile_id = 100 | access_date = 2012-04-01-00-00-04
    ...
    Large number of devices.
    A small number are much more popular than others.
    Workload unevenly distributed.

  110. Data model example: randomize access pattern by key.
    Towards a uniform workload.
    Devices
    mobile_id = 100.1 | access_date = 2012-04-01-00-00-01
    mobile_id = 100.2 | access_date = 2012-04-01-00-00-02
    mobile_id = 100.3 | access_date = 2012-04-01-00-00-03
    mobile_id = 100.4 | access_date = 2012-04-01-00-00-04
    ...
    Randomize access pattern.
    Workload randomised by hash key.
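The suffixing shown above (100.1, 100.2, …) can be sketched as write sharding: append a random suffix to the hot hash key so writes land on different partitions, and fan out reads across every suffix. The shard count is an application tuning choice, not a DynamoDB setting.

```python
import random

SHARDS = 4  # number of suffixes per hot key; a tuning choice

def sharded_key(mobile_id):
    """Append a random suffix so one hot device spreads across partitions."""
    return "%s.%d" % (mobile_id, random.randint(1, SHARDS))

def all_shards(mobile_id):
    """Readers must query every suffix and merge the results."""
    return ["%s.%d" % (mobile_id, n) for n in range(1, SHARDS + 1)]
```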

  111. Design for a uniform
    workload.


  112. Analytics with DynamoDB


  113. Seamless scale.
    Scalable methods for data processing.
    Scalable methods for backup/restore.


  114. Amazon Elastic MapReduce.
    http://aws.amazon.com/emr
    Managed Hadoop service for
    data-intensive workflows.


  115. Hadoop under the hood.
    Take advantage of the Hadoop
    ecosystem: streaming interfaces,
    Hive, Pig, Mahout.


  116. Distributed data processing.
    API driven. Analytics at any scale.


  117. Query flexibility with Hive.
    create external table items_db
      (id string, votes bigint, views bigint)
    stored by 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    tblproperties (
      "dynamodb.table.name" = "items",
      "dynamodb.column.mapping" = "id:id,votes:votes,views:views");

  118. Query flexibility with Hive.
    select id, votes, views
    from items_db
    order by views desc;

  119. Data export/import.
    Use EMR for backup and restore
    to Amazon S3.


  120. Data export/import.
    CREATE EXTERNAL TABLE orders_s3_new_export (
      order_id string, customer_id string,
      order_date int, total double )
    PARTITIONED BY (year string, month string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://export_bucket';

    INSERT OVERWRITE TABLE orders_s3_new_export
    PARTITION (year='2012', month='01')
    SELECT * from orders_ddb_2012_01;

  121. Integrate live and
    archive data
    Run queries across external Hive tables
    on S3 and DynamoDB.
    Live & archive. Metadata & big objects.


  122. In summary...
    DynamoDB
    Predictable performance
    Provisioned throughput
    Libraries & mappers


  123. In summary...
    DynamoDB
    Data modeling
    Predictable performance
    Provisioned throughput
    Libraries & mappers
    Tables & items
    Read & write patterns
    Time series data


  124. In summary...
    DynamoDB
    Data modeling
    Partitioning
    Predictable performance
    Provisioned throughput
    Libraries & mappers
    Tables & items
    Read & write patterns
    Time series data
    Automatic partitioning
    Hot and cold data
    Size/throughput ratio


  125. In summary...
    DynamoDB
    Data modeling
    Partitioning
    Analytics
    Predictable performance
    Provisioned throughput
    Libraries & mappers
    Tables & items
    Read & write patterns
    Time series data
    Automatic partitioning
    Hot and cold data
    Size/throughput ratio
    Elastic MapReduce
    Hive queries
    Backup & restore


  126. DynamoDB free tier
    5 writes, 10 consistent reads per second
    100 MB of storage

  127. aws.amazon.com/dynamodb
    aws.amazon.com/documentation/dynamodb
    best practice + sample code


  128. Thank you!


  129. (image-only slide)