
Data-driven Innovation

Matt Wood
October 10, 2012

Slides from my session at the #AWS Public Sector Summit, 2012.


  1. DNA

  2. DNA: A T C G G T C C A G. Transcription: G A G C C A G G U C C
  3. DNA: A T C G G T C C A G. Transcription: G A G C C A G G U C C. Translation: Ser Glu Val
  4. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Lower cost, increased throughput; highly constrained
  5. 2

  6. hi1.4xlarge: 2 x 1 TB SSD storage; 10 gigabit networking. HVM: 90k IOPS read, 9k to 75k write. PV: 120k IOPS read, 10k to 85k write
  7. Netflix: “The hi1.4xlarge configuration is about half the system cost for the same throughput.” http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
  8. Amazon Elastic MapReduce: managed Hadoop clusters. Easy to provision and monitor. Write two functions. Scale up. Choice of Hadoop flavors
  9. Elastic MapReduce (diagram labels): Code; Name node; Elastic cluster; HDFS; Input data: S3; Output: S3 + SimpleDB; Queries + BI via JDBC, Pig, Hive
  10. “BioSense 2.0 protects the health of the American people by providing timely insight into the health of communities, regions, and the nation by offering a variety of features to improve data collection, standardization, storage, analysis, and collaboration”
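
The "Write two functions. Scale up." point on the Elastic MapReduce slide refers to the MapReduce model, in which the author supplies a map function and a reduce function and the cluster handles grouping and distribution. The following is a minimal local sketch of that idea, not code from the talk: the function names (`map_read`, `reduce_counts`, `run_local`) are hypothetical, the base-counting task is chosen only for illustration, and the in-memory grouping stands in for the shuffle step that Hadoop would perform across nodes.

```python
from collections import defaultdict


def map_read(read):
    """Mapper: emit a (base, 1) pair for every base in one DNA read."""
    for base in read.strip().upper():
        yield base, 1


def reduce_counts(base, counts):
    """Reducer: sum all counts emitted for one base."""
    return base, sum(counts)


def run_local(reads):
    """Stand-in for the cluster: group mapper output by key, then reduce.

    On a real Hadoop cluster this grouping (the shuffle) is done by the
    framework; here it is simulated in memory for illustration only.
    """
    grouped = defaultdict(list)
    for read in reads:
        for key, value in map_read(read):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Sample input: the DNA sequence shown on the earlier slides.
    print(run_local(["ATCGGTCCAG"]))
```

Because the mapper and the reducer each see only one record or one key at a time, the same two functions could in principle run unchanged over many reads split across a cluster, which is the sense of "scale up" on the slide.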