Matt Wood
October 10, 2012
Data-driven Innovation
Slides from my session at the #AWS Public Sector Summit, 2012.
Transcript
Data-driven innovation
Dr. Matt Wood @mza
Hello
Data
DNA
Chromosome 11 : ACTN3 : rs1815739
Chromosome X : rs6625163
Chromosome 19 : FUT2 : rs601338
Chromosome 2 : rs10427255
TYPE II Chromosome 10 : rs7903146
+0.25 Chromosome 15 : rs2472297
I know this, because...
A T C G G T C C A G
G
A T C G G T C C A G
G A G C C A G G U C C Transcription
A T C G G T C C A G
G A G C C A G G U C C Translation Ser Glu Val Transcription
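The transcription and translation steps named on the slides above can be sketched in a few lines. This is a minimal illustration, not the speaker's code: the input sequence `TCCGAGGTC` is a hypothetical example chosen so that the result matches the Ser, Glu, Val labels on the slide, and `CODON_TABLE` is deliberately partial (the standard genetic code has 64 codons).

```python
# Partial codon table (assumption: only the three codons this example needs).
CODON_TABLE = {"UCC": "Ser", "GAG": "Glu", "GUC": "Val"}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand by replacing T with U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA three bases at a time; unknown codons become '???'."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE.get(codon, "???") for codon in codons]


dna = "TCCGAGGTC"  # hypothetical sequence, not the one shown on the slide
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)
```

With a complete codon table the same two functions would handle arbitrary coding sequences; stop codons and reading-frame selection are left out of this sketch.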
Chromosome 11 : ACTN3 : rs1815739
Chromosome X : rs6625163
Chromosome 19 : FUT2 : rs601338
Chromosome 2 : rs10427255
TYPE II Chromosome 10 : rs7903146
+0.25 Chromosome 15 : rs2472297
I know all that, because...
Human Genome Project
40 species ensembl.org
Compare
Change
Less
Compare
Transformative
Data generation costs are falling everywhere
Customer segmentation, financial modeling, system analysis, line of sight, business intelligence.
Opportunity
Transformation
Innovation
Generation Collection & storage Analytics & computation Collaboration & sharing
Generation Collection & storage Analytics & computation Collaboration & sharing
lower cost, increased throughput
Generation Collection & storage Analytics & computation Collaboration & sharing
lower cost, increased throughput highly constrained
Barrier
Data generation challenge X
Analytics challenge
Accessibility challenge
Enter the AWS Cloud
Utility
Remove constraints
Data-driven innovation
Distributed
2
2 Software for distributed storage & analysis
2 Software for distributed storage & analysis Infrastructure for distributed
storage & analysis
Software: frameworks for data-intensive workloads. Distributed by design.
Infrastructure: platform for data-intensive workloads. Distributed by design.
Support the data timeline
Generation Collection & storage Analytics & computation Collaboration & sharing
highly constrained
Generation Collection & storage Analytics & computation Collaboration & sharing
Lower the barrier to entry
Agility
Responsive
Generation Collection & storage Analytics & computation Collaboration & sharing
Generation DynamoDB Analytics & computation Collaboration & sharing
Generation DynamoDB EC2, Elastic MapReduce Collaboration & sharing
Generation DynamoDB EC2, Elastic MapReduce S3, Public Datasets
Tools and techniques for working productively with data
Scale
Secure
2 Software for distributed storage & analysis Infrastructure for distributed
storage & analysis
Amazon EC2
Scale out systems Embarrassingly parallel Queue based distribution Small, medium
and high scale
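Queue-based distribution of embarrassingly parallel work, as listed on the slide above, can be sketched on a single machine with the standard library. This is an illustrative assumption about the pattern, not AWS code: the in-process `queue.Queue` stands in for a hosted queue, the threads stand in for worker instances, and the squaring task is a placeholder for real work.

```python
import queue
import threading


def worker(tasks, results):
    """Pull items from the task queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            tasks.task_done()
            break
        results.put((item, item * item))  # placeholder for real work
        tasks.task_done()


def run(items, num_workers=4):
    """Distribute independent items across workers and collect the results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for thread in threads:
        thread.join()
    collected = {}
    while not results.empty():
        key, value = results.get()
        collected[key] = value
    return collected


squares = run(range(10))
print(squares)
```

Because each item is independent, scaling out means raising `num_workers`; nothing in the task itself has to change.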
High performance
High performance Compute performance
Cluster Compute: Intel Xeon E5-2670; 10 gigabit, non-blocking network; 60.5 GB; placement groupings
Cluster Compute: Intel Xeon E5-2670; 10 gigabit, non-blocking network; 60.5 GB; placement groupings; +GPU
240 TFLOPS
High performance Compute performance IO performance
Unstructured
Variable
Amazon DynamoDB: predictable, consistent performance; unlimited storage; single-digit millisecond latencies; no schema; zero admin.
...and SSDs for all
hi1.4xlarge: 2 x 1 TB SSD storage; 10 gigabit networking. HVM: 90k IOPS read, 9k to 75k IOPS write. PV: 120k IOPS read, 10k to 85k IOPS write.
Netflix: “The hi1.4xlarge configuration is about half the system cost for the same throughput.” http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
Provisioned IOPS: provision required IO performance; EBS-optimized instances
Cost optimization
Reserved capacity
Reserved capacity On-demand
Reserved capacity On-demand
Spot instances
$0.2530 vs $2.40
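The size of the gap between the two prices above is easier to see as a ratio. The slide does not label the figures, so this sketch assumes the lower one is an hourly Spot price and the higher one is the hourly On-Demand price for the same instance type; the variable names are illustrative.

```python
# Assumption: both figures are hourly prices for the same instance type.
spot_price = 0.2530       # assumed Spot price, USD per hour
on_demand_price = 2.40    # assumed On-Demand price, USD per hour

savings = 1 - spot_price / on_demand_price  # fraction saved by using Spot
ratio = on_demand_price / spot_price        # how many times cheaper Spot is

print(f"Spot is {savings:.1%} cheaper, a {ratio:.1f}x price difference")
```

Under that assumption the lower price is roughly a tenth of the higher one, which is why interruptible batch workloads are a natural fit for Spot capacity.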
2 Software for distributed storage & analysis Infrastructure for distributed
storage & analysis
map/reduce
Map. Reduce.
Write functions. Scale up.
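The "write functions" idea can be shown with a tiny in-memory version of the pattern. This is a sketch for illustration only: it is not Hadoop or Elastic MapReduce code, `run_mapreduce` is a hypothetical helper that performs the grouping step a real framework would distribute across a cluster, and the base-counting task is an assumed example.

```python
from collections import defaultdict


def map_bases(read):
    """Map: emit a (base, 1) pair for every base in one sequence read."""
    for base in read:
        yield base, 1


def reduce_counts(key, values):
    """Reduce: sum all the counts emitted for one key."""
    return sum(values)


def run_mapreduce(records, mapper, reducer):
    """Apply the mapper, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


reads = ["ATCG", "GGTC"]  # hypothetical input records
counts = run_mapreduce(reads, map_bases, reduce_counts)
print(counts)
```

Only `map_bases` and `reduce_counts` describe the problem; the grouping and scheduling in between is the part a managed framework takes over as the data grows.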
Hadoop
Undifferentiated heavy lifting
Amazon Elastic MapReduce: managed Hadoop clusters; easy to provision and monitor; write two functions, scale up; choice of Hadoop flavors
Amazon Elastic MapReduce: integrates with S3; analytics for DynamoDB; perfect for Spot pricing
Input data S3
Elastic MapReduce Code Input data S3
Elastic MapReduce Code Name node Input data S3
Elastic MapReduce Code Name node Input data S3 Elastic cluster
Elastic MapReduce Code Name node Input data S3 Elastic cluster
HDFS
Elastic MapReduce Code Name node Input data S3 Elastic cluster
HDFS Queries + BI Via JDBC, Pig, Hive
Elastic MapReduce Code Name node Output S3 + SimpleDB Input
data S3 Elastic cluster HDFS Queries + BI Via JDBC, Pig, Hive
Output S3 + SimpleDB Input data S3
CDC Centers for Disease Control and Prevention
“BioSense 2.0 protects the health of the American people by providing timely insight into the health of communities, regions, and the nation by offering a variety of features to improve data collection, standardization, storage, analysis, and collaboration”
Health data Collection & storage Analytics & computation Collaboration &
sharing
Health data Collection & storage Analytics & computation Collaboration &
sharing highly constrained
HIPAA, HITECH, FISMA Moderate
GovCloud
Beyond a definition of Big Data
Chromosome 11 : ACTN3 : rs1815739
Chromosome X : rs6625163
Chromosome 19 : FUT2 : rs601338
Chromosome 2 : rs10427255
TYPE II Chromosome 10 : rs7903146
+0.25 Chromosome 15 : rs2472297
Thank you aws.amazon.com @mza