Platforms for Data Science
Deepak Singh
October 01, 2011
Talk given at the "Computing on the Brink" series
Transcript
There is no magic. There is only awesome.
Deepak Singh
Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
image: Leo Reynolds
hard problem
really hard problem
so how do we get there?
information platforms
Image: Drew Conway
dataspaces
Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist", in Beautiful Data
the unreasonable effectiveness of data
Halevy et al., IEEE Intelligent Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people
Credit: Pieter Musterd, under a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
Credit: Angel Pizarro, U. Penn
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
30,472 cores
$1279/hr
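A rough worked example of the two figures above. It assumes that $1279/hr is the total hourly price for all 30,472 cores, which the slide does not state explicitly. The function name is illustrative and not from the talk.

```python
def cost_per_core_hour(total_cost_per_hour: float, cores: int) -> float:
    """Return the hourly cost of a single core, in the same currency unit."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_cost_per_hour / cores


# Figures from the slide; the interpretation (total price for the
# whole cluster per hour) is an assumption.
per_core = cost_per_core_hour(1279, 30472)
print(f"${per_core:.4f} per core-hour")
```

Under that assumption, the cluster works out to roughly four cents per core per hour.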
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi, under a CC-BY-NC-SA license