Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
100
Platforms for scientific data analysis
mndoci
3
100
FGED Keynote
mndoci
3
92
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
250
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
180
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
110
Other Decks in Technology
See All in Technology
「Chatwork」のEKS環境を支えるhelmfileを使用したマニフェスト管理術
hanayo04
1
430
Building GoReleaser - from shell script to paid product
caarlos0
0
190
AWS Well-Architected から考えるオブザーバビリティの勘所 / Considering the Essentials of Observability from AWS Well-Architected
sms_tech
1
730
20150719_Amazon Nova Canvas Virtual try-onアプリ 作成裏話
riz3f7
0
100
ClaudeCode_vs_GeminiCLI_Terraformで比較してみた
tkikuchi
1
3.4k
“日本一のM&A企業”を支える、少人数SREの効率化戦略 / SRE NEXT 2025
genda
1
300
無理しない AI 活用サービス / #jazug
koudaiii
0
110
ロールが細分化された組織でSREは何をするか?
tgidgd
1
460
三視点LLMによる複数観点レビュー
mhlyc
0
250
20250718_ITSurf_“Bet AI”を支える文化とコストマネジメント
helosshi
0
120
AWS CDK 入門ガイド これだけは知っておきたいヒント集
anank
5
820
Figma Dev Mode MCP Serverを用いたUI開発
zoothezoo
1
270
Featured
See All Featured
RailsConf 2023
tenderlove
30
1.2k
How STYLIGHT went responsive
nonsquared
100
5.6k
Why Our Code Smells
bkeepers
PRO
337
57k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
GraphQLとの向き合い方2022年版
quramy
49
14k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
161
15k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
Measuring & Analyzing Core Web Vitals
bluesmoon
7
520
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
2.9k
Git: the NoSQL Database
bkeepers
PRO
431
65k
A better future with KSS
kneath
238
17k
Adopting Sorbet at Scale
ufuk
77
9.5k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license