Platforms for Data Science
Deepak Singh
October 01, 2011
Technology
Talk given at the "Computing on the Brink" series
Transcript
There is no magic. There is only awesome.
Deepak Singh
Platforms for data science
bioinformatics
Image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
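The slide lists versioning, provenance, filtering and aggregation as things that make data more effective. A minimal sketch of how the first four can fit together, assuming an append-only design where every transformation produces a new version tagged with the step that created it. The `VersionedDataset` class, the toy read counts and the file name `run_01.csv` are all hypothetical illustrations, not something described in the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    """One immutable snapshot of a dataset plus a note on how it was produced."""
    number: int
    rows: tuple
    provenance: str


@dataclass
class VersionedDataset:
    """Hypothetical container: each transformation appends a new version
    instead of overwriting the data, so every earlier state stays available."""
    versions: list = field(default_factory=list)

    @classmethod
    def load(cls, rows, source):
        ds = cls()
        ds.versions.append(Version(1, tuple(rows), f"loaded from {source}"))
        return ds

    @property
    def current(self):
        return self.versions[-1]

    def _commit(self, rows, step):
        # Store the new rows together with the step that produced them (provenance).
        self.versions.append(Version(self.current.number + 1, tuple(rows), step))
        return self

    def filter(self, predicate, description):
        kept = [r for r in self.current.rows if predicate(r)]
        return self._commit(kept, f"filter: {description}")

    def aggregate(self, key, value, description):
        # Sum `value` per distinct `key`, emitting one row per key in sorted key order.
        totals = {}
        for r in self.current.rows:
            totals[r[key]] = totals.get(r[key], 0) + r[value]
        rows = [{key: k, value: v} for k, v in sorted(totals.items())]
        return self._commit(rows, f"aggregate: {description}")

    def lineage(self):
        return [f"v{v.number}: {v.provenance}" for v in self.versions]


# Toy read counts; the sample names and file name are purely illustrative.
reads = [
    {"sample": "A", "quality": 35, "count": 10},
    {"sample": "A", "quality": 12, "count": 4},
    {"sample": "B", "quality": 30, "count": 7},
]
ds = VersionedDataset.load(reads, "run_01.csv")
ds.filter(lambda r: r["quality"] >= 20, "quality >= 20")
ds.aggregate("sample", "count", "total count per sample")
print(ds.current.rows)  # ({'sample': 'A', 'count': 10}, {'sample': 'B', 'count': 7})
print(ds.lineage())
```

Because nothing is overwritten, `ds.versions[0]` still holds the raw input, and `lineage()` answers "where did this number come from?" without extra bookkeeping.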
Image: Leo Reynolds
hard problem
really hard problem
so how do we get there?
information platforms
Image: Drew Conway
dataspaces
Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist," Beautiful Data
the unreasonable effectiveness of data
Halevy et al., IEEE Intelligent Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
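"Accept all data formats" and "data as a programmable resource" suggest a single programmatic entry point that hides how each source is stored. A minimal sketch, assuming a registry of parsers keyed by format name: callers ask `load()` for records, and support for a new format is added by registering one more reader rather than changing every caller. The `READERS` registry, the `reader` decorator and the toy gene records are hypothetical; only the standard-library `csv`, `io` and `json` modules are real APIs.

```python
import csv
import io
import json

# Hypothetical registry mapping a format name to a parser that returns a list of dicts.
READERS = {}


def reader(fmt):
    """Decorator that registers a parser for one input format."""
    def register(fn):
        READERS[fmt] = fn
        return fn
    return register


@reader("csv")
def read_csv(text):
    # DictReader uses the header row as keys; every value arrives as a string.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@reader("json")
def read_json(text):
    # Accept either a single JSON object or a list of objects.
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def load(text, fmt):
    """Single entry point: callers ask for records, not for a particular file format."""
    try:
        parse = READERS[fmt]
    except KeyError:
        raise ValueError(f"no reader registered for format {fmt!r}") from None
    return parse(text)


csv_text = "gene,expression\nBRCA1,2.5\nTP53,1.1\n"
json_text = '[{"gene": "MYC", "expression": 3.0}]'
records = load(csv_text, "csv") + load(json_text, "json")
print([r["gene"] for r in records])  # ['BRCA1', 'TP53', 'MYC']
```

Note that the CSV rows carry `"2.5"` as a string while the JSON row carries `3.0` as a number; a real platform would likely add a schema or type-coercion step on top of this sketch.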
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people
Credit: Pieter Musterd under a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
Credit: Angel Pizarro, U. Penn
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
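Tools such as Crossbow, Contrail and Myrna run genomics pipelines on Hadoop's MapReduce model. As an illustration of that model only, here is a single-process sketch of its three phases applied to k-mer counting, a simple genomics task chosen for brevity. It is not the code of any of the linked tools, and the function names are hypothetical; in a real cluster the framework distributes the map calls across machines and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit (k-mer, 1) for every overlapping k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the partial counts for one k-mer."""
    return key, sum(values)


def count_kmers(reads, k):
    # Each read is mapped independently, which is what makes the work parallelizable.
    mapped = chain.from_iterable(map_kmers(r, k) for r in reads)
    return dict(reduce_counts(key, vals) for key, vals in shuffle(mapped).items())


print(count_kmers(["ACGTAC", "GTACGT"], 3))  # {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

The map and reduce functions are pure and independent per key, so the same logic scales from two toy reads to millions of reads simply by adding workers.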
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
30,472 cores
$1279/hr
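The two figures on this slide can be turned into a per-core price. A small worked calculation, assuming (as the slide implies but does not state outright) that $1279/hr is the hourly price of the entire 30,472-core cluster. The helper names and the 3-hour run length are hypothetical, used only to show how on-demand cost scales with the hours actually consumed.

```python
def cost_per_core_hour(cluster_cost_per_hour, cores):
    """Hourly price of the whole cluster divided by the number of cores."""
    return cluster_cost_per_hour / cores


def run_cost(cluster_cost_per_hour, hours):
    """On-demand model: pay only for the hours the cluster actually runs."""
    return cluster_cost_per_hour * hours


rate = cost_per_core_hour(1279, 30_472)
print(f"${rate:.4f} per core-hour")  # $0.0420 per core-hour
print(run_cost(1279, 3))  # hypothetical 3-hour run: 3837
```

Under that assumption, each core costs a little over four cents an hour, and the cluster stops costing anything once it is shut down.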
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi under a CC-BY-NC-SA license