Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
100
Platforms for scientific data analysis
mndoci
3
100
FGED Keynote
mndoci
3
92
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
250
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
180
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
110
Other Decks in Technology
See All in Technology
トヨタ生産方式(TPS)入門
recruitengineers
PRO
3
240
Goでマークダウンの独自記法を実装する
lag129
0
220
AIエージェントの開発に必須な「コンテキスト・エンジニアリング」とは何か──プロンプト・エンジニアリングとの違いを手がかりに考える
masayamoriofficial
0
400
イオン店舗一覧ページのパフォーマンスチューニング事例 / Performance tuning example for AEON store list page
aeonpeople
2
290
生成AI利用プログラミング:誰でもプログラムが書けると 世の中どうなる?/opencampus202508
okana2ki
0
190
人と組織に偏重したEMへのアンチテーゼ──なぜ、EMに設計力が必要なのか/An antithesis to the overemphasis of people and organizations in EM
dskst
6
620
知られざるprops命名の慣習 アクション編
uhyo
11
2.5k
あなたの知らない OneDrive
murachiakira
0
240
ソフトウェア エンジニアとしての 姿勢と心構え
recruitengineers
PRO
4
760
帳票Vibe Coding
terurou
0
140
モバイルアプリ研修
recruitengineers
PRO
3
260
7月のガバクラ利用料が高かったので調べてみた
techniczna
3
440
Featured
See All Featured
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Documentation Writing (for coders)
carmenintech
73
5k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
6k
Visualization
eitanlees
147
16k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
15
1.6k
Designing Experiences People Love
moore
142
24k
A Tale of Four Properties
chriscoyier
160
23k
Statistics for Hackers
jakevdp
799
220k
Build The Right Thing And Hit Your Dates
maggiecrowley
37
2.8k
Java REST API Framework Comparison - PWX 2021
mraible
33
8.8k
Balancing Empowerment & Direction
lara
2
590
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license