Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
210
3
Share
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
120
Platforms for scientific data analysis
mndoci
3
120
FGED Keynote
mndoci
3
100
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
270
A Platform for Data Science
mndoci
6
15k
Intel Theater Presentation @ SC11
mndoci
6
200
Talk at West Coast Association of Shared Directors meeting
mndoci
3
160
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
130
Other Decks in Technology
See All in Technology
Purview Endpoint DLP 動かしてみた
kozakigh
0
420
AI-Assisted Contributions and Maintainer Load - PyCon US 2026
pauloxnet
1
150
オライリーイベント登壇資料「鉄リサイクル・産廃業界におけるAI技術実応用のカタチ」
takarasawa_
0
410
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
15
100k
Claude Codeウェビナー資料 - AWSの最新機能をClaude Codeで高速に検証する
oshanqq
0
810
20260515 ログイン機能だけではないアカウント管理を全体で考える~サービス設計者向け~
oidfj
0
650
LookerとADKで作る社内AIエージェント
chanyou0311
0
230
2026-05-14 要件定義からソース管理まで!IBM Bob基礎ハンズオン
yutanonaka
0
160
なぜ、私がCommunity Builderに?〜活動期間1か月半でも選出されたワケ〜
yama3133
0
130
セキュリティ対策、何からはじめる? CloudNative環境の脅威モデリングと リスク評価実践入門 #cloudnativekaigi
varu3
5
940
RedmineをAIで効率的に使う検証
yoshiokacb
0
120
AIを賢くしたいなら、まずは人間の改善ループから
subroh0508
0
120
Featured
See All Featured
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
35k
Organizational Design Perspectives: An Ontology of Organizational Design Elements
kimpetersen
PRO
1
690
Designing for Performance
lara
611
70k
The Cult of Friendly URLs
andyhume
79
6.9k
Crafting Experiences
bethany
1
140
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
3k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
23k
Context Engineering - Making Every Token Count
addyosmani
9
880
How to Grow Your eCommerce with AI & Automation
katarinadahlin
PRO
1
180
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.4k
Odyssey Design
rkendrick25
PRO
2
620
Data-driven link building: lessons from a $708K investment (BrightonSEO talk)
szymonslowik
1
1.1k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license