Platforms for Data Science
Deepak Singh
October 01, 2011
Talk given at the "Computing on the Brink" series
Transcript
There is no magic
There is only awesome
Deepak Singh
Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
image: Leo Reynolds
hard problem
really hard problem
so how do we get there?
information platforms
Image: Drew Conway
dataspaces
Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist," in Beautiful Data
the unreasonable effectiveness of data
Halevy et al., IEEE Intelligent Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people
Credit: Pieter Musterd, under a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
Credit: Angel Pizarro, U. Penn
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
30,472 cores
$1279/hr
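The two figures above can be turned into a per-core rate with one division. This is only a rough sketch: it assumes the $1279/hr price applies to the whole 30,472-core cluster, which the slides imply but do not state outright. The function name `cost_per_core_hour` is made up for illustration.

```python
def cost_per_core_hour(cluster_cost_per_hour: float, cores: int) -> float:
    """Return the hourly cost of a single core, in the input currency."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cluster_cost_per_hour / cores


# Figures taken from the two slides above; pairing them is an assumption.
rate = cost_per_core_hour(1279.0, 30472)
print(f"${rate:.4f} per core-hour")
```

Under that assumption the cluster works out to roughly four cents per core per hour.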
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi, under a CC-BY-NC-SA license