$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
110
Platforms for scientific data analysis
mndoci
3
110
FGED Keynote
mndoci
3
96
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
260
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
190
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
120
Other Decks in Technology
See All in Technology
M&Aで拡大し続けるGENDAのデータ活用を促すためのDatabricks権限管理 / AEON TECH HUB #22
genda
0
230
AWSの新機能をフル活用した「re:Inventエージェント」開発秘話
minorun365
2
430
20251203_AIxIoTビジネス共創ラボ_第4回勉強会_BP山崎.pdf
iotcomjpadmin
0
130
テストセンター受験、オンライン受験、どっちなんだい?
yama3133
0
130
AI駆動開発ライフサイクル(AI-DLC)の始め方
ryansbcho79
0
140
AWSインフルエンサーへの道 / load of AWS Influencer
whisaiyo
0
210
TED_modeki_共創ラボ_20251203.pdf
iotcomjpadmin
0
140
Next.js 16の新機能 Cache Components について
sutetotanuki
0
170
20251218_AIを活用した開発生産性向上の全社的な取り組みの進め方について / How to proceed with company-wide initiatives to improve development productivity using AI
yayoi_dd
0
650
2025-12-18_AI駆動開発推進プロジェクト運営について / AIDD-Promotion project management
yayoi_dd
0
150
JEDAI認定プログラム JEDAI Order 2026 エントリーのご案内 / JEDAI Order 2026 Entry
databricksjapan
0
180
20251222_サンフランシスコサバイバル術
ponponmikankan
2
140
Featured
See All Featured
30 Presentation Tips
portentint
PRO
1
170
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
170
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
254
22k
SEO Brein meetup: CTRL+C is not how to scale international SEO
lindahogenes
0
2.2k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
359
30k
Mobile First: as difficult as doing things right
swwweet
225
10k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
17k
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
29
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
Build The Right Thing And Hit Your Dates
maggiecrowley
38
3k
Reflections from 52 weeks, 52 projects
jeffersonlam
355
21k
Game over? The fight for quality and originality in the time of robots
wayneb77
1
66
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license