Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
120
Platforms for scientific data analysis
mndoci
3
110
FGED Keynote
mndoci
3
99
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
260
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
190
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
120
Other Decks in Technology
See All in Technology
社内でAWS BuilderCards体験会を立ち上げ、得られた気づき / 20260225 Masaki Okuda
shift_evolve
PRO
1
110
EKSで実践する オブザーバビリティの現在地
honmarkhunt
2
300
AI Coding Agentの地殻変動 ~ ai-coding.info の定点観測 ~
kotauchisunsun
0
300
20260222ねこIoTLT ねこIoTLTをふりかえる
poropinai1966
0
210
APMの世界から見るOpenTelemetryのTraceの世界 / OpenTelemetry in the Java
soudai
PRO
0
150
『誰の責任?』で揉めるのをやめて、エラーバジェットで判断するようにした ~感情論をデータで終わらせる、PMとエンジニアの意思決定プロセス~
coconala_engineer
0
1.7k
opsmethod第1回_アラート調査の自動化にむけて
yamatook
0
290
大規模な組織におけるAI Agent活用の促進と課題
lycorptech_jp
PRO
4
5.6k
三菱UFJ銀行におけるエンタープライズAI駆動開発のリアル / Enterprise AI_Driven Development at MUFG Bank: The Real Story
muit
10
18k
バイブコーディングで作ったものを紹介
tatsuya1970
0
180
競争優位を生み出す戦略的内製開発の実践技法
masuda220
PRO
2
430
Goで実現する堅牢なアーキテクチャ:DDD、gRPC-connect、そしてAI協調開発の実践
fujidomoe
3
740
Featured
See All Featured
The Language of Interfaces
destraynor
162
26k
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
1
140
Marketing Yourself as an Engineer | Alaka | Gurzu
gurzu
0
140
Gemini Prompt Engineering: Practical Techniques for Tangible AI Outcomes
mfonobong
2
300
How to audit for AI Accessibility on your Front & Back End
davetheseo
0
200
Tips & Tricks on How to Get Your First Job In Tech
honzajavorek
0
450
Java REST API Framework Comparison - PWX 2021
mraible
34
9.2k
The Pragmatic Product Professional
lauravandoore
37
7.2k
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
80
Groundhog Day: Seeking Process in Gaming for Health
codingconduct
0
110
Making the Leap to Tech Lead
cromwellryan
135
9.7k
brightonSEO & MeasureFest 2025 - Christian Goodrich - Winning strategies for Black Friday CRO & PPC
cargoodrich
3
110
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license