Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
210
3
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
130
Platforms for scientific data analysis
mndoci
3
120
FGED Keynote
mndoci
3
110
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
270
A Platform for Data Science
mndoci
6
15k
Intel Theater Presentation @ SC11
mndoci
6
210
Talk at West Coast Association of Shared Directors meeting
mndoci
3
160
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
130
Other Decks in Technology
See All in Technology
フロンティアAIのゲート化と地政学リスク
nagatsu
0
130
【NRUG vol.18】なぜ多くのオブザーバビリティ導入は失敗するのか
nrug_member
0
120
NAB Show 2026 動画技術関連レポート / NAB Show 2026 Report
cyberagentdevelopers
PRO
0
190
SONiC Scale-Up Working Group から探る Scale-UpやUltraEthernet機能の実装方法
ebiken
PRO
2
290
SONiCのLinuxベースを活かしたZabbix監視
sonic
0
140
チームで進めるAI駆動アジャイル×ウォーターフォール
kumaiu
0
160
Socrates × Looker 〜セマンティックレイヤーで進化するデータ分析エージェント〜
hanon52_
3
2.3k
「エンジニア進化論」2028年の開発完全自動化、エンジニアはどう進化するか
cyberagentdevelopers
PRO
6
5k
10倍の生産性を実現するAI駆動並列エージェントのすべて
kumaiu
5
1.4k
就職⽀援サービスにおけるキャリアアドバイザーのシフトスケジューリング
recruitengineers
PRO
1
140
AIネイティブな開発のサプライチェーンリスク対策 〜激動の開発現場でリスクに立ち向かう〜【ZennFes】
cscengineer
PRO
2
120
非定型業務をAI slackbotで自動化する ~ 社内要望を自動壁打ちするbotを作った ~/automating-ad-hoc-work-with-ai-slackbot
shibayu36
0
650
Featured
See All Featured
svc-hook: hooking system calls on ARM64 by binary rewriting
retrage
2
300
Imperfection Machines: The Place of Print at Facebook
scottboms
270
14k
Believing is Seeing
oripsolob
1
140
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.5k
The Illustrated Children's Guide to Kubernetes
chrisshort
51
52k
The Director’s Chair: Orchestrating AI for Truly Effective Learning
tmiket
1
190
RailsConf 2023
tenderlove
30
1.5k
SEOcharity - Dark patterns in SEO and UX: How to avoid them and build a more ethical web
sarafernandez
0
200
Google's AI Overviews - The New Search
badams
0
1k
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
250
The Language of Interfaces
destraynor
162
27k
How Software Deployment tools have changed in the past 20 years
geshan
0
34k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license