Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
190
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
99
Platforms for scientific data analysis
mndoci
3
90
FGED Keynote
mndoci
3
89
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
250
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
170
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
110
Other Decks in Technology
See All in Technology
こんなデータマートは嫌だ。どんな? / waiwai-data-meetup-202504
shuntak
1
230
「ラベルにとらわれない」エンジニアでいること/Be an engineer beyond labels
kaonavi
0
220
17年のQA経験が導いたスクラムマスターへの道 / 17 Years in QA to Scrum Master
toma_sm
0
490
DevOps文化を育むQA 〜カルチャーバブルを生み出す戦略〜 / 20250317 Atsushi Funahashi
shift_evolve
1
120
開発現場とセキュリティ担当をつなぐ脅威モデリング
cloudace
0
130
ウェブアクセシビリティとは
lycorptech_jp
PRO
0
340
チームの性質によって変わる ADR との向き合い方と、生成 AI 時代のこれから / How to deal with ADR depends on the characteristics of the team
mh4gf
4
360
ペアプログラミングにQAが加わった!職能を超えたモブプログラミングの事例と学び
tonionagauzzi
1
150
数百台のオンプレミスのサーバーをEKSに移行した話
yukiteraoka
0
770
AWS CDK コントリビュート はじめの一歩
yendoooo
1
130
React Server Componentは 何を解決し何を解決しないのか / What do React Server Components solve, and what do they not solve?
kaminashi
6
1.3k
ルートユーザーの活用と管理を徹底的に深掘る
yuobayashi
8
740
Featured
See All Featured
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.8k
Intergalactic Javascript Robots from Outer Space
tanoku
270
27k
Optimising Largest Contentful Paint
csswizardry
35
3.2k
How GitHub (no longer) Works
holman
314
140k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
227
22k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
16k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.5k
Agile that works and the tools we love
rasmusluckow
328
21k
Six Lessons from altMBA
skipperchong
27
3.7k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
2.9k
Site-Speed That Sticks
csswizardry
4
460
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license