Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
100
Platforms for scientific data analysis
mndoci
3
100
FGED Keynote
mndoci
3
93
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
250
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
180
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
110
Other Decks in Technology
See All in Technology
「技術負債にならない・間違えない」 権限管理の設計と実装
naro143
31
9.5k
AI Agentと MCP Serverで実現する iOSアプリの 自動テスト作成の効率化
spiderplus_cb
0
290
組織観点からIAM Identity CenterとIAMの設計を考える
nrinetcom
PRO
1
130
LLMアプリケーション開発におけるセキュリティリスクと対策 / LLM Application Security
flatt_security
7
1.5k
Go Conference 2025: GoのinterfaceとGenericsの内部構造と進化 / Go type system internals
ryokotmng
3
540
そのグラフに「魂」は宿っているか? ~生成AI全盛期におけるデータ可視化手法とライブラリ比較~
negi111111
2
830
WebアプリケーションのUI構築で気を付けてるポイント
tomokusaba
0
180
PyCon JP 2025 DAY1 「Hello, satellite data! ~Pythonではじめる衛星データ解析~」
ra0kley
0
730
GopherCon Tour 概略
logica0419
2
160
Streamlit は社内ツールだけじゃない!PoC の速さで実現する'商用品質'の分析 SaaS アーキテクチャ
kdash
2
1k
Sidekiq その前に:Webアプリケーションにおける非同期ジョブ設計原則
morihirok
17
6.2k
kaigi_on_rails_2025_設計.pdf
nay3
8
4k
Featured
See All Featured
Designing Experiences People Love
moore
142
24k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.7k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
140
34k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
34
6.1k
Building Better People: How to give real-time feedback that sticks.
wjessup
368
20k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
188
55k
Build The Right Thing And Hit Your Dates
maggiecrowley
37
2.9k
Agile that works and the tools we love
rasmusluckow
330
21k
The Language of Interfaces
destraynor
162
25k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.6k
For a Future-Friendly Web
brad_frost
180
9.9k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
45
2.5k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license