Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
110
Platforms for scientific data analysis
mndoci
3
110
FGED Keynote
mndoci
3
97
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
260
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
190
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
120
Other Decks in Technology
See All in Technology
10Xにおける品質保証活動の全体像と改善 #no_more_wait_for_test
nihonbuson
PRO
1
170
データの整合性を保ちたいだけなんだ
shoheimitani
7
2.8k
What happened to RubyGems and what can we learn?
mikemcquaid
0
240
Azure Durable Functions で作った NL2SQL Agent の精度向上に取り組んだ話/jat08
thara0402
0
140
学生・新卒・ジュニアから目指すSRE
hiroyaonoe
2
530
入社1ヶ月でデータパイプライン講座を作った話
waiwai2111
1
220
We Built for Predictability; The Workloads Didn’t Care
stahnma
0
130
顧客との商談議事録をみんなで読んで顧客解像度を上げよう
shibayu36
0
150
M&A 後の統合をどう進めるか ─ ナレッジワーク × Poetics が実践した組織とシステムの融合
kworkdev
PRO
1
390
Amazon Bedrock AgentCore 認証・認可入門
hironobuiga
2
500
toCプロダクトにおけるAI機能開発のしくじりと学び / ai-product-failures-and-learnings
rince
6
5.5k
Databricks Free Edition講座 データサイエンス編
taka_aki
0
290
Featured
See All Featured
More Than Pixels: Becoming A User Experience Designer
marktimemedia
3
310
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
32
2.8k
The Cult of Friendly URLs
andyhume
79
6.8k
Building an army of robots
kneath
306
46k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.9k
B2B Lead Gen: Tactics, Traps & Triumph
marketingsoph
0
52
The Illustrated Guide to Node.js - THAT Conference 2024
reverentgeek
0
250
We Analyzed 250 Million AI Search Results: Here's What I Found
joshbly
1
660
Designing Powerful Visuals for Engaging Learning
tmiket
0
210
Producing Creativity
orderedlist
PRO
348
40k
Site-Speed That Sticks
csswizardry
13
1.1k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license