Ivory - A Data Store for Data Science
Ambiata
October 20, 2014
Transcript
IVORY - A DATA STORE FOR DATA SCIENCE
http://github.com/ambiata/ivory
© Ambiata 2014
DATA SCIENCE IN THE REAL WORLD
PROBLEM #1
“DATA WRANGLING”
WHAT WE START WITH
WHAT WE NEED
Feature vectors
(rows are instances, columns are features, "-" marks a missing value)

instance   gender  balance  purchases  zipcode  prop_online  num_accs
89340218   M       0.00     3          3001     1.00         1
48149407   -       634.83   16         4670     0.6875       3
18452274   F       15.12    2          -        0.50         1
07499337   M       33.56    2          -        1.00         1
62948721   F       98.34    12         3303     0.8333       4
93754723   -       523.81   23         2046     0.4782       1
00272446   F       1086.05  17         -        1.00         2
13374497   F       224.81   9          -        0.2222       1
31989993   M       78.21    2          2134     0.50         1
46474236   -       126.48   4          -        0.0          1
A typical workflow
[Diagram: Data set B, Data set C, Data set D -> Feature Eng -> Model train -> Score]
Multiple data sources:
• Transaction logs
• Database snapshots
• Segmentation models
• 3rd-party data
Feature engineering:
• Data sources are joined
• Instances are created
• Features are engineered
The cool stuff:
• Models are built
• Instances are scored
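The feature engineering step in the workflow above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the two sources, the customer ids and the function name `build_feature_vectors` are all hypothetical. It joins a transaction log with a database snapshot, creates one instance per customer, and derives two features of the kind shown on the feature vectors slide (`purchases` and `prop_online`).

```python
from collections import defaultdict

# Hypothetical transaction log rows: (customer_id, amount, channel).
transactions = [
    ("c1", 20.0, "online"),
    ("c1", 35.5, "store"),
    ("c2", 15.0, "online"),
]

# Hypothetical database snapshot: customer_id -> gender ("-" when unknown).
customers = {"c1": "F", "c2": "M", "c3": "-"}


def build_feature_vectors(customers, transactions):
    """Join both sources and emit one feature vector (a dict) per instance."""
    purchases = defaultdict(int)
    online = defaultdict(int)
    for customer_id, _amount, channel in transactions:
        purchases[customer_id] += 1
        if channel == "online":
            online[customer_id] += 1

    vectors = {}
    for customer_id, gender in customers.items():
        n = purchases[customer_id]
        vectors[customer_id] = {
            "gender": gender,
            "purchases": n,
            # Share of purchases made online; None when there are no purchases.
            "prop_online": online[customer_id] / n if n else None,
        }
    return vectors


vectors = build_feature_vectors(customers, transactions)
```

Every new model build that repeats this join and aggregation by hand is the kind of duplicated effort the deck goes on to describe.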
Feature preparation: 85%
Modelling: 15%
Multiple data science projects
[Diagram: Data sets A to E feed three separate pipelines: Feature Eng 1 -> Train 1 -> Score 1, Feature Eng 2 -> Train 2 -> Score 2, Feature Eng 3 -> Train 3 -> Score 3]
Feature engineering is in a silo - no reuse between model builds
PROBLEM #2
“LAB TO FACTORY” AKA DEV OPS
• Continually receiving data
• Want to leverage a history of all this data
• Continually training + scoring
• Data may need to be corrected
• Need to extend data model on-the-fly
LAMBDA ARCHITECTURE
query = function(all data)
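One way to read "query = function(all data)" is as a pure function over an append-only history. The sketch below assumes a hypothetical fact layout of (entity, attribute, value, date) and a hypothetical query named `latest_balance`; neither comes from the talk. A correction is made by appending a newer fact, never by editing an old one.

```python
# Hypothetical facts: (entity, attribute, value, ISO date). Append only.
all_data = [
    ("c1", "balance", 100.0, "2014-01-01"),
    ("c1", "balance", 250.0, "2014-03-01"),
    ("c2", "balance", 40.0, "2014-02-15"),
]


def latest_balance(all_data, entity):
    """A query as a pure function of the full history: same data in, same answer out."""
    facts = [f for f in all_data if f[0] == entity and f[1] == "balance"]
    if not facts:
        return None
    # ISO dates sort lexicographically, so max() picks the most recent fact.
    return max(facts, key=lambda f: f[3])[2]


# A correction is a new, later fact; the original history stays untouched.
corrected = all_data + [("c1", "balance", 245.0, "2014-03-02")]
```

Because nothing is overwritten, the query can be rerun over the old history and the corrected history and the two answers compared.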
[Diagram: New data stream, Magical query engine, Query]
[Diagram: lambda architecture. Layers: BATCH LAYER, SPEED LAYER, SERVING LAYER. Components: New data stream, All data, Precomputed views, Stream processing, Incremental views, Real-time store, Query]
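The interplay of the layers can be sketched as follows, using the usual description of the lambda architecture rather than anything specific to the talk: a batch job recomputes a view from all data, a stream processor folds recent events into an incremental view, and a query combines the two. The event layout (customer, amount) and all function names are assumptions made for illustration.

```python
from collections import Counter


def batch_view(all_data):
    """Batch layer: recompute purchase counts from the full history."""
    return Counter(customer for customer, _amount in all_data)


def incremental_view(view, event):
    """Speed layer: fold one new event into a copy of the real-time view."""
    updated = Counter(view)
    updated[event[0]] += 1
    return updated


def query(batch, realtime, customer):
    """Serving: combine the precomputed view with the recent increments."""
    return batch[customer] + realtime[customer]


# Hypothetical events: (customer, amount).
history = [("c1", 20.0), ("c1", 35.5), ("c2", 15.0)]
recent = [("c2", 9.9), ("c3", 4.0)]

batch = batch_view(history)
realtime = Counter()
for event in recent:
    realtime = incremental_view(realtime, event)
```

After the next batch run has absorbed the recent events, the real-time view can be discarded; the combined answer should match a batch view computed over everything.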
[Diagram: New data stream, All data, Feature view, Stream processing, Incremental views, Real-time store, Query]
[Diagram: the same components, plus Model train and score]
IVORY
A shared feature view asset
[Diagram: Data sets A to E feed Ivory, which feeds Train 1 / Score 1, Train 2 / Score 2 and Train 3 / Score 3]
An immutable, batch-oriented data store
[Diagram: New data stream, All data, Ivory, Stream processing, Incremental views, Real-time store, Model train and score, Query]
An extensible data model, backed by HDFS/S3
[Diagram: Ivory, stored on HDFS / S3, producing feature vectors]
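To make "extensible data model" and "feature vectors" concrete, here is a minimal in-memory sketch. It is not Ivory's API or storage format: the fact layout (entity, feature, value, date), the feature `dictionary` list and the `snapshot` function are all assumptions for illustration. Facts are only ever appended, and a new feature is added by extending the dictionary, without rewriting existing facts.

```python
def snapshot(facts, dictionary, as_of):
    """Latest value per (entity, feature) at or before `as_of`, pivoted to vectors."""
    latest = {}
    # Process facts in date order so later values overwrite earlier ones.
    for entity, feature, value, date in sorted(facts, key=lambda f: f[3]):
        if feature in dictionary and date <= as_of:
            latest[(entity, feature)] = value
    entities = sorted({entity for entity, _feature in latest})
    # Missing values become None, like the "-" cells in a feature table.
    return {e: [latest.get((e, f)) for f in dictionary] for e in entities}


# Hypothetical facts: (entity, feature, value, ISO date).
facts = [
    ("c1", "balance", 10.0, "2014-01-01"),
    ("c1", "balance", 20.0, "2014-02-01"),
    ("c2", "balance", 5.0, "2014-01-15"),
]
january = snapshot(facts, ["balance"], "2014-01-31")

# Extending the model: one new feature name plus new facts, old facts untouched.
extended = facts + [("c1", "num_accs", 2, "2014-02-10")]
march = snapshot(extended, ["balance", "num_accs"], "2014-03-01")
```

Taking the snapshot as of a given date gives each model build a reproducible set of feature vectors from the same shared history.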
Apache V2 Licence
github.com/ambiata/ivory