Ivory - Concepts
Ambiata
October 20, 2014
Transcript
IVORY CONCEPTS http://github.com/ambiata/ivory © Ambiata 2014
IVORY
A scalable and extensible data store for storing facts and extracting features
[Diagram: Ivory Repository. Ingest facts, extract features.]
REPOSITORY
• Storing and extracting data for a single class of entity, e.g.:
  • customer
  • account
  • asset
DATA MODEL
customer-1 balance 634 @ 2014-02-01 (a single “fact”)
Fact: Entity - Attribute - Value - Time
The value of a feature (attribute) for a given entity, known to be valid from a certain point in time.
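The entity, attribute, value, time shape of a fact can be written down as a small record. This is only an illustrative sketch: the `Fact` class and its field names are assumptions made for this example, not Ivory's actual types.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union

# Hypothetical value type covering the encodings that appear in the slides.
Value = Union[str, int, float]


@dataclass(frozen=True)
class Fact:
    entity: str     # e.g. "customer-1"
    attribute: str  # e.g. "balance"
    value: Value    # e.g. 634
    time: date      # the point in time from which the value is valid


# The single fact shown on the slide.
fact = Fact("customer-1", "balance", 634, date(2014, 2, 1))
```

Making the record immutable (`frozen=True`) reflects that a fact is a statement about a point in time; a changed value is a new fact with a later time.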
[Figure: balance facts for customer-1 to customer-4, starting with customer-1 balance 634 @ 2014-02-01. Other values shown: 469 @ 2014-02-01, 276 @ 2014-04-01, 1966 @ 2014-03-01. Label: scalable]
[Figure: grid of customer-1 to customer-4 against the attributes gender, balance, purchases and zipcode. Values shown: 634 @ 2014-02-01, 469 @ 2014-02-01, 276 @ 2014-04-01, 1966 @ 2014-03-01, ‘M’ @ 2012-01-01, 3 @ 2014-03-27, ‘4670’ @ 2009-05-13. Label: extensible]
[Figure: grid of customer-1 to customer-4 against the attributes gender, balance, purchases and zipcode, with some cells empty. Values shown: 736 @ 2014-01-01, 3 @ 2014-02-19, 184 @ 2014-02-01, 312 @ 2014-03-01, ‘M’ @ 2012-01-01, 276 @ 2014-04-01, 4 @ 2014-04-04, 2 @ 2014-03-12, 3 @ 2014-03-27, ‘2381’ @ 2004-08-19, ‘4670’ @ 2009-05-13, ‘F’ @ 2007-04-01, ‘3001’ @ 2011-09-14, 1876 @ 2014-02-01, 1966 @ 2014-03-01, 634 @ 2014-02-01, 469 @ 2014-02-01. Label: Sparse]
INGESTING FACTS
• Facts are ingested in atomic units called factsets
• Facts in a factset can span any set of:
  • entities
  • attributes
  • dates/times
customer-1  balance    634   2014-02-01
customer-3  balance    184   2014-02-01
customer-4  purchases  4     2014-02-04
customer-2  balance    312   2014-03-01
customer-3  gender     F     2007-04-01
customer-2  zipcode    3001  2011-03-14
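One way to picture a factset as an atomic unit is a parser that either returns every fact in the batch or fails as a whole. The sketch below assumes whitespace-separated rows like those on the slide; that layout, the `Fact` tuple and `parse_factset` are assumptions for illustration and do not describe Ivory's real ingest format or API.

```python
from datetime import date
from typing import List, NamedTuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: str  # kept as raw text here; decoding is a separate concern
    time: date


def parse_factset(text: str) -> List[Fact]:
    """Parse all rows, or raise ValueError without returning a partial factset."""
    facts = []
    for line in text.strip().splitlines():
        # Unpacking raises ValueError if a row does not have exactly 4 fields.
        entity, attribute, value, day = line.split()
        facts.append(Fact(entity, attribute, value, date.fromisoformat(day)))
    return facts


# The rows from the slide: one factset spanning several entities,
# attributes and dates.
FACTSET = """
customer-1 balance 634 2014-02-01
customer-3 balance 184 2014-02-01
customer-4 purchases 4 2014-02-04
customer-2 balance 312 2014-03-01
customer-3 gender F 2007-04-01
customer-2 zipcode 3001 2011-03-14
"""

factset = parse_factset(FACTSET)
```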
ATTRIBUTE DICTIONARY
• Any attribute that is ingested must be declared in the repository’s dictionary
• Dictionary stores metadata for each attribute
• Updated dictionaries can be imported into a repository at any time
namespace     name       encoding  type         description
demographics  gender     string    categorical  Gender
demographics  zipcode    string    categorical  Post-code, zip-code
accounts      balance    double    numerical    Balance of savings account
accounts      purchases  int       numerical    Number of credit-card purchases
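The rule that every ingested attribute must be declared can be sketched as a lookup that rejects unknown attributes and uses the declared encoding to decode raw values. The `Definition` tuple, the `PARSERS` mapping and the `decode` function are hypothetical names for this example, and mapping the `double` encoding to a Python `float` is an assumption.

```python
from typing import Dict, NamedTuple, Union


class Definition(NamedTuple):
    namespace: str
    encoding: str     # "string", "int" or "double"
    kind: str         # the "type" column: "categorical" or "numerical"
    description: str


# The dictionary rows from the slide, keyed by attribute name.
DICTIONARY: Dict[str, Definition] = {
    "gender": Definition("demographics", "string", "categorical", "Gender"),
    "zipcode": Definition("demographics", "string", "categorical", "Post-code, zip-code"),
    "balance": Definition("accounts", "double", "numerical", "Balance of savings account"),
    "purchases": Definition("accounts", "int", "numerical", "Number of credit-card purchases"),
}

# Assumed mapping from declared encodings to Python parsers.
PARSERS = {"string": str, "int": int, "double": float}


def decode(attribute: str, raw: str) -> Union[str, int, float]:
    """Decode a raw value using the attribute's declared encoding."""
    if attribute not in DICTIONARY:
        raise KeyError(f"attribute {attribute!r} is not declared in the dictionary")
    return PARSERS[DICTIONARY[attribute].encoding](raw)
```

For example, `decode("balance", "634")` yields a float, while `decode("zipcode", "3001")` keeps the value as a string because zipcode is declared with a string encoding.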
EXTRACTING FEATURES
Each row is a feature instance:
id        gender  balance  purchases  zipcode
89340218  M       0.00     3          3001
48149407  -       634.83   16         4670
18452274  F       15.12    2          -
07499337  M       33.56    2          -
62948721  F       98.34    12         3303
93754723  -       523.81   23         2046
00272446  F       1086.05  17         -
13374497  F       224.81   9          -
31989993  M       78.21    2          2134
46474236  -       126.48   4          -
SNAPSHOTS
• Attribute values for entities at a point in time
• Same time for all entities
• Select latest attribute values with respect to that time
• Typically used in preparing instances for scoring
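The selection rule for a snapshot can be sketched as "for each entity and attribute, keep the most recent fact that is not later than the snapshot time". This is a minimal in-memory illustration, not how Ivory is implemented; the `Fact` tuple and `snapshot` function are hypothetical, and treating a fact dated exactly at the snapshot time as included is an assumption.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple, Tuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: object
    time: date


def snapshot(facts: Iterable[Fact], at: date) -> Dict[Tuple[str, str], object]:
    """Latest value per (entity, attribute) among facts with time <= at."""
    latest: Dict[Tuple[str, str], Fact] = {}
    for f in facts:
        if f.time > at:
            continue  # not yet known at the snapshot time
        key = (f.entity, f.attribute)
        if key not in latest or f.time > latest[key].time:
            latest[key] = f
    return {key: f.value for key, f in latest.items()}


# Balance facts for customer-2, taken from the slides.
facts = [
    Fact("customer-2", "balance", 184, date(2014, 2, 1)),
    Fact("customer-2", "balance", 312, date(2014, 3, 1)),
    Fact("customer-2", "balance", 276, date(2014, 4, 1)),
]

result = snapshot(facts, date(2014, 3, 1))
```

With these three facts, a snapshot at 2014-03-01 selects 312: the 276 fact is dated later than the snapshot time, and 184 is superseded.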
[Figure: grid of customer-1 to customer-4 against the attributes gender, balance, purchases and zipcode. Values shown: 736 @ 2014-01-01, 6 @ 2014-02-19, 184 @ 2014-02-01, 312 @ 2014-03-01, ‘M’ @ 2012-01-01, 276 @ 2014-04-01, 4 @ 2014-02-04, 2 @ 2014-03-12, 3 @ 2014-03-27, ‘2381’ @ 2004-08-19, ‘4670’ @ 2009-05-13, ‘F’ @ 2007-04-01, ‘3001’ @ 2011-09-14, 1876 @ 2014-02-01, 1966 @ 2014-03-01, 634 @ 2014-02-01, 469 @ 2014-02-01. Label: snapshot @ 2014-03-01]
[Figure: the resulting snapshot, with rows customer-1 to customer-4 and columns gender, balance, purchases and zipcode. Values shown: ‘M’, 312, 4, ‘4670’, ‘F’, ‘3001’, 1966, 634, 469.]
• It is assumed that snapshots run periodically, e.g. daily or weekly
• Ivory exploits this assumption to improve the runtime of successive snapshots
CHORDS
• Attribute values for entities at a point in time
• Different times for different entities
• Select latest attribute values with respect to the times
• Typically used in preparing instances for training
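A chord applies the same "latest value not later than a time" rule, but with a separate time per entity. The sketch below is illustrative only: `Fact` and `chord` are hypothetical names, a single time per entity is a simplification, and including facts dated exactly at the chord time is an assumption.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple, Tuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: object
    time: date


def chord(facts: Iterable[Fact], times: Dict[str, date]) -> Dict[Tuple[str, str], object]:
    """Latest value per (entity, attribute), using each entity's own time."""
    latest: Dict[Tuple[str, str], Fact] = {}
    for f in facts:
        at = times.get(f.entity)
        if at is None or f.time > at:
            continue  # entity not requested, or fact later than its chord time
        key = (f.entity, f.attribute)
        if key not in latest or f.time > latest[key].time:
            latest[key] = f
    return {key: f.value for key, f in latest.items()}


facts = [
    Fact("customer-2", "balance", 184, date(2014, 2, 1)),
    Fact("customer-2", "balance", 312, date(2014, 3, 1)),
    Fact("customer-2", "balance", 276, date(2014, 4, 1)),
    Fact("customer-4", "purchases", 4, date(2014, 2, 4)),
]

# The two chord times used in the slides.
times = {"customer-2": date(2014, 3, 1), "customer-4": date(2014, 1, 1)}
result = chord(facts, times)
```

Here customer-4 contributes nothing, because its only fact in this small sample is dated after its chord time of 2014-01-01.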
[Figure: grid of customer-1 to customer-4 against the attributes gender, balance, purchases and zipcode. Values shown: 736 @ 2014-01-01, 6 @ 2014-02-19, 184 @ 2014-02-01, 312 @ 2014-03-01, ‘M’ @ 2012-01-01, 276 @ 2014-04-01, 4 @ 2014-04-04, 2 @ 2014-03-12, 3 @ 2014-03-27, ‘2381’ @ 2004-08-19, ‘4670’ @ 2009-05-13, ‘F’ @ 2007-04-01, ‘3001’ @ 2011-09-14, 1876 @ 2014-02-01, 1966 @ 2014-03-01, 634 @ 2014-02-01, 469 @ 2014-02-01. Labels: customer2 @ 2014-03-01, customer4 @ 2014-01-01]
[Figure: the resulting chord, with rows customer-2 @ 2014-03-01 and customer-4 @ 2014-01-01 and columns gender, balance, purchases and postcode. Values shown: ‘M’, 312, 6, ‘4670’, ‘3001’, 1876.]
DERIVED FACTS
customer-2 balance: 184 @ 2014-02-01, 312 @ 2014-03-01, 276 @ 2014-04-01
customer-2 max.balance.4M: ?
The maximum balance over the last 4 months can be derived from the set of balance facts.
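As a worked example of deriving a fact, the sketch below computes the maximum balance over a 4-month window from the three balance facts on the slide. The helper names are hypothetical, and the window boundaries (facts after the window start, up to and including the evaluation date) are an assumption; the slides do not define them.

```python
import calendar
from datetime import date
from typing import List, Optional, Tuple


def months_before(d: date, months: int) -> date:
    """The date `months` calendar months before d, clamping the day if needed."""
    year, month0 = divmod(d.year * 12 + d.month - 1 - months, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def max_over_window(series: List[Tuple[date, float]], as_of: date, months: int) -> Optional[float]:
    """Maximum value among facts with window start < time <= as_of, or None."""
    start = months_before(as_of, months)
    values = [value for time, value in series if start < time <= as_of]
    return max(values) if values else None


# The balance facts for customer-2 from the slide.
balance = [
    (date(2014, 2, 1), 184),
    (date(2014, 3, 1), 312),
    (date(2014, 4, 1), 276),
]

max_balance_4m = max_over_window(balance, date(2014, 4, 1), 4)
```

Under these assumptions, evaluating at 2014-04-01 gives 312, since all three facts fall inside the window.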
Many facts can be derived from a time series of base facts
base fact  derived facts
balance    Maximum balance over the last month
           Mean balance over the last 2 months
           Balance gradient over the last 3 months
purchase   Number of purchases in the last 3 weeks
           Proportion of supermarket purchases in the last 2 weeks
zipcode    Number of times the zipcode has changed in the last 5 years
           Longest period where the zipcode has not changed in the last 5 years
VIRTUAL FEATURES
• Ivory represents derived facts as virtual features
• Virtual features are declared in the dictionary
• Specify expressions against base facts
• Are computed lazily when features are extracted
name                source    expression  window
max.balance.4M      balance   max         4 months
mean.balance.6M     balance   mean        6 months
num.purchases.3W    purchase  count       3 weeks
changes.zipcode.5Y  zipcode   num_flips   5 years
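The declarations above can be read as data: each virtual feature names a source attribute, an expression and a window, and nothing is computed until a value is requested. The following sketch evaluates such declarations on demand. It is not Ivory's implementation: all names are hypothetical, the window boundaries are assumed, and `num_flips` is assumed to mean the number of changes between consecutive values.

```python
import calendar
from datetime import date, timedelta
from statistics import mean
from typing import List, NamedTuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: object
    time: date


class Virtual(NamedTuple):
    source: str      # base attribute the expression reads
    expression: str  # key into EXPRESSIONS
    amount: int      # window length
    unit: str        # "weeks", "months" or "years"


def num_flips(values: list) -> int:
    # Assumed meaning: how often consecutive values differ.
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


EXPRESSIONS = {"max": max, "mean": mean, "count": len, "num_flips": num_flips}

# The declarations from the slide.
VIRTUAL_FEATURES = {
    "max.balance.4M": Virtual("balance", "max", 4, "months"),
    "mean.balance.6M": Virtual("balance", "mean", 6, "months"),
    "num.purchases.3W": Virtual("purchase", "count", 3, "weeks"),
    "changes.zipcode.5Y": Virtual("zipcode", "num_flips", 5, "years"),
}


def window_start(as_of: date, amount: int, unit: str) -> date:
    if unit == "weeks":
        return as_of - timedelta(weeks=amount)
    months = amount * 12 if unit == "years" else amount
    year, month0 = divmod(as_of.year * 12 + as_of.month - 1 - months, 12)
    day = min(as_of.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def evaluate(name: str, entity: str, facts: List[Fact], as_of: date):
    """Compute one virtual feature on demand from the base facts."""
    v = VIRTUAL_FEATURES[name]
    start = window_start(as_of, v.amount, v.unit)
    window = sorted(
        (f for f in facts
         if f.entity == entity and f.attribute == v.source and start < f.time <= as_of),
        key=lambda f: f.time,
    )
    values = [f.value for f in window]
    if not values and v.expression in ("max", "mean"):
        return None  # no base facts in the window
    return EXPRESSIONS[v.expression](values)


facts = [
    Fact("customer-2", "balance", 184, date(2014, 2, 1)),
    Fact("customer-2", "balance", 312, date(2014, 3, 1)),
    Fact("customer-2", "balance", 276, date(2014, 4, 1)),
]
max_4m = evaluate("max.balance.4M", "customer-2", facts, date(2014, 4, 1))
```

Keeping the declarations separate from the evaluation function means new virtual features can be added as data, without storing any derived values alongside the base facts.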
COMMITS
[Diagram: Ivory Repository. Ingest facts, import dictionary, extract features.]
• A commit is recorded for any repository change:
  • factset ingestions
  • dictionary imports
• The repository at a given commit is an immutable data store
• Snapshots and chords can be run against a specific commit
[Figure: commit timeline. Commit 0: create repository; 1: import dictionary; 2: ingest factset; 3: ingest factset; 4: import dictionary; 5: ingest factset. Two snapshots and a chord are shown running against commits on the timeline.]
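One way to think about commits is as an append-only log in which the repository "at commit n" is simply the prefix of the log up to n. The sketch below replays the sequence of changes from the timeline; the `Repository` class, its methods and the payload names such as "factset-1" are all hypothetical and do not reflect Ivory's storage layout.

```python
from typing import List, NamedTuple, Tuple


class Commit(NamedTuple):
    id: int
    change: str   # "create repository", "import dictionary" or "ingest factset"
    payload: str  # hypothetical identifier of the dictionary or factset


class Repository:
    def __init__(self) -> None:
        # Commit 0 records the creation of the repository.
        self._log: Tuple[Commit, ...] = (Commit(0, "create repository", ""),)

    def commit(self, change: str, payload: str) -> int:
        """Record a change; earlier commits are never modified."""
        entry = Commit(len(self._log), change, payload)
        self._log = self._log + (entry,)
        return entry.id

    def at(self, commit_id: int) -> Tuple[Commit, ...]:
        """The immutable view of the repository as of a given commit."""
        return self._log[: commit_id + 1]

    def factsets_at(self, commit_id: int) -> List[str]:
        """Factsets visible to a snapshot or chord run at that commit."""
        return [c.payload for c in self.at(commit_id) if c.change == "ingest factset"]


repo = Repository()
repo.commit("import dictionary", "dictionary-1")  # commit 1
repo.commit("ingest factset", "factset-1")        # commit 2
repo.commit("ingest factset", "factset-2")        # commit 3
repo.commit("import dictionary", "dictionary-2")  # commit 4
repo.commit("ingest factset", "factset-3")        # commit 5
```

A snapshot run at commit 3 would then see only the first two factsets, even after later factsets have been ingested.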
KEY CONCEPTS
• Repository
• Commit
• Dictionary
• Factset
• Base fact
• Virtual feature
• Snapshot
• Chord