Ivory - Concepts
Ambiata
October 20, 2014
Technology
Transcript
IVORY CONCEPTS http://github.com/ambiata/ivory © Ambiata 2014
IVORY
A scalable and extensible data store for storing facts and extracting features
[Diagram: Ingest facts → Ivory Repository → Extract features]
REPOSITORY
• Storing and extracting data for a single class of entity, e.g.:
  • customer
  • account
  • asset
DATA MODEL
customer-1 | balance | 634 @ 2014-02-01    (a single “fact”)
Fact: Entity - Attribute - Value - Time
The value of a feature (attribute) for a given entity, known to be valid from a certain point in time.
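As a minimal sketch, the four parts of a fact can be modelled as an immutable record. The `Fact` class and its field names below are hypothetical and chosen for illustration; the slides do not show Ivory's own types.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union

# Values in the slides are integers, decimals or strings.
Value = Union[int, float, str]


@dataclass(frozen=True)
class Fact:
    """Hypothetical record for one Entity - Attribute - Value - Time fact."""
    entity: str     # e.g. "customer-1"
    attribute: str  # e.g. "balance"
    value: Value    # e.g. 634
    time: date      # the point in time from which the value is known to be valid


# The single fact shown on the slide.
fact = Fact("customer-1", "balance", 634, date(2014, 2, 1))
```

Freezing the record reflects that a fact is a statement about the past: a later balance is a new fact, not an update to this one.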
[Figure, labelled “scalable”: the balance attribute for customer-1 through customer-4. Values shown: customer-1 634 @ 2014-02-01; others 469 @ 2014-02-01, 276 @ 2014-04-01, 1966 @ 2014-03-01]
[Figure, labelled “extensible”: customer-1 through customer-4 against the attributes gender, balance, purchases and zipcode. Values shown: balance 634 @ 2014-02-01, 469 @ 2014-02-01, 276 @ 2014-04-01, 1966 @ 2014-03-01; gender ‘M’ @ 2012-01-01; purchases 3 @ 2014-03-27; zipcode ‘4670’ @ 2009-05-13]
[Figure, labelled “Sparse”: a grid of entities (customer-1 to customer-4) against attributes (gender, balance, purchases, zipcode), in which a cell may hold several timestamped values or none. Values shown: gender ‘M’ @ 2012-01-01, ‘F’ @ 2007-04-01; balance 736 @ 2014-01-01, 184 @ 2014-02-01, 312 @ 2014-03-01, 276 @ 2014-04-01, 1876 @ 2014-02-01, 1966 @ 2014-03-01, 634 @ 2014-02-01, 469 @ 2014-02-01; purchases 3 @ 2014-02-19, 4 @ 2014-04-04, 2 @ 2014-03-12, 3 @ 2014-03-27; zipcode ‘2381’ @ 2004-08-19, ‘4670’ @ 2009-05-13, ‘3001’ @ 2011-09-14]
INGESTING FACTS
• Facts are ingested in atomic units called factsets
• Facts in a factset can span any set of:
  • entities
  • attributes
  • dates/times
entity      attribute  value  time
customer-1  balance    634    2014-02-01
customer-3  balance    184    2014-02-01
customer-4  purchases  4      2014-02-04
customer-2  balance    312    2014-03-01
customer-3  gender     F      2007-04-01
customer-2  zipcode    3001   2011-03-14
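One way to picture atomic ingestion is a parser that either accepts every row of a factset or rejects the whole factset. The sketch below assumes whitespace-separated rows like the example factset above; the file format and the names `parse_factset` and `ingest` are illustrative assumptions, not Ivory's actual ingestion interface.

```python
from datetime import date
from typing import List, NamedTuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: str   # kept as raw text here; decoding is a separate concern
    time: date


def parse_factset(text: str) -> List[Fact]:
    """Parse every row or raise ValueError; no partial result is returned."""
    facts = []
    for number, line in enumerate(text.strip().splitlines(), start=1):
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"row {number}: expected 4 fields, got {len(fields)}")
        entity, attribute, value, day = fields
        facts.append(Fact(entity, attribute, value, date.fromisoformat(day)))
    return facts


def ingest(repository: List[List[Fact]], text: str) -> None:
    """Append the factset only if the whole factset parses (hypothetical store)."""
    repository.append(parse_factset(text))


ROWS = """\
customer-1 balance 634 2014-02-01
customer-3 balance 184 2014-02-01
customer-4 purchases 4 2014-02-04
customer-2 balance 312 2014-03-01
customer-3 gender F 2007-04-01
customer-2 zipcode 3001 2011-03-14
"""

repository: List[List[Fact]] = []
ingest(repository, ROWS)
```

Because parsing completes before anything is appended, a malformed row leaves the repository exactly as it was.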
ATTRIBUTE DICTIONARY
• Any attribute that is ingested must be declared in the repository’s dictionary
• Dictionary stores metadata for each attribute
• Updated dictionaries can be imported into a repository at any time
namespace     name       encoding  type         description
demographics  gender     string    categorical  Gender
demographics  zipcode    string    categorical  Post-code, zip-code
accounts      balance    double    numerical    Balance of savings account
accounts      purchases  int       numerical    Number of credit-card purchases
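The dictionary's role at ingestion time can be sketched as a lookup that rejects undeclared attributes and decodes raw values by their declared encoding. The entries mirror the table above; the `Definition` record, the `PARSERS` mapping and the `decode` function are hypothetical names for illustration only.

```python
from typing import Dict, NamedTuple, Tuple, Union


class Definition(NamedTuple):
    encoding: str     # "string", "int" or "double"
    kind: str         # "categorical" or "numerical"
    description: str


# Keyed by (namespace, name).
DICTIONARY: Dict[Tuple[str, str], Definition] = {
    ("demographics", "gender"): Definition("string", "categorical", "Gender"),
    ("demographics", "zipcode"): Definition("string", "categorical", "Post-code, zip-code"),
    ("accounts", "balance"): Definition("double", "numerical", "Balance of savings account"),
    ("accounts", "purchases"): Definition("int", "numerical", "Number of credit-card purchases"),
}

# Assumed mapping from encoding names to Python types.
PARSERS = {"string": str, "int": int, "double": float}


def decode(namespace: str, name: str, raw: str) -> Union[str, int, float]:
    """Decode a raw value by its declared encoding; reject undeclared attributes."""
    key = (namespace, name)
    if key not in DICTIONARY:
        raise KeyError(f"attribute {namespace}:{name} is not declared in the dictionary")
    return PARSERS[DICTIONARY[key].encoding](raw)
```

For example, `decode("accounts", "purchases", "4")` yields the integer 4, while an attribute missing from the dictionary raises `KeyError`.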
EXTRACTING FEATURES
feature instance  gender  balance  purchases  zipcode
89340218          M       0.00     3          3001
48149407          -       634.83   16         4670
18452274          F       15.12    2          -
07499337          M       33.56    2          -
62948721          F       98.34    12         3303
93754723          -       523.81   23         2046
00272446          F       1086.05  17         -
13374497          F       224.81   9          -
31989993          M       78.21    2          2134
46474236          -       126.48   4          -
SNAPSHOTS
• Attribute values for entities at a point in time
• Same time for all entities
• Select latest attribute values with respect to that time
• Typically used in preparing instances for scoring
[Figure: the entity-by-attribute grid (customer-1 to customer-4 against gender, balance, purchases, zipcode) with a snapshot taken @ 2014-03-01. Values shown: gender ‘M’ @ 2012-01-01, ‘F’ @ 2007-04-01; balance 736 @ 2014-01-01, 184 @ 2014-02-01, 312 @ 2014-03-01, 276 @ 2014-04-01, 1876 @ 2014-02-01, 1966 @ 2014-03-01, 634 @ 2014-02-01, 469 @ 2014-02-01; purchases 6 @ 2014-02-19, 4 @ 2014-02-04, 2 @ 2014-03-12, 3 @ 2014-03-27; zipcode ‘2381’ @ 2004-08-19, ‘4670’ @ 2009-05-13, ‘3001’ @ 2011-09-14]
[Figure: the resulting snapshot for customer-1 to customer-4, with at most one value per cell. Values shown: gender ‘M’, ‘F’; balance 312, 1966, 634, 469; purchases 4; zipcode ‘4670’, ‘3001’]
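The snapshot rule (one time for all entities, latest value with respect to that time) can be sketched as a single pass over the facts. This is an illustration only: the `snapshot` function is a hypothetical name, and treating a fact dated exactly on the snapshot date as valid is an assumption. The sample facts are the customer-1 and customer-2 balances that appear elsewhere in the deck.

```python
from datetime import date
from typing import Dict, List, Tuple

Fact = Tuple[str, str, object, date]  # (entity, attribute, value, time)


def snapshot(facts: List[Fact], at: date) -> Dict[Tuple[str, str], object]:
    """Latest value per (entity, attribute) among facts valid on or before `at`."""
    latest: Dict[Tuple[str, str], Tuple[date, object]] = {}
    for entity, attribute, value, time in facts:
        if time > at:
            continue  # not yet valid at the snapshot date
        key = (entity, attribute)
        if key not in latest or time > latest[key][0]:
            latest[key] = (time, value)
    return {key: value for key, (_, value) in latest.items()}


FACTS: List[Fact] = [
    ("customer-1", "balance", 634, date(2014, 2, 1)),
    ("customer-2", "balance", 184, date(2014, 2, 1)),
    ("customer-2", "balance", 312, date(2014, 3, 1)),
    ("customer-2", "balance", 276, date(2014, 4, 1)),
]

result = snapshot(FACTS, date(2014, 3, 1))
```

Here customer-2 resolves to 312: the 276 fact is dated after the snapshot and the 184 fact has been superseded.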
• It is assumed snapshots run periodically, e.g. daily or weekly
• Ivory exploits this assumption to improve the runtime of successive snapshots
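The slides do not say how Ivory exploits periodic snapshots. One plausible approach, sketched here purely as an assumption, is to start from the previous snapshot and scan only the facts it did not consider, rather than rescanning the whole repository. The `advance` function and the `Snapshot` type are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

Fact = Tuple[str, str, object, date]                   # (entity, attribute, value, time)
Snapshot = Dict[Tuple[str, str], Tuple[date, object]]  # key -> (time, value)


def advance(previous: Snapshot, new_facts: List[Fact], at: date) -> Snapshot:
    """Build the snapshot at `at` from an earlier snapshot.

    `new_facts` must contain every fact the previous snapshot did not consider,
    and `at` must not be earlier than the previous snapshot's date.
    """
    current = dict(previous)  # leave the earlier snapshot untouched
    for entity, attribute, value, time in new_facts:
        key = (entity, attribute)
        if time <= at and (key not in current or time >= current[key][0]):
            current[key] = (time, value)
    return current


# Snapshot at 2014-02-01, then advanced to 2014-03-01 using only the newer facts.
previous: Snapshot = {("customer-2", "balance"): (date(2014, 2, 1), 184)}
new_facts: List[Fact] = [
    ("customer-2", "balance", 312, date(2014, 3, 1)),
    ("customer-2", "balance", 276, date(2014, 4, 1)),
]
current = advance(previous, new_facts, date(2014, 3, 1))
```

Keeping the fact time next to each value is what lets a later run decide whether a newly seen fact supersedes the carried-over one.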
CHORDS
• Attribute values for entities at a point in time
• Different times for different entities
• Select latest attribute values with respect to the times
• Typically used in preparing instances for training
[Figure: the entity-by-attribute grid (customer-1 to customer-4 against gender, balance, purchases, zipcode) with chord times customer-2 @ 2014-03-01 and customer-4 @ 2014-01-01. Values shown: gender ‘M’ @ 2012-01-01, ‘F’ @ 2007-04-01; balance 736 @ 2014-01-01, 184 @ 2014-02-01, 312 @ 2014-03-01, 276 @ 2014-04-01, 1876 @ 2014-02-01, 1966 @ 2014-03-01, 634 @ 2014-02-01, 469 @ 2014-02-01; purchases 6 @ 2014-02-19, 4 @ 2014-04-04, 2 @ 2014-03-12, 3 @ 2014-03-27; zipcode ‘2381’ @ 2004-08-19, ‘4670’ @ 2009-05-13, ‘3001’ @ 2011-09-14]
[Figure: the resulting chord for customer-2 @ 2014-03-01 and customer-4 @ 2014-01-01, with columns gender, balance, purchases and zipcode. Values shown: gender ‘M’; balance 312, 1876; purchases 6; zipcode ‘4670’, ‘3001’]
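A chord differs from a snapshot only in that each entity brings its own time. The sketch below is a hypothetical `chord` function, not Ivory's implementation; it treats a fact dated exactly on the chord time as valid, which is an assumption. The customer-2 time comes from the slide, while the customer-1 time is invented to show an entity with no valid fact yet.

```python
from datetime import date
from typing import Dict, List, Tuple

Fact = Tuple[str, str, object, date]  # (entity, attribute, value, time)


def chord(facts: List[Fact], times: Dict[str, date]) -> Dict[Tuple[str, str], object]:
    """Latest value per (entity, attribute) as of each entity's own time."""
    latest: Dict[Tuple[str, str], Tuple[date, object]] = {}
    for entity, attribute, value, time in facts:
        at = times.get(entity)
        if at is None or time > at:
            continue  # entity not requested, or fact not yet valid for it
        key = (entity, attribute)
        if key not in latest or time > latest[key][0]:
            latest[key] = (time, value)
    return {key: value for key, (_, value) in latest.items()}


FACTS: List[Fact] = [
    ("customer-1", "balance", 634, date(2014, 2, 1)),
    ("customer-2", "balance", 184, date(2014, 2, 1)),
    ("customer-2", "balance", 312, date(2014, 3, 1)),
    ("customer-2", "balance", 276, date(2014, 4, 1)),
]

TIMES = {
    "customer-2": date(2014, 3, 1),   # from the slide
    "customer-1": date(2014, 1, 15),  # made-up time, before any customer-1 fact
}

result = chord(FACTS, TIMES)
```

Selecting values as of a per-entity time is what makes chords suited to training data: each instance sees only what was known at its own reference date.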
DERIVED FACTS
customer-2
balance: 184 @ 2014-02-01, 312 @ 2014-03-01, 276 @ 2014-04-01
max.balance.4M: ?
The maximum balance over the last 4 months can be derived from the set of balance facts.
Many facts can be derived from a time series of base facts
base fact   derived facts
balance     • Maximum balance over the last month
            • Mean balance over the last 2 months
            • Balance gradient over the last 3 months
purchase    • Number of purchases in the last 3 weeks
            • Proportion of supermarket purchases in the last 2 weeks
zipcode     • Number of times the zipcode has changed in the last 5 years
            • Longest period where the zipcode has not changed in the last 5 years
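The two zipcode derivations can be sketched as simple passes over one entity's time series. The functions below are hypothetical, the five-year window is left out for brevity, and the series is an invented example that reuses zipcode values seen elsewhere in the deck.

```python
from datetime import date, timedelta
from typing import List, Tuple

Series = List[Tuple[date, str]]  # (time, value) pairs, sorted by time


def count_changes(series: Series) -> int:
    """Number of times consecutive values in the series differ."""
    values = [value for _, value in series]
    return sum(1 for before, after in zip(values, values[1:]) if before != after)


def longest_unchanged(series: Series, at: date) -> timedelta:
    """Longest span with an unchanged value, measured up to `at`.

    Assumes `at` is not earlier than the last fact in the series.
    """
    longest = timedelta(0)
    if not series:
        return longest
    start, current = series[0]
    for time, value in series[1:]:
        if value != current:
            longest = max(longest, time - start)
            start, current = time, value
    return max(longest, at - start)


# Invented zipcode history for a single entity.
ZIPCODES: Series = [
    (date(2004, 8, 19), "2381"),
    (date(2009, 5, 13), "4670"),
    (date(2011, 9, 14), "3001"),
]

changes = count_changes(ZIPCODES)
stable = longest_unchanged(ZIPCODES, date(2014, 3, 1))
```

Both results depend only on the base facts, which is why they need not be ingested as facts in their own right.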
VIRTUAL FEATURES
• Ivory represents derived facts as virtual features
• Virtual features are declared in the dictionary
• Specify expressions against base facts
• Are computed lazily when features are extracted
name               source    expression  window
max.balance.4M     balance   max         4 months
mean.balance.6M    balance   mean        6 months
num.purchases.3W   purchase  count       3 weeks
changes.zipcode.5Y zipcode   num_flips   5 years
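Lazy evaluation of virtual features can be sketched as declarations that are only interpreted when a value is requested. The declarations below follow the table above (without `num_flips`); everything else, including `window_start`, `EXPRESSIONS`, `extract` and the choice of a window that excludes its start date, is an assumption made for illustration.

```python
import calendar
from datetime import date, timedelta
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Fact = Tuple[str, str, float, date]  # (entity, attribute, value, time)


def window_start(at: date, window: str) -> date:
    """Start of a window such as "4 months" or "3 weeks" that ends at `at`."""
    count_text, unit = window.split()
    count = int(count_text)
    if unit.startswith("week"):
        return at - timedelta(weeks=count)
    if unit.startswith("year"):
        months = count * 12
    elif unit.startswith("month"):
        months = count
    else:
        raise ValueError(f"unsupported window unit: {unit}")
    index = at.year * 12 + (at.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    # Clamp the day so that e.g. 31 May minus 1 month gives 30 April.
    return date(year, month, min(at.day, calendar.monthrange(year, month)[1]))


EXPRESSIONS: Dict[str, Callable[[List[float]], Optional[float]]] = {
    "max": lambda values: max(values, default=None),
    "mean": lambda values: mean(values) if values else None,
    "count": len,
}

# name -> (source attribute, expression, window)
VIRTUAL_FEATURES: Dict[str, Tuple[str, str, str]] = {
    "max.balance.4M": ("balance", "max", "4 months"),
    "mean.balance.6M": ("balance", "mean", "6 months"),
    "num.purchases.3W": ("purchase", "count", "3 weeks"),
}


def extract(facts: List[Fact], entity: str, name: str, at: date) -> Optional[float]:
    """Evaluate one virtual feature on demand; nothing is precomputed or stored."""
    source, expression, window = VIRTUAL_FEATURES[name]
    start = window_start(at, window)
    values = [value for e, a, value, time in facts
              if e == entity and a == source and start < time <= at]
    return EXPRESSIONS[expression](values)


FACTS: List[Fact] = [
    ("customer-2", "balance", 184, date(2014, 2, 1)),
    ("customer-2", "balance", 312, date(2014, 3, 1)),
    ("customer-2", "balance", 276, date(2014, 4, 1)),
]

max_balance = extract(FACTS, "customer-2", "max.balance.4M", date(2014, 4, 1))
```

With the customer-2 balances from the derived-facts slide and an extraction date of 2014-04-01, `max_balance` is 312. This sketch only considers facts dated inside the window; a fuller treatment would also carry in the value already in effect when the window opens.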
COMMITS
[Diagram: Ingest facts and Import dictionary → Ivory Repository → Extract features]
• A commit is recorded for any repository change:
  • factset ingestions
  • dictionary imports
• The repository at a given commit is an immutable data store
• Snapshot and chord can be done at a specific commit
commit 0: create repository
commit 1: import dictionary
commit 2: ingest factset
commit 3: ingest factset
commit 4: import dictionary
commit 5: ingest factset
[Figure: two snapshots and a chord are each taken at a specific commit on this timeline]
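The commit model can be pictured as an append-only log, where the repository at commit n is whatever the first n + 1 entries describe. The `Repository` class below is a hypothetical in-memory sketch of that idea, not Ivory's storage layout, and the dictionary and factset contents are placeholders.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Commit(NamedTuple):
    kind: str       # "create", "dictionary" or "factset"
    payload: tuple  # immutable contents recorded by this commit


class Repository:
    """Append-only commit log; earlier commits are never modified."""

    def __init__(self) -> None:
        self._log: List[Commit] = [Commit("create", ())]  # commit 0

    def _record(self, commit: Commit) -> int:
        self._log.append(commit)
        return len(self._log) - 1  # id of the new commit

    def import_dictionary(self, attributes: Iterable[str]) -> int:
        return self._record(Commit("dictionary", tuple(attributes)))

    def ingest_factset(self, facts: Iterable[tuple]) -> int:
        return self._record(Commit("factset", tuple(facts)))

    def facts_at(self, commit_id: int) -> Tuple[tuple, ...]:
        """All facts visible at `commit_id`; later commits never change the result."""
        if not 0 <= commit_id < len(self._log):
            raise IndexError(f"unknown commit {commit_id}")
        visible = self._log[: commit_id + 1]
        return tuple(fact for commit in visible
                     if commit.kind == "factset" for fact in commit.payload)


# Replay the timeline: commits 1 to 5 after the initial create (commit 0).
repo = Repository()
c1 = repo.import_dictionary(["balance"])
c2 = repo.ingest_factset([("customer-1", "balance", 634, "2014-02-01")])
before = repo.facts_at(c2)
c3 = repo.ingest_factset([("customer-2", "balance", 184, "2014-02-01")])
c4 = repo.import_dictionary(["balance", "purchases"])
c5 = repo.ingest_factset([("customer-2", "balance", 312, "2014-03-01")])
```

A snapshot or chord taken at commit 2 reads `repo.facts_at(2)` and therefore gives the same answer no matter how many factsets are ingested afterwards.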
KEY CONCEPTS
• Repository
• Commit
• Dictionary
• Factset
• Base fact
• Virtual feature
• Snapshot
• Chord