[2023] Complexities You Should Care about Doing Real-time ML

Invisible Interfaces
Considerations for Abstracting Complexities of a Real-time ML Platform
Zhenzhong Xu, Cofounder & CTO @ claypot.ai
July 2023

The discovery of something invisible

The Invisible Interface
• Ubiquitous
• Easy and responsive
• Just works!
The endeavor to make things useful

Real-time decisions that power your business
• Fraud prevention
• Personalization
• Customer support
• Dynamic pricing/discounting
• Trending products
• Risk assessment
• Account takeover
• Ads
• ETA
• Network analysis
• Sentiment analysis
• Object detection
• …

The world is moving towards real-time
• Instacart: The Journey to Real-Time Machine Learning (2022)
  ◦ Directly reduces fraud-related costs by millions annually.
• LinkedIn's Real-time Anti-abuse (2022)
  ◦ LinkedIn moved from an offline pipeline (hours) to a real-time pipeline (minutes), and saw a 30% increase in bad actors caught online and a 21% improvement in fake account detection.
• How WhatsApp catches and fights abuse (2022 | slides)
  ◦ A delay of a few hundred milliseconds can increase spam by 20-30%.
• How Pinterest Leverages Realtime User Actions in Recommendation to Boost Engagement (2022)
  ◦ According to Pinterest, this “has been one of our most impactful innovations recently, increasing Home feed engagement by 11% while reducing Pinner hide volume by 10%.”
• Airbnb: Real-time Personalization using Embeddings for Search Ranking (2018)
  ◦ Moving from offline scoring to online scoring grew bookings by 5.1%.

Data Fabric for Real-time AI
Diagram labels: Real-time Decisions, Data Infrastructure, Exploration & Research, Model Architecture & Tuning, Model Analysis & Selection, Ingestion & Transport, Security & Governance, Multi-tenancy Isolation, Data Sources, Storage, Query & Compute, LLM Prompt Engineering, Workflow Orchestration, Analytics / Visualization

Diagram labels: Model Serving, Model Training, Model Monitoring, Model Evaluation, Prediction Input, Training Input, Data Monitoring, Data, Model Flow, Data Flow, Product Ecosystem, Analytics Ecosystem

The hard things on the way to real-time decisions
• Data silos and staleness
• Collaboration overhead
• Tech complexity

Challenge 1: From Experimentation to Production
• Slow prototyping
• Local vs. remote execution
• Divergent language & runtime

Local Experimentation with Traditional Models

Local Experimentation with LLMs

Feature API: create, experiment, & deploy features
Diagram labels: Sources, Feature store (online + offline), Prediction service, Computation engines, Training service, Feature catalog, Data scientists, Central repo

Local/Single Machine | Remote/Distributed
Need an invisible interface to plug into compute ecosystems

Declare features with familiar APIs
    @transformation
    def average_transaction_amount_by_merchant(tx: Transactions, wspec: WindowSpec):
        return tx.groupby(["cc_num", "merchant"])["amt"].window(wspec).mean()
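For local prototyping, the same feature can be approximated in plain pandas. This is a minimal sketch, not the Claypot SDK: the deck does not show how `@transformation`, `Transactions`, or `WindowSpec` are implemented, so the `ts` timestamp column, the string window argument, and the sample rows are all assumptions made for illustration.

```python
import pandas as pd


def average_transaction_amount_by_merchant(tx: pd.DataFrame, window: str) -> pd.Series:
    """Trailing-window mean of `amt` per (cc_num, merchant) pair.

    `window` is a pandas offset string such as "1h"; it stands in for the
    deck's WindowSpec, whose real shape is not shown (assumption).
    """
    # Time-based rolling windows need a sorted DatetimeIndex.
    tx = tx.sort_values("ts").set_index("ts")
    return tx.groupby(["cc_num", "merchant"])["amt"].rolling(window).mean()


# Illustrative data only; the `ts` column name is hypothetical.
tx = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            [
                "2023-07-01 09:00",
                "2023-07-01 09:20",
                "2023-07-01 09:50",
                "2023-07-01 11:00",
            ]
        ),
        "cc_num": [2, 2, 2, 2],
        "merchant": ["a", "a", "a", "a"],
        "amt": [10.0, 20.0, 30.0, 40.0],
    }
)

result = average_transaction_amount_by_merchant(tx, "1h")
print(result)
```

Each output row averages the amounts seen in the preceding hour for that card and merchant, so the last row (11:00) only sees its own transaction.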

Data Science Friendly: Python <> SQL
Diagram labels: Workload, Compiler / Optimizer, Deployment, Relational Expression
    @transformation
    def transaction_count(tx: Transactions, wspec: WindowSpec):
        return tx[tx.status == "failed"].groupby("account_id").window(wspec).count()

Workload → Compiler/Optimizer → Deployment
    @transformation
    def transaction_count(tx: Transactions, wspec: WindowSpec):
        return tx[tx.status == "failed"].groupby("account_id").window(wspec).count()
• Same code can run on different computation engines
• Compile into a relational expression (RE), an SQL-equivalent intermediate representation
• Compile & optimize the RE for the computation engine (e.g., pandas, DuckDB, Flink, Spark) best suited for the job
• Spin up and manage computation jobs
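The idea of one relational expression targeting several engines can be sketched with a toy intermediate representation. Everything below is hypothetical and only illustrates the pattern: `FilteredGroupCount`, `to_sql`, `run_sqlite`, and `run_python` are made-up names, SQLite and plain Python stand in for the engines named on the slide, and the window is left out to keep the example short.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FilteredGroupCount:
    """Toy relational expression: filter rows, then count per group."""

    table: str
    filter_col: str
    filter_val: str
    group_col: str


def to_sql(expr: FilteredGroupCount) -> str:
    """Render the expression as SQL with one bound parameter."""
    return (
        f"SELECT {expr.group_col}, COUNT(*) FROM {expr.table} "
        f"WHERE {expr.filter_col} = ? GROUP BY {expr.group_col}"
    )


def run_sqlite(expr: FilteredGroupCount, rows: list) -> dict:
    """Backend 1: execute the rendered SQL on an in-memory SQLite database."""
    cols = list(rows[0])
    con = sqlite3.connect(":memory:")
    con.execute(f"CREATE TABLE {expr.table} ({', '.join(cols)})")
    placeholders = ", ".join("?" * len(cols))
    con.executemany(
        f"INSERT INTO {expr.table} VALUES ({placeholders})",
        [tuple(r[c] for c in cols) for r in rows],
    )
    result = dict(con.execute(to_sql(expr), (expr.filter_val,)).fetchall())
    con.close()
    return result


def run_python(expr: FilteredGroupCount, rows: list) -> dict:
    """Backend 2: interpret the same expression directly in Python."""
    return dict(
        Counter(
            r[expr.group_col] for r in rows if r[expr.filter_col] == expr.filter_val
        )
    )


rows = [
    {"account_id": "a1", "status": "failed"},
    {"account_id": "a1", "status": "failed"},
    {"account_id": "a1", "status": "ok"},
    {"account_id": "a2", "status": "failed"},
    {"account_id": "a3", "status": "ok"},
]
expr = FilteredGroupCount("tx", "status", "failed", "account_id")

sql_result = run_sqlite(expr, rows)
py_result = run_python(expr, rows)
print(to_sql(expr))
print(sql_result, py_result)
```

Because both backends consume the same expression object, a compiler is free to choose the engine per job without the feature author changing any code.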

Solution 1: Relational Expression based Compilation
• Unified yet familiar API
• Pluggable to many compute engines
• Minimize human error
• Prototype in minutes

Challenge 2: Streaming and Batch Divided
• Evolving architecture
• Difficult to backfill
• Train-predict inconsistencies

Lambda Architecture
Diagram labels: Data Source, In-motion Compute, At-rest Compute, Online Storage, Offline Storage, Online Query (serving), Mixed Query (backfill), Offline Query (training)

Kappa (Streaming) Architecture
Diagram labels: Data Source, In-motion Compute (backfill from historical log), Materialized Views, Online Query (serving), Offline Query (training), batch transformation, streaming transformation

Unified Architecture
Diagram labels: Data Source, In-motion Compute (intelligent backfill from dual sources), Materialized Views, Online Query (serving), Offline Query (training), batch transformation, streaming transformation, DWH backed logs, Backing

Batch and streaming source unified to simplify backfill
Diagram labels: Time, DWH, Stream, Dual source cutover
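One way to picture the dual source cutover: replay history from the warehouse up to a cutover timestamp, then continue from the stream. The sketch below is an assumption-laden illustration rather than the deck's implementation. `Event`, `unified_source`, and `cutover_ts` are hypothetical names, and it assumes the two sources overlap around the cutover so that no event is lost.

```python
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    ts: int  # event time, e.g. epoch seconds
    key: str
    value: float


def unified_source(
    warehouse: Iterable[Event], stream: Iterable[Event], cutover_ts: int
) -> Iterator[Event]:
    """Yield one logical event sequence from two physical sources.

    Events strictly before `cutover_ts` come from the warehouse (DWH);
    events at or after it come from the stream, so overlapping events
    are emitted exactly once.
    """
    for event in sorted(warehouse, key=lambda e: e.ts):
        if event.ts < cutover_ts:
            yield event
    for event in stream:
        if event.ts >= cutover_ts:
            yield event


# Illustrative data: both sources contain ts 3 and 4.
warehouse = [Event(1, "a", 1.0), Event(2, "a", 2.0), Event(3, "a", 3.0), Event(4, "a", 4.0)]
stream = [Event(3, "a", 3.0), Event(4, "a", 4.5), Event(5, "a", 5.0), Event(6, "a", 6.0)]

merged = list(unified_source(warehouse, stream, cutover_ts=4))
print([e.ts for e in merged])
```

The transformation downstream sees a single ordered source, which is what lets the same feature definition serve both backfill and live computation.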

Streaming Leaning | Batch Leaning
Need an invisible interface to plug into storage ecosystems

Data Fabric for a Streaming Pipeline

Data Fabric for a Unified Backfill Pipeline

Training dataset backfill requires point-in-time correctness
Diagram labels: Time, Feature data, Prediction events

Point-in-time joins to generate training data
Given a spine (entity keys + timestamp + label), join features to generate training data

    train_df = pitc_join_features(
        spine_df,
        features=[
            "tx_max_1h",
            "user_unique_ip_30d",
        ],
    )

spine_df
    inference_ts  tid   cc_num  user_id  is_fraud
    21:30         0122  2       1        0
    21:40         0298  4       1        0
    21:55         7539  6       3        1

train_df
    inference_ts  tid   cc_num  user_id  is_fraud  tx_max_1h  user_unique_ip_30d
    21:30         0122  2       1        1         …          …
    21:40         0298  4       1        1         …          …
    21:55         7539  6       3        3         …          …

cc_num_tx_max_1h
    ts     cc_num  tx_max_1h
    9:20   2       …
    10:24  2       …
    20:00  4       …

user_unique_id_30d
    ts    user_id  unique_ip_30d
    6:00  1        …
    6:00  3        …
    6:00  5        …
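A point-in-time join can be approximated locally with `pandas.merge_asof`, which picks, for each spine row, the latest feature row at or before the inference timestamp for the same key. This is a sketch under assumptions: the deck does not show how `pitc_join_features` works, the dates and feature values are invented for illustration (the slide elides them), and the 22:00 row is added only to show that future values are not leaked.

```python
import pandas as pd

# Spine: entity keys + timestamp + label (timestamps follow the slide, date is made up).
spine_df = pd.DataFrame(
    {
        "inference_ts": pd.to_datetime(
            ["2023-07-01 21:30", "2023-07-01 21:40", "2023-07-01 21:55"]
        ),
        "cc_num": [2, 4, 6],
        "is_fraud": [0, 0, 1],
    }
)

# Feature log for tx_max_1h; the values are illustrative placeholders.
feature_df = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            [
                "2023-07-01 09:20",
                "2023-07-01 10:24",
                "2023-07-01 20:00",
                "2023-07-01 22:00",  # later than every inference_ts
            ]
        ),
        "cc_num": [2, 2, 4, 2],
        "tx_max_1h": [50.0, 80.0, 120.0, 999.0],
    }
)

# Both frames must be sorted on their time column for merge_asof.
train_df = pd.merge_asof(
    spine_df.sort_values("inference_ts"),
    feature_df.sort_values("ts"),
    left_on="inference_ts",
    right_on="ts",
    by="cc_num",
    direction="backward",  # only look at feature rows at or before inference_ts
)
print(train_df[["inference_ts", "cc_num", "is_fraud", "tx_max_1h"]])
```

Card 2 gets the 10:24 value rather than the 22:00 one, card 4 gets the 20:00 value, and card 6 has no history so its feature is missing.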

Solution 2: Abstract streaming and batch data storage
• Unified streaming & batch source
• Unified online & offline feature stores
• Pluggable to most storage technologies

Challenge 3: It should just work!
• Cost, latency, correctness surprises!
• Lack of optimization knobs

Cost / Latency / Correctness
• Batch processing (cheap and correct)
• Stream processing without consistency (fast and cheap)
• Stream processing with consistency enforced (fast and correct)

Workload → Compilation → Optimization → Deployment
Relational Expression
    @transformation
    def transaction_count(tx: Transactions, wspec: WindowSpec):
        return tx[tx.status == "failed"].groupby("account_id").window(wspec).count()
Optimization: various intelligent optimizations can be applied to make appropriate tradeoffs across storage and compute systems.

• Customer managed in your own cloud
• Guardrail for schema changes
• Tunable workload optimization
Diagram labels: Claypot Feature SDK (Python), Feature Catalog, Online store, Offline store, Feature Serving, Unified Processing (Filter, Scan, Scan, Union, Join, Filter)
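A guardrail for schema changes could be as simple as comparing the schema a feature was built against with the schema that is about to be deployed. The deck does not describe how this works, so the sketch below is purely illustrative: `breaking_changes` is a hypothetical helper, schemas are modeled as plain column-to-type dictionaries, and added columns are assumed to be safe.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """List changes in `new` that would break consumers of `old`.

    Removed columns and changed types are reported; added columns are
    treated as backward compatible (an assumption of this sketch).
    """
    problems = []
    for column, dtype in old.items():
        if column not in new:
            problems.append(f"column removed: {column}")
        elif new[column] != dtype:
            problems.append(f"type changed: {column} {dtype} -> {new[column]}")
    return problems


# Illustrative schemas; column names follow the deck's examples.
old_schema = {"account_id": "string", "status": "string", "amt": "double"}
new_schema = {"account_id": "string", "amt": "string", "merchant": "string"}

problems = breaking_changes(old_schema, new_schema)
for problem in problems:
    print(problem)
```

A deployment step could refuse to proceed whenever the returned list is non-empty.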

Solution 3: Optimization knobs
• Abstract optimization complexity
• User controls with high-level knobs
• Trust, no surprises!

Make the invisible interface possible!
• Ubiquitous
• Easy and responsive
• Just works!
https://zhenzhongxu.com/
