Zhenzhong Xu
November 16, 2023

[2023] Complexities You Should Care about Doing Real-time ML

If you are a data scientist or a platform engineer, you can probably relate to the pain of working with the explosive growth of data/ML technologies and tooling. With many overlapping options, each with a steep learning curve, it is increasingly challenging for data science teams to keep up. Many platform teams have started thinking about building an abstracted ML platform layer to support generalized ML use cases. But there are many complexities involved, especially when dealing with real-time or near-real-time ML.

In this talk, we’ll discuss why ML platforms can benefit from a simple and “invisible” abstraction. We’ll offer some evidence on why you should consider leveraging streaming technologies despite the challenges you may have heard about. We’ll share lessons (combining both ML and infra perspectives) about some of the complexities involved in building such simple abstractions, the design principles behind them, and some counterintuitive decisions you may come across along the way.

By the end of the talk, I hope data scientists can walk away with tips to be more productive with ML workflows, and platform engineers with a few architectural and design tricks that can make their organization future-proof.

Transcript

  1. Invisible Interfaces: Considerations for Abstracting Complexities of a Real-time ML Platform
    Zhenzhong Xu, Cofounder & CTO @ claypot.ai
    July 2023
  2. Real-time decisions that power your business
    Fraud prevention • Personalization • Customer support • Dynamic pricing/discounting • Trending products • Risk Assessment • Account Take Over • Ads • ETA • Network analysis • Sentiment analysis • Object detection • …
  3. The world is moving towards real-time
    • Instacart: The Journey to Real-Time Machine Learning (2022)
      ◦ Directly reduces fraud-related costs by millions annually.
    • LinkedIn’s Real-time Anti-abuse (2022)
      ◦ LinkedIn moved from an offline pipeline (hours) to a real-time pipeline (minutes), and saw a 30% increase in bad actors caught online and a 21% improvement in fake account detection.
    • How WhatsApp catches and fights abuse (2022 | slides)
      ◦ A delay of a few hundred milliseconds can increase spam by 20-30%.
    • How Pinterest Leverages Realtime User Actions in Recommendation to Boost Engagement (2022)
      ◦ According to Pinterest, this “has been one of our most impactful innovations recently, increasing Home feed engagement by 11% while reducing Pinner hide volume by 10%.”
    • Airbnb: Real-time Personalization using Embeddings for Search Ranking (2018)
      ◦ Moving from offline scoring to online scoring grew bookings by 5.1%.
  4. Real-time Decisions
    Data Fabric for Real-time AI • Data Infrastructure • Exploration & Research • Model Architecture & Tuning • Model Analysis & Selection • Ingestion & Transport • Security & Governance • Multi-tenancy Isolation • Data Sources • Storage • Query & Compute • LLM Prompt Engineering • Workflow Orchestration • Analytics / Visualization
  5. Model Serving Model Training Model Monitoring Model Evaluation Prediction Input Training Input Data Monitoring Data Model Flow Data Flow Product Ecosystem Analytics ecosystem
  6. The hard things towards real-time decisions
    • Data silo and staleness
    • Collaboration overhead
    • Tech complexity
  7. Challenge 1: From Experimentation to Production
    • Slow prototyping
    • Local vs. remote execution
    • Divergent language & runtime
  8. Sources • Feature store online + offline • Prediction service • Feature API • Create, experiment, & deploy features • Computation engines • Training service • Feature catalog • Data scientists • Central repo
  9. Declare features with familiar APIs
    @transformation
    def average_transaction_amount_by_merchant(tx: Transactions, wspec: WindowSpec):
        return tx.groupby(["cc_num", "merchant"])["amt"].window(wspec).mean()
  10. Workload • Compiler / Optimizer • Deployment • Relational Expression
    @transformation
    def transaction_count(tx: Transactions, wspec: WindowSpec):
        return tx[tx.status == "failed"].groupby("account_id").window(wspec).count()
    Data Science Friendly: Python <> SQL
  11. Workload • Compiler/Optimizer • Deployment • Relational Expression
    @transformation
    def transaction_count(tx: Transactions, wspec: WindowSpec):
        return tx[tx.status == "failed"].groupby("account_id").window(wspec).count()
    Same code can run on different computation engines
    • Compile into a relational expression (RE), which is SQL equivalent
    • Intermediate Representation
    • Compile & optimize the RE for the computation engine (e.g., pandas, DuckDB, Flink, Spark) best suited for the job
    • Spin up and manage computation jobs
  12. Solution 1: Relational Expression based Compilation
    • Unified yet familiar API
    • Pluggable to many compute engines
    • Minimize human error
    • Prototype in minutes
  13. Challenge 2: Streaming and Batch Divided
    • Evolving architecture
    • Difficult to backfill
    • Train-predict inconsistencies
  14. Lambda Architecture
    Data Source • In-motion Compute • At-rest Compute • Online Storage • Offline Storage • Online Query (serving) • Mixed Query (backfill) • Offline Query (training)
  15. Kappa (Streaming) Architecture
    Data Source • In-motion Compute (Backfill from historical log) • Materialized Views • Online Query (serving) • Offline Query (training) • batch transformation • streaming transformation
  16. Unified Architecture
    Data Source • In-motion Compute (intelligent backfill from dual sources) • Materialized Views • Online Query (serving) • Offline Query (training) • batch transformation • streaming transformation • DWH backed logs Backing
  17. Point-in-time joins to generate training data
    Given a spine (entity keys + timestamp + label), join features to generate training data
    train_df = pitc_join_features(
        spine_df,
        features=[
            "tx_max_1h",
            "user_unique_ip_30d",
        ],
    )
    spine_df
      inference_ts  tid   cc_num  user_id  is_fraud
      21:30         0122  2       1        0
      21:40         0298  4       1        0
      21:55         7539  6       3        1
    train_df
      inference_ts  tid   cc_num  user_id  is_fraud  tx_max_1h  user_unique_ip_30d
      21:30         0122  2       1        1         …          …
      21:40         0298  4       1        1         …          …
      21:55         7539  6       3        3         …          …
    cc_num_tx_max_1h
      ts     cc_num  tx_max_1h
      9:20   2       …
      10:24  2       …
      20:00  4       …
    user_unique_id_30d
      ts    user_id  unique_ip_30d
      6:00  1        …
      6:00  3        …
      6:00  5        …
  18. Solution 2: Abstract streaming and batch data storage
    • Unified streaming & batch source
    • Unified online & offline feature stores
    • Pluggable to most storage technologies
  19. Cost • Latency • Correctness
    • Batch processing (cheap and correct)
    • Stream processing without consistency (fast and cheap)
    • Stream processing with consistency enforced (fast and correct)
  20. Workload Compilation Optimization Relational Expression
    @transformation
    def transaction_count(tx: Transactions, wspec: WindowSpec):
        return tx[tx.status == "failed"].groupby("account_id").window(wspec).count()
    Optimization: Various intelligent optimizations can be made to strike appropriate trade-offs across storage and compute systems.
    Deployment
  21. Customer managed in your own cloud • Guardrail for schema changes • Tunable workload optimization
    Claypot Feature SDK (Python) • Feature Catalog • Online store • Offline store • Feature Serving • Filter • Scan • Scan • Union • Join • Unified Processing • Filter
  22. Solution 3: Optimization knobs
    • Abstract optimization complexity
    • User controls with high-level knobs
    • Trust, no surprises!
  23. Make invisible interface possible!
    • Ubiquitous
    • Easy and responsive
    • Just works!
    https://zhenzhongxu.com/
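
The point-in-time join on slide 17 can be sketched in plain pandas. This is only an illustration, not Claypot's implementation: `point_in_time_join` is a hypothetical stand-in for the slide's `pitc_join_features`, the calendar date is assumed, and the feature values are made up because the slide elides them with "…". It relies on `pandas.merge_asof`, which attaches to each spine row the latest feature row at or before that row's timestamp, so no feature value from the future leaks into the training data.

```python
import pandas as pd


def point_in_time_join(spine, feature, key, spine_ts="inference_ts", feature_ts="ts"):
    """Hypothetical helper: for each spine row, attach the most recent
    feature row (matched on `key`) whose timestamp is <= the spine timestamp."""
    left = spine.sort_values(spine_ts)      # merge_asof requires sorted join keys
    right = feature.sort_values(feature_ts)
    joined = pd.merge_asof(
        left,
        right,
        left_on=spine_ts,
        right_on=feature_ts,
        by=key,
        direction="backward",               # only look at past (or equal) timestamps
    )
    return joined.drop(columns=[feature_ts])


# Spine from the slide; the date 2023-07-01 is an assumption.
spine_df = pd.DataFrame({
    "inference_ts": pd.to_datetime(
        ["2023-07-01 21:30", "2023-07-01 21:40", "2023-07-01 21:55"]),
    "tid": ["0122", "0298", "7539"],
    "cc_num": [2, 4, 6],
    "user_id": [1, 1, 3],
    "is_fraud": [0, 0, 1],
})

# Feature tables: keys and timestamps from the slide, values are made up.
cc_num_tx_max_1h = pd.DataFrame({
    "ts": pd.to_datetime(
        ["2023-07-01 09:20", "2023-07-01 10:24", "2023-07-01 20:00"]),
    "cc_num": [2, 2, 4],
    "tx_max_1h": [10.0, 25.0, 40.0],
})
user_unique_ip_30d = pd.DataFrame({
    "ts": pd.to_datetime(["2023-07-01 06:00"] * 3),
    "user_id": [1, 3, 5],
    "user_unique_ip_30d": [3, 7, 1],
})

train_df = point_in_time_join(spine_df, cc_num_tx_max_1h, key="cc_num")
train_df = point_in_time_join(train_df, user_unique_ip_30d, key="user_id")
print(train_df)
```

In this sketch, card 2 picks up the 10:24 value rather than the 09:20 one, and card 6 has no earlier feature row, so its `tx_max_1h` is left missing. A production system would likely also bound how stale a matched value may be, for example through the `tolerance` argument of `merge_asof`.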