Continuous Delivery for Machine Learning System...

Adarsh Shah
November 12, 2020

Continuous Delivery for Machine Learning Systems - ADDO

A Machine Learning workflow includes data management, experiment management (model development and training), model deployment, serving, and retraining. Training a model can take hours, sometimes days, and typically involves a large dataset. Training and serving a model also require special resources such as high-density cores and GPUs.

In this talk, we will look at what Continuous Delivery for Machine Learning looks like, using anecdotes, and at how to use cloud-native technologies to perform the various steps in a Machine Learning workflow. We will also discuss how it differs from deploying other software and which aspects to consider, and we will survey the tools available to enable Continuous Delivery for machine learning.

Transcript

  1. Continuous Delivery for Machine Learning Systems
    Deploying ML Systems to Production safely and quickly in a sustainable way
    Adarsh Shah: Engineering Leader, Coach, Hands-on Architect, Independent Consultant
    @shahadarsh, https://shahadarsh.com
  2. Hidden Technical Debt in ML Systems
    From the paper Hidden Technical Debt in Machine Learning Systems
  3. [Diagram] Traditional Software Development vs. Machine Learning
    Traditional Software Development: Data and Program produce Results.
    Machine Learning, Training: Training Data and Desired Results produce a Model (Program).
    Machine Learning, Prediction: Live Data and the Program produce Results.
  4. [Workflow diagram]
    Phases: Data Management; Experimentation; Production Deployment
    Steps: Data Acquisition; Data Preparation; Model Development; Training; Prediction; Accuracy Evaluation; Validation; Monitoring / Alerting
    Transition labels: Accuracy not reached; Accuracy reached; Retrain; Data Drift; Fix
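The "Accuracy reached / Accuracy not reached" and "Data Drift" transitions on this slide can be read as simple automated gates. The following is a minimal sketch, not the speaker's implementation: the function names, the 0.9 accuracy threshold, and the mean-shift drift rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class GateDecision:
    action: str  # "deploy" or "retrain"
    reason: str


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def evaluate_gate(predictions, labels, threshold=0.9):
    """Deploy only when accuracy meets the (assumed) threshold; otherwise retrain."""
    acc = accuracy(predictions, labels)
    if acc >= threshold:
        return GateDecision("deploy", f"accuracy {acc:.2f} reached threshold {threshold:.2f}")
    return GateDecision("retrain", f"accuracy {acc:.2f} below threshold {threshold:.2f}")


def drift_detected(train_values, live_values, max_shift=1.0):
    """Flag drift when the live mean moves more than max_shift training
    standard deviations away from the training mean (a deliberately crude rule)."""
    spread = pstdev(train_values)
    if spread == 0:
        return mean(live_values) != mean(train_values)
    return abs(mean(live_values) - mean(train_values)) / spread > max_shift
```

In a pipeline, a "retrain" decision or a positive drift check would trigger the training stage again instead of promoting the model.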
  5. #2: Experimentation
    Code Quality; Research & Experimentation; Tracking experiments; Training Time & Troubleshooting; Infrastructure Requirements; Model Accuracy Evaluation
  6. What is Continuous Delivery? Continuous Delivery is the
    ability to get changes of all types—including new features, configuration changes, bug fixes and experiments—into production, or into the hands of users, safely and quickly in a sustainable way. - Jez Humble & Dave Farley 
 (Continuous Delivery Book Authors)
  7. Continuous Integration: Continuous Integration is a software development
    practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. - Martin Fowler
  8. Principles of Continuous Delivery
    ๏ Build quality in
    ๏ Work in small batches
    ๏ Computers perform repetitive tasks, people solve problems
    ๏ Relentlessly pursue continuous improvement (Kaizen)
    ๏ Everyone is responsible
  9. Data pipeline [diagram]
    Per source (A, B, C): Data Source; Data Acquisition; Data Validation; Data Preparation
    Output: Training Dataset (Versioned); Training Process
    Testing: Bias & Fairness; Security & Compliance
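The data pipeline on this slide (validate and prepare each source, then produce a versioned training dataset) can be sketched in a few lines. This is an illustrative sketch only: the `feature`/`label` schema, the function names, and the choice of a content hash as the version id are assumptions, not something the slides specify.

```python
import hashlib
import json

REQUIRED_FIELDS = {"feature", "label"}  # hypothetical schema for this sketch


def validate(records):
    """Keep only rows that have every required field and a non-null label."""
    return [
        row for row in records
        if REQUIRED_FIELDS.issubset(row) and row["label"] is not None
    ]


def prepare(records):
    """Example preparation step: cast the feature to float."""
    return [{"feature": float(r["feature"]), "label": r["label"]} for r in records]


def dataset_version(records):
    """Content hash, so identical data always gets the same version id."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def build_training_dataset(sources):
    """Run every source through validation and preparation, then version the result."""
    dataset = []
    for name in sorted(sources):  # fixed order keeps the version reproducible
        dataset.extend(prepare(validate(sources[name])))
    return dataset, dataset_version(dataset)
```

Versioning by content means a training run can record exactly which dataset it used, and an unchanged dataset never produces a new version.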
  10. Continuous Integration (Training Code) [diagram]
    Training Code; Static Analysis; Unit Tests; Linting etc.; Build Artifact; Artifact Repository; Dev Environment; Validation Tests; Merge to Main Branch
  11. Training [diagram]
    Inputs: Data Pipeline; Continuous Integration (Training Code); Configuration; Training Dataset
    Training Environment: Trigger; Accuracy Evaluation; Testing (Bias & Fairness); Monitoring/Alerting; Log Aggregation; Automated Provisioning/De-provisioning
    Other labels: Data Scientist; Model
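The "Testing (Bias & Fairness)" step can be automated like any other test in the training pipeline. The sketch below uses one common measure, the demographic parity gap (the difference in positive-prediction rates between groups). The choice of metric, the 0.2 tolerance, and the function names are assumptions for illustration; the slides do not say which fairness tests were used.

```python
def positive_rate(predictions):
    """Share of predictions that are positive (1)."""
    return sum(predictions) / len(predictions)


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    rates = [positive_rate(preds) for preds in by_group.values()]
    return max(rates) - min(rates)


def passes_fairness_check(predictions, groups, max_gap=0.2):
    """Fail the pipeline stage when the gap exceeds the (assumed) tolerance."""
    return demographic_parity_gap(predictions, groups) <= max_gap
```

A failing check would block the model from being published, in the same way a failing unit test blocks a merge.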
  12. Continuous Integration (Application Code) [diagram]
    Application Code; Static Analysis; Unit Tests; Linting, Security Scan etc.; Build Artifact; Artifact Repository; Ephemeral Environment; Integration Tests; Tag as Tested
    Inputs from other stages: Training; Model
  13. Bringing it all together [diagram]
    Phases: Data Management; Experimentation; Production Deployment
    Data Scientist: Data Pipeline; Training Dataset; Continuous Integration (Training Code); Configuration; Training; Model
    Application Developer: Continuous Integration (Application Code); Deployment; Production Environment; Smoke Tests; Monitoring / Alerting
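The "Smoke Tests" step after deployment can be as small as calling the deployed model with a few known-good inputs and checking that it answers sensibly. This is a hedged sketch: `smoke_test`, `fake_predict`, the sample shape, and the allowed outputs are hypothetical, and `fake_predict` stands in for a real call to the production prediction endpoint.

```python
def smoke_test(predict, samples, allowed_outputs=(0, 1)):
    """Call the deployed model on a few inputs and collect any failures."""
    failures = []
    for sample in samples:
        try:
            result = predict(sample)
        except Exception as exc:  # any error means the deployment is unhealthy
            failures.append(f"{sample!r}: raised {type(exc).__name__}")
            continue
        if result not in allowed_outputs:
            failures.append(f"{sample!r}: unexpected output {result!r}")
    return failures


def fake_predict(sample):
    # Stand-in for a request to the production prediction service (hypothetical).
    return 1 if sample["feature"] > 0.5 else 0
```

An empty list means the deployment passed; a non-empty list would typically fail the pipeline and trigger a rollback or an alert.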
  14. References
    • continuousdelivery.com
    • Dr. Deming’s 14 Points for Management
    • Challenges Deploying Machine Learning Models to Production
    • State of DevOps Report
    • martinfowler.com
    • Large image datasets: A pyrrhic win for computer vision?