DevOps for data science: automate the boring stuff and leverage the OSS ecosystem

Tania Allard

August 06, 2020

Transcript

  1. DevOps for Data Science? Automate the boring stuff and leverage the OSS ecosystem

    PyCon Africa, August 6th, 2020. Tania Allard, PhD, @ixek
  2. About me

    I love Python. I am also a GDE (Google Developer Expert) for TensorFlow. I love mechanical keyboards. My dog usually barks while I am giving online talks.
  3. Table of contents

    1. Background: ML and data science in 2020
    2. What is even MLOps? And why you’d need it…
    3. MLOps 101: getting started with MLOps
  4. Data scientist: it’s never been easier to run ML experiments.

    ML engineer/SRE: machine learning in production is hard, y’all!
    Every team
  5. From the DS perspective

    • Tools like scikit-learn and Keras make it easy to create models in a few lines.
    • Techniques like transfer learning make our lives easier.
    • More compute! All the GPUs!
  6. The new unicorn

    Must have: analytical skills, software engineering, programming, data engineering, data visualization.
    Also must have: containerization, end-to-end ML pipelines, CI/CD/versioning, deep learning/NLP/etc., privacy and security.
  7. Where is my unicorn?

    A mythical data scientist who can code, write unit tests AND resist the lure of a deep neural network when logistic regression will do.
  8. The origin of DevOps

    Software developers: need to move and iterate fast.
    Operations team: stability and availability of services are the priority.
  9. “DevOps is the union of people, process, and products to enable continuous delivery of value into production.”

    Donovan Brown
  10. DevOps principles

    • Automate: automate everything you can (data processing, model training).
    • Feedback: get feedback on new ideas fast (test immediately).
    • No manual handoffs.
    • Provide early testing opportunities.
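
To make "automate data processing" and "test immediately" concrete, here is a minimal Python sketch. The `clean_records` function and its record layout are hypothetical stand-ins for a real processing step; the `test_` prefix follows pytest's naming convention, so pytest would collect the test automatically from a file named, for example, `test_processing.py`.

```python
def clean_records(records):
    """Hypothetical processing step: drop rows with a missing 'value', cast the rest to float."""
    cleaned = []
    for row in records:
        value = row.get("value")
        if value is None or value == "":
            continue  # skip incomplete rows instead of failing later during training
        # float() raises ValueError on non-numeric strings, surfacing bad data early
        cleaned.append({**row, "value": float(value)})
    return cleaned


def test_clean_records():
    raw = [{"id": 1, "value": "3.5"}, {"id": 2, "value": None}, {"id": 3, "value": ""}]
    assert clean_records(raw) == [{"id": 1, "value": 3.5}]
```

Running a test like this on every push gives the early, automated feedback the slide asks for, instead of discovering bad rows halfway through a training run.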
  11. Continuous integration: software engineering

    Code changes → project source code in version control → automated build → quick testing → decisions based on test results, no waiting time* → iterate.
    (Loop: automate, feedback, iterate.)
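
The "automated build based on test results" loop hinges on one mechanism: a CI runner marks a step as failed when the command it ran exits with a non-zero status. The sketch below illustrates that mechanism with hypothetical names (`EXPECTED_COLUMNS`, `check_header`, `run_checks`); in practice `pytest` already exits non-zero when a test fails, so this is an illustration rather than a replacement.

```python
import sys

EXPECTED_COLUMNS = ["id", "value"]  # hypothetical schema for a CSV dataset


def check_header(csv_text: str) -> None:
    """Raise if the CSV header does not match the expected columns."""
    header = csv_text.splitlines()[0].split(",")
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {header}")


def run_checks(checks) -> int:
    """Run (name, callable) checks; return 0 if all pass, 1 otherwise (a CI exit code)."""
    failed = False
    for name, check in checks:
        try:
            check()
        except Exception as exc:  # any exception counts as a failed check
            print(f"FAILED {name}: {exc}", file=sys.stderr)
            failed = True
    return 1 if failed else 0
```

Saved as a script, it would end with `raise SystemExit(run_checks([...]))`, so the CI job fails as soon as any check fails and nobody has to wait for a manual review to find out.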
  12. So what about ML?

    Technical considerations:
    • Reliance on metrics (e.g. accuracy, specificity)
    • Data visualization
    • Required domain knowledge
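
Because ML work relies on metrics, it helps to compute them with small, testable functions. This sketch assumes binary 0/1 labels and uses only the standard library; scikit-learn's `sklearn.metrics` module offers battle-tested accuracy and recall (sensitivity) functions, and specificity can be derived from its confusion matrix.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall) and specificity for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        # sensitivity: share of real positives the model catches
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        # specificity: share of real negatives the model correctly rejects
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


print(binary_metrics([1, 0, 1, 0, 1, 0], [1, 0, 0, 1, 1, 1]))
```

For the example call, accuracy is 0.5 while sensitivity is about 0.67 and specificity about 0.33, which is why a single accuracy number rarely tells the whole story.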
  13. The origin of MLOps

    Data scientist:
    • Need to move and iterate fast
    • Use the frameworks I love
    • Scalable
    • Minimal wait: test, stage, production

    SRE/ML engineers:
    • Reuse of tooling and platforms
    • Uptime
    • Monitoring
    • Reliability and stability
  14. Continuous integration: software engineering

    For ML, the loop becomes: code and data changes → project source code in version control, plus data lineage → automated training/data processing → sought metrics → improve the model based on outputs/outcomes → iterate.
    (Loop: automate, feedback, iterate.)
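
Data lineage means being able to say exactly which data (and which parameters) produced a given model. A minimal sketch, assuming the dataset fits in memory as bytes: fingerprint it with SHA-256 and store the digest next to the parameters and metrics. The record layout and parameter names are hypothetical; dedicated OSS tools such as DVC handle this at scale.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest that identifies one exact version of a dataset."""
    return hashlib.sha256(data).hexdigest()


def lineage_record(data: bytes, params: dict, metrics: dict) -> dict:
    """Hypothetical metadata to save alongside a trained model artifact."""
    return {
        "data_sha256": fingerprint(data),  # changes if even one byte of the data changes
        "params": params,
        "metrics": metrics,
    }


record = lineage_record(b"id,value\n1,3.5\n", {"learning_rate": 0.1}, {"accuracy": 0.5})
print(json.dumps(record, indent=2, sort_keys=True))
```

In a real pipeline you would also record the Git commit of the code, so that both code and data changes can be traced back from any model.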
  15. Recycle your ecosystem

    1. Collaboration: version control (Git, Mercurial); OSS dev platforms and CI/CD (GitHub, GitLab, Travis).
    2. Automation: leverage your deployment infrastructure (CI/CD, Make).
    3. Mix and match: use the OSS libraries you love and leverage cloud computing*.
  16. MLOps step by step

    [Diagram labels: ENV #1, ENV #2, Data Scientist, SRE/ML Engineers, CI/CD pipeline (Process → Train → Stage → Serve), Data, Distributed Cloud]
  17. MLOps step by step

    First, I check in my code.
  18. MLOps step by step

    That kicks off a CI/CD pipeline.
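
When the pipeline kicks off, its stages run in order, and a run should stop at the first failure so nobody trains on broken data. A minimal sketch of that behaviour, with hypothetical lambda stages standing in for real processing and training code:

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[object], object]]


def run_pipeline(stages: List[Stage], initial: object) -> object:
    """Run stages in order, feeding each output to the next; stop at the first failure."""
    result = initial
    for name, step in stages:
        try:
            result = step(result)
        except Exception as exc:
            # Name the failing stage so the CI log points straight at it.
            raise RuntimeError(f"pipeline failed at stage {name!r}") from exc
        print(f"stage {name!r} done")
    return result


stages = [
    ("process", lambda rows: [r for r in rows if r is not None]),  # hypothetical cleaning
    ("train", lambda rows: sum(rows) / len(rows)),  # hypothetical "model": a mean
]
result = run_pipeline(stages, [1, None, 3])
```

Real CI/CD systems express the same ordering declaratively, as jobs and dependencies in their own configuration files; the sketch only shows the fail-fast semantics.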
  19. MLOps step by step

    And now do a training run on the processed data.
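
The key detail in this step is that training consumes the output of the processing stage rather than the raw data. The sketch below assumes a toy nearest-centroid "model" and a hypothetical row layout (`x`, `label`); the processed artifact is written to a temporary directory so the example is self-contained.

```python
import json
import tempfile
from pathlib import Path


def process(raw_rows, out_path: Path) -> Path:
    """Hypothetical processing step: keep complete rows and write them as JSON."""
    rows = [r for r in raw_rows if r.get("x") is not None and r.get("label") in (0, 1)]
    out_path.write_text(json.dumps(rows))
    return out_path


def train(processed_path: Path) -> dict:
    """Toy nearest-centroid model: the mean of x per class, read from the processed file."""
    rows = json.loads(processed_path.read_text())
    centroids = {}
    for label in (0, 1):
        xs = [r["x"] for r in rows if r["label"] == label]
        if not xs:
            raise ValueError(f"no processed rows for label {label}")
        centroids[label] = sum(xs) / len(xs)
    return centroids


raw = [
    {"x": 1.0, "label": 0},
    {"x": 3.0, "label": 0},
    {"x": None, "label": 1},  # incomplete row, dropped by process()
    {"x": 8.0, "label": 1},
]
with tempfile.TemporaryDirectory() as tmp:
    processed_file = process(raw, Path(tmp) / "processed.json")
    model = train(processed_file)
print(model)
```

Because `train` only reads the processed file, the pipeline can cache or version that artifact and rerun training without repeating the processing step.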
  20. Not only tests

    CI can also be leveraged to do the training or the data processing.
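
One way to let CI do the processing and training itself is a small command-line entry point that each CI job calls with the stage to run. This sketch uses `argparse` from the standard library; the stage functions are hypothetical placeholders for real processing and training code.

```python
import argparse


def run_process() -> str:
    # Hypothetical placeholder for the real data-processing code.
    return "data processed"


def run_train() -> str:
    # Hypothetical placeholder for the real training code.
    return "model trained"


STAGES = {"process": run_process, "train": run_train}


def main(argv=None) -> int:
    """Parse the stage name ('process' or 'train') and run only that stage."""
    parser = argparse.ArgumentParser(description="Run one pipeline stage.")
    parser.add_argument("stage", choices=sorted(STAGES))
    args = parser.parse_args(argv)
    print(STAGES[args.stage]())
    return 0  # exit status 0 tells the CI runner the step succeeded
```

Saved as, say, `pipeline.py` (a hypothetical name) and ending with `raise SystemExit(main())`, one CI job could run `python pipeline.py process` and a later job `python pipeline.py train`. Keeping each stage behind one command means the same code runs on a laptop and in CI.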
  21. MLOps step by step

    Actually, I need to update the parameters.
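
Updating parameters is easiest to automate when they live in a file under version control: the change becomes a commit, which re-triggers the pipeline and leaves an audit trail. A minimal sketch, assuming a JSON parameters file whose keys (`learning_rate`, `n_estimators`) are hypothetical:

```python
import json

# Hypothetical defaults; the real parameter names depend on your model.
DEFAULT_PARAMS = {"learning_rate": 0.1, "n_estimators": 100}


def load_params(text: str) -> dict:
    """Merge a committed JSON params file over the defaults, rejecting unknown keys."""
    overrides = json.loads(text)
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        # Catch typos such as "learning_rte" before a training run is wasted.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULT_PARAMS, **overrides}


params = load_params('{"learning_rate": 0.05}')
print(params)
```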
  22. MLOps step by step

    The model is optimized and working! Let’s roll it out to production.
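
Before rolling out, the pipeline needs a rule for deciding that the model really is "optimized and working". A minimal promotion gate, assuming metrics are plain dictionaries and that `floor` and `min_gain` are thresholds your team agrees on (both hypothetical):

```python
from typing import Optional


def should_promote(candidate: dict, production: Optional[dict], metric: str = "accuracy",
                   min_gain: float = 0.0, floor: float = 0.0) -> bool:
    """Decide whether a candidate model may replace the one in production."""
    score = candidate[metric]
    if score < floor:
        return False  # never ship a model below the agreed minimum
    if production is None:
        return True  # first deployment: nothing to compare against
    # Require a strict improvement of more than min_gain over production.
    return score > production[metric] + min_gain
```

Wired into the final CI/CD run, a `False` result would stop the roll-out and leave the current production model serving.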
  23. MLOps step by step

    Trigger the CI/CD pipeline one last time.
  24. MLOps step by step

    And roll out to the world!
  25. In brief

    MLOps allows you to be more efficient with the tools you use and love.