Continuum Analytics - Confidential & Proprietary

Anaconda Project: An introduction to data science portability, development and deployment

Christine Doig, Product Manager and Senior Data Scientist, Continuum Analytics
Agenda
• … data science?
• Introduction to Anaconda
• Data science development and deployment
• Anaconda and Docker
• Anaconda Project
• Anaconda Enterprise
Data Science is not just Machine Learning… The field draws on:
• Distributed Systems
• Business Intelligence
• Machine Learning & Statistics
• Software & Web development
• Scientific Computing & HPC
Data Science is Interdisciplinary…
• Distributed Systems: distributed file systems, message passing, schedulers, resource managers
• Business Intelligence: data warehouses, querying, reporting, data visualization, dashboards
• Machine Learning & Statistics: classification, deep learning, regression, PCA
• Software & Web development: web crawling, scraping, 3rd-party data & API providers, software packaging, CI, testing
• Scientific Computing & HPC: array computing, simulation, optimization, GPUs, multi-cores
Open Source Communities Create Powerful Technologies for Data Science
[Figure: open source projects such as Numba, dask, xlwings, Blaze and Airflow, arranged across Distributed Systems, Business Intelligence, Machine Learning & Statistics, Software & Web development, and Scientific Computing & HPC]
Challenges in the open data science ecosystem
… you…?
• Download and install data science libraries
• Manage versions and dependencies
• Upgrade libraries
• Isolate dependencies between projects
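Managing versions and isolating dependencies both start from one question: does the environment you are running in match what the project expects? Below is a minimal sketch in Python, assuming a project declares exact version pins. `check_pins` and the example pins are hypothetical. It only *reports* mismatches using the standard library's `importlib.metadata` (Python 3.8+), whereas a package manager such as conda actually resolves and installs versions.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def check_pins(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Compare installed package versions against exact pins.

    Returns {package: (pinned version, installed version or None)} for every
    package that is missing or does not match its pin.
    """
    mismatches: dict[str, tuple[str, str | None]] = {}
    for name, pinned in pins.items():
        try:
            installed: str | None = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins, for illustration only.
    pins = {"numpy": "1.26.4", "pandas": "2.2.2"}
    for name, (want, have) in check_pins(pins).items():
        print(f"{name}: pinned {want}, installed {have or 'not installed'}")
```

Exact-match pins keep the sketch short. Real dependency specs also allow ranges (e.g. `>=1.20`), which need a proper version parser and solver.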
ANACONDA: Python & R distribution with 1000+ curated packages that makes it easy to get started with Open Data Science
[Figure: dask, xlwings, Airflow, Blaze and other packages across Distributed Systems, Business Intelligence, Web, Scientific Computing / HPC, and Machine Learning / Statistics]
… do data scientists develop?
Workflow steps: Data, Query, Visualize, Clean & Tidy, Predict, Simulate & Optimize
Deliverables: interactive data visualizations and dashboards, Jupyter notebooks, scripts, predictive models, processed data
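The workflow steps above can be sketched as small composable functions. This is a minimal, standard-library-only illustration, assuming a tiny hypothetical CSV dataset and using an ordinary least-squares line as a stand-in for the "Predict" step. The function names and data are hypothetical, and visualization is left out to keep the example self-contained.

```python
from __future__ import annotations

import csv
import io
import statistics

# Hypothetical raw data standing in for the "Data" step; the blank y value
# is deliberate so the cleaning step has something to remove.
RAW_CSV = """x,y
1,2.1
2,3.9
3,
4,8.2
5,9.8
"""


def query(raw: str) -> list[dict[str, str]]:
    """Query: read rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(raw)))


def clean(rows: list[dict[str, str]]) -> list[tuple[float, float]]:
    """Clean & tidy: drop rows with missing values and convert to floats."""
    return [(float(r["x"]), float(r["y"])) for r in rows if r["x"] and r["y"]]


def fit_line(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Predict: ordinary least-squares fit of y = slope * x + intercept."""
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    points = clean(query(RAW_CSV))
    slope, intercept = fit_line(points)
    print(f"kept {len(points)} rows; y ~ {slope:.2f} * x + {intercept:.2f}")
```

Keeping each step as a separate function mirrors how notebooks and scripts in a project tend to split the work, and lets each step be tested or swapped (e.g. for pandas or scikit-learn) independently.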
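Data Science Development
[Figure: a project made up of scripts (script 1, script 2, script 3), a notebook (notebook A) and data (dataset Z), written in Python and R and built on libraries such as scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask and numba]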
Challenges in data science development and deployment
… you…?
• Share your data science project with others
• Ensure that you can reproduce your analysis
• Deploy your project
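Reproducing an analysis starts with knowing exactly which interpreter and package versions produced it. Below is a minimal sketch, assuming a Python-only project. It records the interpreter version and every installed distribution found by `importlib.metadata`, and diffs two such snapshots. `environment_snapshot`, `diff_snapshots` and the output filename are hypothetical. This illustrates the idea only; it is not Anaconda Project's own mechanism, and conda can export a full environment specification itself (`conda env export`).

```python
from __future__ import annotations

import json
import platform
import sys
from importlib.metadata import distributions


def environment_snapshot() -> dict:
    """Record the interpreter and installed package versions."""
    packages: dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": sys.platform,
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def diff_snapshots(old: dict, new: dict) -> dict[str, tuple[str | None, str | None]]:
    """Packages whose version differs between two snapshots (None = absent)."""
    names = set(old["packages"]) | set(new["packages"])
    return {
        n: (old["packages"].get(n), new["packages"].get(n))
        for n in sorted(names)
        if old["packages"].get(n) != new["packages"].get(n)
    }


if __name__ == "__main__":
    # Save next to the analysis so a collaborator can compare environments.
    with open("environment-snapshot.json", "w") as f:
        json.dump(environment_snapshot(), f, indent=2)
```

A snapshot only describes an environment. Recreating one on another machine still needs a tool that can install those exact versions, which is the gap conda environments and Docker images are meant to fill.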
… more
• ANACONDA AND DOCKER - BETTER TOGETHER FOR REPRODUCIBLE DATA SCIENCE (Monday, June 20, 2016): https://www.continuum.io/blog/developer-blog/anaconda-and-docker-better-together-reproducible-data-science
• ANACONDA FOR R USERS: SPARKR AND RBOKEH (Monday, February 1, 2016): https://www.continuum.io/blog/developer-blog/anaconda-r-users-sparkr-and-rbokeh
• JUPYTER AND CONDA FOR R (Monday, September 7, 2015): https://www.continuum.io/blog/developer/jupyter-and-conda-r
• CONDA FOR DATA SCIENCE (Thursday, May 21, 2015): https://www.continuum.io/content/conda-data-science
• PRODUCTIONIZING AND DEPLOYING DATA SCIENCE PROJECTS (Wednesday, February 1, 2017): https://www.continuum.io/blog/developer-blog/productionizing-and-deploying-data-science-projects
• SECURE AND SCALABLE DATA SCIENCE DEPLOYMENTS WITH ANACONDA (Monday, February 27, 2017): https://www.continuum.io/blog/developer-blog/secure-and-scalable-data-science-deployments-anaconda
• ANNOUNCING ANACONDA PROJECT: DATA SCIENCE PROJECT ENCAPSULATION AND DEPLOYMENT, THE EASY WAY! (Monday, March 20, 2017): https://www.continuum.io/blog/developer-blog/%E2%80%8Banaconda-project-data-science-project-encapsulation-deployment