
GeoRodeo 2017 Anaconda




Christine Doig

May 19, 2017



Transcript

  1. Anaconda for GIS professionals

    GeoRodeo 2017
    Christine Doig, Continuum Analytics
    May 19th, 2017
    © 2017 Continuum Analytics - Confidential & Proprietary
  2. Agenda

    • Introduction to Anaconda
    • Data Science Development & Deployment with Anaconda Projects
    • Anaconda Projects GIS examples
    • OSS libraries for GIS professionals:
      • Bokeh, interactive data visualizations
      • Datashader, graphics pipeline system for creating meaningful representations of large amounts of data
      • Dask, flexible parallel computing library for analytics
      • Other libraries: GeoViews and HoloViews
  3. Anaconda, the leading Data Science ecosystem with over 4M users
  4. Anaconda Distribution

    Python & R distribution with 1000+ curated packages that makes it easy to get started with Data Science.
    Areas shown: Distributed Systems, Business Intelligence, Web, Scientific Computing / HPC, Machine Learning / Statistics
    Packages shown: Numba, dask, xlwings, Airflow, Blaze
  5. • Install data science libraries: $ conda install pandas
     • Manage package versions: $ conda install pandas=0.14
     • Create isolated environments: $ conda create -n myenv python=3.5 pandas=0.18
     • Update package version: $ conda update pandas
  6. anaconda-project.yml

    • Define and manage:
      • project package dependencies
      • deployment commands
      • data
      • …
  7. • Launch applications
     • Manage package versions and environments
     • Create and upload projects
  8. Explore, Analyze & Collaborate

    Biz Analyst, Data Scientists
  9. Scale, Deploy & Operate

    DevOps, Developer, Data Engineers
  10. Challenges in data science development

    How do you…
    • Download and install data science libraries?
    • Manage versions and dependencies?
    • Upgrade libraries?
    • Isolate dependencies between projects?
    With Anaconda Distribution & conda
  11. What do data scientists develop?

    Workflows: Data, Query, Visualize, Clean & Tidy, Predict, Simulate, & Optimize
    Outputs:
    • Interactive data visualizations and dashboards
    • Jupyter notebooks
    • Scripts
    • Predictive models
    • Processed data
  12. Data Science Development (Laptop)

    Languages: Python, R
    Libraries: scikit-learn, Bokeh, TensorFlow, Jupyter, pandas, matplotlib, seaborn, dask, numba
    Project files: script 1, script 2, script 3, notebook A, dataset Z
  13. Challenges in data science development and deployment

    How do you…
    • Share your data science project with others?
    • Ensure that you can reproduce your analysis?
    • Deploy your project?
    With Anaconda Projects
  14. Laptop (Data Science Development): Project 1, Project 2, Project 3
      Server (Data Science Deployment): Project 1, Project 2, Project 3
  15. Laptop (Data Science Development): Project 1, Project 2, Project 3
      Anaconda Enterprise (Data Science Development and Deployment): Project 1, Project 2, Project 3; Container 1, Container 2, Container 3, Container 4
  16. Anaconda Enterprise

    • Dependencies
    • Data
    • Deployment commands
    • Security
    • Scalability
    • Availability
  17. Innovator Program

    http://go.continuum.io/anaconda-enterprise-innovator/
  18. 2010 US Census data (by population density and race)
  19. Examples available:

    • https://anaconda.org/koverholt/projects
      • datashader_nyctaxi
      • deck_gl_geojson
    • https://anaconda.org/jbednar/osm-1billion/notebook
    • https://anaconda.org/jbednar/census/notebook
    • https://anaconda.org/jbednar/census-hv-dask/notebook
  20. Bokeh

    Interactive visualization framework that targets modern web browsers for presentation
    • No JavaScript
    • Python, R, Scala and Lua bindings
    • Easy to embed in web applications
    • Server apps: data can be updated, and UI and selection events can be processed to trigger more visual updates.
    http://bokeh.pydata.org/en/latest/
  21. Motivation

    • Visualize large amounts of data in a meaningful way
    • Interactively explore the data
  22. Datashader

    Plotting pitfalls shown on the slide (figures not included): overplotting, oversaturation, undersampling
    https://anaconda.org/jbednar/plotting_pitfalls/notebook
  23. Datashader

    Graphics pipeline system for creating meaningful representations of large amounts of data
    • Provides automatic, nearly parameter-free visualization of datasets
    • Allows extensive customization of each step in the data-processing pipeline
    • Supports automatic downsampling and re-rendering with Bokeh and the Jupyter notebook
    • Works well with dask and numba to handle very large datasets in and out of core (with examples using billions of datapoints)
    https://github.com/bokeh/datashader
    Figure: NYC census data by race
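The pipeline idea on this slide (aggregate many points onto a fixed grid, then turn the aggregates into pixel intensities) can be sketched in plain Python. This is a conceptual illustration only, not Datashader's API: the function names `aggregate_counts` and `shade_log`, the 4x4 grid, and the sample points are all assumptions made up for the example.

```python
import math

def aggregate_counts(points, x_range, y_range, width, height):
    """Bin (x, y) points into a height x width grid of per-cell counts."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the viewport
        # Map each coordinate to a cell index; the upper edge falls in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid

def shade_log(grid, levels=256):
    """Map counts to integer intensities in 0..levels-1 on a log scale."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    scale = (levels - 1) / math.log1p(peak)
    return [[round(math.log1p(count) * scale) for count in row] for row in grid]

# Hypothetical data: one very dense spot, one sparse spot, one isolated point.
points = [(0.1, 0.1)] * 1000 + [(0.9, 0.9)] * 10 + [(0.5, 0.5)]
counts = aggregate_counts(points, (0.0, 1.0), (0.0, 1.0), width=4, height=4)
image = shade_log(counts)
```

Because the output size depends on the grid and not on the number of points, the same two steps apply whether there are a thousand points or billions. The log scale is one possible design choice: with a linear mapping the isolated point would get intensity round(255 / 1000) = 0 and disappear next to the dense cell, while on the log scale it stays visible.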
  24. Motivation

    • Data > memory in laptop
    • Similar to a pandas solution
  25. Dask Dataframes

    >>> import pandas as pd
    >>> df = pd.read_csv('iris.csv')
    >>> df.head()
       sepal_length  sepal_width  petal_length  petal_width      species
    0           5.1          3.5           1.4          0.2  Iris-setosa
    1           4.9          3.0           1.4          0.2  Iris-setosa
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    4           5.0          3.6           1.4          0.2  Iris-setosa
    >>> max_sepal_length_setosa = df[df.species == 'Iris-setosa'].sepal_length.max()
    >>> max_sepal_length_setosa
    5.7999999999999998

    >>> import dask.dataframe as dd
    >>> ddf = dd.read_csv('*.csv')
    >>> ddf.head()
       sepal_length  sepal_width  petal_length  petal_width      species
    0           5.1          3.5           1.4          0.2  Iris-setosa
    1           4.9          3.0           1.4          0.2  Iris-setosa
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    4           5.0          3.6           1.4          0.2  Iris-setosa
    …
    >>> d_max_sepal_length_setosa = ddf[ddf.species == 'Iris-setosa'].sepal_length.max()
    >>> d_max_sepal_length_setosa.compute()
    5.7999999999999998

    Dask dataframes look and feel like pandas dataframes, but operate on datasets larger than memory using multiple threads.
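The "larger than memory, multiple threads" idea behind the slide can be sketched with only the standard library: compute a maximum per file-sized part, then reduce the partial results. This is a simplified illustration of the general pattern and not how Dask is implemented. The names `partial_max` and `chunked_max` are hypothetical, and the inline CSV strings are made-up stand-ins for the files that `'*.csv'` would match.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for several CSV files on disk.
CSV_PARTS = [
    "sepal_length,species\n5.1,Iris-setosa\n7.0,Iris-versicolor\n",
    "sepal_length,species\n5.8,Iris-setosa\n6.3,Iris-virginica\n",
    "sepal_length,species\n4.9,Iris-setosa\n",
]

def partial_max(text, species):
    """Stream one part row by row and return its max sepal_length, or None."""
    best = None
    for row in csv.DictReader(io.StringIO(text)):
        if row["species"] == species:
            value = float(row["sepal_length"])
            best = value if best is None else max(best, value)
    return best

def chunked_max(parts, species, workers=4):
    """Compute per-part maxima on a thread pool, then reduce them to one value."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda part: partial_max(part, species), parts))
    found = [value for value in partials if value is not None]
    return max(found) if found else None

result = chunked_max(CSV_PARTS, "Iris-setosa")
```

Only one row per part is held at a time and each part yields a single number, so the full dataset never has to fit in memory. The final `max` over the partial results plays the role that `.compute()` triggers in the Dask session above.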
  26. Distributed

    Distributed is a lightweight library for distributed computing in Python. It extends dask APIs to moderate sized clusters.
    http://distributed.readthedocs.io/en/latest/
  27. Web UI

    Dask.distributed includes a web interface that delivers information about the current state of the network over a normal web page in real time. This information helps to track progress, identify performance issues, and debug failures.
  28. HoloViews

    HoloViews is a Python library that makes analyzing and visualizing scientific or engineering data much simpler, more intuitive, and more easily reproducible.
    http://holoviews.org/index.html
  29. GeoViews

    GeoViews is a Python library that makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets, such as those used in weather, climate, and remote sensing research.
    http://geo.holoviews.org/
  30. Resources

    • Bokeh documentation: http://bokeh.pydata.org/en/latest/
    • Bokeh demos: https://demo.bokehplots.com/
    • Datashader documentation: http://datashader.readthedocs.org/
    • Bokeh + datashader tutorial: https://github.com/bokeh/bokeh-notebooks
    • Bokeh webinar: http://go.continuum.io/hassle-free-data-science-apps/
    • Datashader webinar: http://go.continuum.io/datashader/
    • GeoViews blogpost: https://www.continuum.io/blog/developer-blog/introducing-geoviews
    • GeoViews documentation: http://geo.holoviews.org/index.html
    • HoloViews documentation: http://holoviews.org/index.html