Introduction to Anaconda • Data Science Development & Deployment with Anaconda Projects • Anaconda Projects GIS examples • OSS libraries for GIS professionals: • Bokeh, interactive data visualizations • Datashader, graphics pipeline system for creating meaningful representations of large amounts of data • Dask, flexible parallel computing library for analytics • Other libraries: GeoViews and Holoviews Agenda
dask xlwings Airflow Blaze Distributed Systems Business Intelligence Web Scientific Computing / HPC Machine Learning / Statistics ANACONDA DISTRIBUTION Python & R distribution with 1000+ curated packages that makes it easy to get started with Data Science
do you… • Download and install data science libraries? • Manage versions and dependencies? • Upgrade libraries? • Isolate dependencies between projects? Challenges in data science development WITH ANACONDA DISTRIBUTION & CONDA
do data scientists develop? Workflows Data Query Visualize Clean & Tidy Predict, Simulate, & Optimize R P In N In A P M Interactive data visualizations and dashboards Jupyter notebooks Scripts Predictive models Processed Data
Data Science Development scikit-learn Bokeh Tensorflow Jupyter pandas matplotlib seaborn dask numba script 1 script 2 notebook A dataset Z script 3 Python, R
do you… • Share your data science project with others? • Ensure that you can reproduce your analysis? • Deploy your project? Challenges in data science development and deployment WITH ANACONDA PROJECTS
visualization framework that targets modern web browsers for presentation • No JavaScript • Python, R, Scala and Lua bindings • Easy to embed in web applications • Server apps: data can be updated, and UI and selection events can be processed to trigger more visual updates. http://bokeh.pydata.org/en/latest/ Bokeh
graphics pipeline system for creating meaningful representations of large amounts of data • Provides automatic, nearly parameter-free visualization of datasets • Allows extensive customization of each step in the data-processing pipeline • Supports automatic downsampling and re- rendering with Bokeh and the Jupyter notebook • Works well with dask and numba to handle very large datasets in and out of core (with examples using billions of datapoints) https://github.com/bokeh/datashader NYC census data by race
http://distributed.readthedocs.io/en/latest/ Distributed is a lightweight library for distributed computing in Python. It extends dask APIs to moderate sized clusters.
41 Dask.distributed includes a web interface to help deliver information about the current state of the network helps to track progress, identify performance issues, and debug failures over a normal web page in real time.
is a Python library that makes analyzing and visualizing scientific or engineering data much simpler, more intuitive, and more easily reproducible. http://holoviews.org/index.html
is a Python library that makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets, such as those used in weather, climate, and remote sensing research. http://geo.holoviews.org/