miles is an utterly insignificant little blue- green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea…”
IPython, Numba, Matplotlib, Spyder, Numexpr, Cython, Theano, Scikit-image, NLTK, NetworkX and 150+ packages conda PYTHON cond conda • Anaconda: Python distribution that includes 150+ packages for data science • conda: Cross-platform and language agnostic package and environment manager • Miniconda: Minified version of Anaconda, with just Python and conda. • Anaconda Cloud: Cloud service to host and share public and private packages, environments and notebooks • conda environments: custom isolated sandboxes to easily reproduce and share data science projects
Matplotlib, Spyder, Numexpr, Cython, Theano, Scikit-image, NLTK, NetworkX and 150+ packages conda PYTHON cond conda Why Anaconda? • Easy to install on all platforms • Trusted by industry leaders: e.g. Microsoft Azure ML • Large user base: 3M+ downloads • BSD license • Extensible - easily build, share and install proprietary libraries with Anaconda Cloud • Language agnostic - Python, R, Scala… • Allows isolated custom sandboxes with different versions of packages Anaconda: Intro
2016), Anaconda now includes the Intel Math Kernel Library (MKL) optimizations (version 11.3.1) for improved performance. Available by default and free for all. conda update conda conda install anaconda=2.5
packages, environments and channels. • No need of using the command line. •Available for Windows, OS X and Linux. • Anaconda Navigator has replaced Launcher. • Integration with Anaconda Cloud. A desktop graphical user interface included in Anaconda
build infrastructure and distributions for the conda package manager. • Each repo (feedstock), automatically builds with CI (AppVeyor, CircleCI and TravisCI) • Builds are uploaded to Anaconda Cloud conda config —add channels conda-forge conda install <package_name>
80+ R packages for data science • MRO: Microsoft R Open distribution with MKL Added support for R conda config —add channels r conda install r-essentials conda config —add channels mro conda install r
dropdown inside notebook UI to switch between conda envs • nb_conda: help manage conda envs from inside file viewer of jupter notebook nb_condakernel nb_conda
or "labeled" data both easy and intuitive. • Building block for doing practical, real world data analysis in Python. high-performance, easy- to-use data structures and data analysis tools • Automatic data alignment • Rolling, expanding, and EWM operations • Timeseries operations, including fill or drop missing values • Resampling & ordered merges • Timezone handling • Date offsets & holiday support • Intelligent interactive indexing
The .to_xarray() function has been added for compatibility with the xarray package • pd.read_sas() has gained the ability to read SAS7BDAT files • Conditional formatting, the visual styling of a DataFrame depending on the data within, by using the DataFrame.style property. 23 Pandas: What’s new? pd.rolling_mean(df,window=3) r = df.rolling(window=3) http://pandas.pydata.org/pandas-docs/stable/style.html
browser presentation • No JavaScript • Python, R, Scala and Lua bindings • Easy to embed in web applications • Server apps: data can be updated, and UI and selection events can be processed to trigger more visual updates.
• bokeh command line tool for creating applications • expanded docs including deployment guidance • video demonstrations and tutorials • supports async, periodic, timeout and model event callbacks • python client API
of large amounts of data • Provides automatic, nearly parameter-free visualization of datasets • Allows extensive customization of each step in the data-processing pipeline • Supports automatic downsampling and re- rendering with Bokeh and the Jupyter notebook • Works well with dask and numba to handle very large datasets in and out of core (with examples using billions of datapoints) https://github.com/bokeh/datashader
functions written directly in Python. • Just-in-time compiled to native machine instructions • Similar in performance to C, C++ and Fortran • Supports compilation of Python to run on either CPU or GPU hardware • Integrates well with the Python scientific software stack.
and blocked algorithms • Familiar: Implements parallel NumPy and Pandas objects • Fast: Optimized for demanding for numerical applications • Flexible: for sophisticated and messy algorithms • Scales up: Runs resiliently on clusters of 100s of machines • Scales down: Pragmatic in a single process on a laptop • Interactive: Responsive and fast for interactive data science
Explore: • Pandas • Bokeh • Scale: • Numba • Dask The Hitchhiker’s Guide to Data Science 42 The Answer to the Ultimate Question of Life, The Universe, and Everything.