Python's Visualization Landscape (PyCon 2017)

So you want to visualize some data in Python: which library do you choose? From Matplotlib to Seaborn to Bokeh to Plotly, Python has a range of mature tools to create beautiful visualizations, each with their own strengths and weaknesses. In this talk I’ll give an overview of the landscape of dataviz tools in Python, as well as some deeper dives into a few, so that you can intelligently choose which library to turn to for any given visualization task.

Video: https://www.youtube.com/watch?v=FytuB8nFHPQ

Jake VanderPlas

May 21, 2017


Transcript

  1. @jakevdp Jake VanderPlas [Python’s Visualization Landscape]

    From the abstract: “In this talk I’ll give an overview of the landscape of dataviz tools in Python . . .”
  10. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, bokeh, plotly
  11. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, bqplot, bokeh, toyplot, plotly
  12. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, ipyleaflet
  13. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, ipyleaflet
  15. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, d3js, mpld3, ipyleaflet
  16. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, d3js, mpld3, ipyleaflet, Vega-Lite, Vega
  17. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, d3js, mpld3, Altair, Vincent, ipyleaflet, d3po, Vega-Lite, Vega
  19. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, datashader, d3js, mpld3, Altair, Vincent, ipyleaflet, d3po, Vega-Lite, Vega
  20. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, datashader, d3js, mpld3, Altair, Vincent, ipyleaflet, d3po, Vega-Lite, Vega, Vaex
  21. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, d3js, mpld3, Altair, Vincent, ipyleaflet, d3po, Vega-Lite, Vega, Vaex
  22. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, d3js, mpld3, Altair, Vincent, OpenGL, ipyleaflet, d3po, Vega-Lite, Vega, Vaex
  23. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, d3js, mpld3, Altair, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, d3po, Vega-Lite, Vega, Vaex
  24. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, d3js, mpld3, Altair, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, d3po, Vega-Lite, Vega, graphviz, Vaex, graph-tool
  25. @jakevdp Jake VanderPlas

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, d3js, mpld3, Altair, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, Lightning, GlueViz, YT, d3po, Vega-Lite, Vega, MayaVi, graphviz, GR framework, PyQTgraph, pygal, chaco, Vaex, graph-tool
  26. @jakevdp Jake VanderPlas Python’s Visualization Landscape

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, d3js, mpld3, Altair, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, Lightning, GlueViz, YT, d3po, Vega-Lite, Vega, MayaVi, graphviz, GR framework, PyQTgraph, pygal, chaco, Vaex, graph-tool
  27. @jakevdp Jake VanderPlas In the beginning was matplotlib*

    * well, actually… Python visualization existed before matplotlib, but was not very mature.
  28. @jakevdp Jake VanderPlas Plotting with Matplotlib

    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends
  29. @jakevdp Jake VanderPlas Plotting with Matplotlib

    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends
    - Can reproduce just about any plot (with a bit of effort)
  30. @jakevdp Jake VanderPlas Plotting with Matplotlib

    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends
    - Can reproduce just about any plot (with a bit of effort)
    - Well-tested, standard tool for over a decade
  31. @jakevdp Jake VanderPlas Example: Statistical Data

        import pandas as pd
        iris = pd.read_csv('iris.csv')
        iris.head()

    Tidy data: i.e. rows are samples, columns are features
  32. @jakevdp Jake VanderPlas Just a simple visualization . . .

    “I want to scatter petal length vs. sepal length, and color by species”
  33. @jakevdp Jake VanderPlas Just a simple visualization . . .

        import matplotlib.pyplot as plt

        # Map each species to a fixed color
        color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))

        # Draw one scatter layer per species so each gets its own legend entry
        for species, group in iris.groupby('species'):
            plt.scatter(group['petalLength'], group['sepalLength'],
                        color=color_map[species], alpha=0.3,
                        edgecolor=None, label=species)

        plt.legend(frameon=True, title='species')
        plt.xlabel('petalLength')
        plt.ylabel('sepalLength')
  34. @jakevdp Jake VanderPlas Plotting with Matplotlib

    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends
    - Can reproduce just about any plot with a bit of effort
    - Well-tested, standard tool for over a decade

    Weaknesses:
    - API is imperative & often overly verbose
    - Sometimes poor stylistic defaults
    - Poor support for web/interactive graphs
    - Often slow for large & complicated data
  35. @jakevdp Jake VanderPlas Everyone’s Goal:

    Improve on the weaknesses of matplotlib (without sacrificing the strengths!)
  36. @jakevdp Jake VanderPlas Building on Matplotlib. . .

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy
  37. @jakevdp Jake VanderPlas Building on Matplotlib. . .

    Common Idea: Keep matplotlib as a versatile, well-tested backend, and provide a new domain-specific API.

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy
  39. @jakevdp Jake VanderPlas Pandas plotting API

    Key Features:
    - Pandas provides a DataFrame object
    - Also provides a simple API for plotting DataFrames
  40. @jakevdp Jake VanderPlas

    - More sophisticated statistical visualization tools have recently been added

        # pandas.tools.plotting in pandas < 0.20; newer releases expose pandas.plotting
        from pandas.plotting import andrews_curves
        andrews_curves(iris, 'species')
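As a rough sketch of the DataFrame plotting API mentioned on these slides, the snippet below draws the petal/sepal scatter with a single method call. The six rows are made-up values standing in for `iris.csv` (only the column names follow the slides), and the `Agg` backend is chosen purely so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up rows standing in for iris.csv (assumption: same column names as the slides)
iris = pd.DataFrame({
    "petalLength": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "sepalLength": [5.1, 4.7, 7.0, 6.4, 6.3, 5.8],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# One call on the DataFrame; pandas creates the matplotlib Axes and labels them
ax = iris.plot.scatter(x="petalLength", y="sepalLength", alpha=0.3)
```

The returned `ax` is an ordinary matplotlib Axes, so any further styling can still fall back on the matplotlib API.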
  41. @jakevdp Jake VanderPlas Seaborn: statistical data visualization

    http://seaborn.pydata.org

    Key Features:
    - Like Pandas, wraps matplotlib
    - Nice set of color palettes & plot styles
    - Focus on statistical visualization & modeling
  42. @jakevdp Jake VanderPlas Javascript-based Viz:

    Common Idea: build a new API that produces a plot serialization (often JSON) that can be displayed in the browser (often in Jupyter notebooks)

    javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, ipyleaflet
  43. @jakevdp Jake VanderPlas Plotting with Bokeh

    Advantages:
    - Web view/interactivity
    - Imperative and Declarative layer
    - Handles large and/or streaming datasets
    - Geographical visualization
    - Fully open source

    Disadvantages:
    - No vector output (need PDF/EPS? Sorry)
    - Newer tool with a smaller user-base than matplotlib
  44. @jakevdp Jake VanderPlas Plotting with Plotly

    Advantages:
    - Web view/interactivity
    - Multi-language support
    - 3D plotting capability
    - Animation capability
    - Geographical visualization

    Disadvantages:
    - Some features require a paid plan
  45. @jakevdp Jake VanderPlas Visualization for Larger Data . . .

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, d3js, mpld3, Altair, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, Lightning, GlueViz, YT, d3po, Vega-Lite, Vega, MayaVi, graphviz, GR framework, PyQTgraph, pygal, chaco, Vaex, graph-tool
  46. @jakevdp Jake VanderPlas Visualization for Larger Data . . .

    datashader

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, d3js, mpld3, Altair, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, Lightning, GlueViz, YT, d3po, Vega-Lite, Vega, MayaVi, graphviz, GR framework, PyQTgraph, pygal, chaco, Vaex, graph-tool
  47. @jakevdp Jake VanderPlas Datashader

    - Compute layer that works with Bokeh
    - Rather than sending data to the client, it aggregates data and sends pixels.
    - Can handle interactive visualization of billions of rows.
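The "aggregate data and send pixels" idea can be illustrated with a minimal pure-Python sketch. This is not Datashader's API: `rasterize_counts` is a hypothetical helper that bins points into a fixed-size grid of counts, so the payload sent to the client depends on the image size rather than on the number of rows.

```python
def rasterize_counts(xs, ys, width, height, x_range, y_range):
    """Bin points into a height x width grid of counts (one cell per pixel)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # points outside the viewport contribute nothing
        # Scale to pixel coordinates; clamp so the upper edge lands in the last cell
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


# Five illustrative points reduced to a 2 x 2 "image" of counts
xs = [0.1, 0.2, 0.25, 0.9, 1.0]
ys = [0.1, 0.1, 0.2, 0.9, 1.0]
grid = rasterize_counts(xs, ys, width=2, height=2,
                        x_range=(0.0, 1.0), y_range=(0.0, 1.0))
```

Whether the input holds five points or billions, the result is still `width * height` numbers, which is what makes interactive views of very large datasets feasible.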
  49. @jakevdp Jake VanderPlas Toward Declarative Visualization . . .

    d3js, javascript, bokeh, matplotlib, Altair

    seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, pythreejs, bqplot, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, mpld3, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, Lightning, GlueViz, YT, d3po, Vega-Lite, Vega, MayaVi, graphviz, GR framework, PyQTgraph, pygal, chaco, Vaex, graph-tool
  50. @jakevdp Jake VanderPlas Toward Declarative Visualization . . .

    d3js, javascript, bokeh, matplotlib, Altair, datashader

    seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, pythreejs, bqplot, toyplot, plotly, ipyvolume, cufflinks, holoviews, mpld3, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, Lightning, GlueViz, YT, d3po, Vega-Lite, Vega, MayaVi, graphviz, GR framework, PyQTgraph, pygal, chaco, Vaex, graph-tool
  51. @jakevdp Jake VanderPlas Holoviews

    - Datasets themselves stored in objects that automatically produce intelligent visualizations
    - Composition & Interactivity via operator overloading
    - Renders to Bokeh, DataShader, and Matplotlib
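To make "composition via operator overloading" concrete, here is a toy sketch in plain Python. The classes `Composable`, `Element`, `Overlay` and `Layout` are hypothetical stand-ins written for this example, not HoloViews code; they only mimic the convention of `*` for overlaying plots on shared axes and `+` for placing plots side by side.

```python
class Composable:
    """Mixin giving every plot object the two composition operators."""

    def __mul__(self, other):
        return Overlay(self, other)   # a * b: draw both on the same axes

    def __add__(self, other):
        return Layout(self, other)    # a + b: place the plots side by side


class Element(Composable):
    """A plot element that carries its own data."""

    def __init__(self, kind, data):
        self.kind, self.data = kind, data

    def describe(self):
        return f"{self.kind}[{len(self.data)} pts]"


class Overlay(Composable):
    def __init__(self, *layers):
        self.layers = layers

    def describe(self):
        return "Overlay(" + ", ".join(l.describe() for l in self.layers) + ")"


class Layout(Composable):
    def __init__(self, *panels):
        self.panels = panels

    def describe(self):
        return "Layout(" + ", ".join(p.describe() for p in self.panels) + ")"


points = [(1, 2), (2, 3), (3, 5)]
scatter = Element("Scatter", points)
curve = Element("Curve", points)

# Python's operator precedence applies: the overlay is built first, then the layout
figure = scatter * curve + scatter
```

Because the data travels inside each object, a composed `figure` holds everything a renderer needs, which is what allows the same expression to target different backends.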
  52. @jakevdp Jake VanderPlas Altair

    What if instead of passing around pixels, we pass around visualization specifications plus data?
  53. @jakevdp Jake VanderPlas Altair

    What if instead of passing around pixels, we pass around visualization specifications plus data?

    “Declarative Visualization”
  55. @jakevdp Jake VanderPlas Declarative Visualization: Viz for data science

    Declarative
    - Specify What should be done
    - Details determined automatically
    - Separates Specification from Execution

    Imperative
    - Specify How something should be done.
    - Must manually specify plotting steps
    - Specification & Execution intertwined.

    Declarative visualization lets you think about data and relationships, rather than incidental details.
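The split between specification and execution can be sketched in a few lines of plain Python. Everything here is illustrative: the four rows are made-up values, `spec` is an ad hoc dictionary (not Vega-Lite), and `execute` is a hypothetical toy engine that fills in the details the specification leaves out, such as axis domains and the color assigned to each category.

```python
# Made-up rows in tidy form (assumption: column names follow the iris slides)
rows = [
    {"petalLength": 1.4, "sepalLength": 5.1, "species": "setosa"},
    {"petalLength": 4.7, "sepalLength": 7.0, "species": "versicolor"},
    {"petalLength": 6.0, "sepalLength": 6.3, "species": "virginica"},
    {"petalLength": 1.3, "sepalLength": 4.7, "species": "setosa"},
]

# Specification: *what* to show. No loops, no colors, no axis limits.
spec = {"mark": "point", "x": "petalLength", "y": "sepalLength", "color": "species"}

PALETTE = ["blue", "green", "red"]


def execute(spec, rows):
    """Execution: decide *how* to draw, independently of the spec's author."""
    categories = sorted({row[spec["color"]] for row in rows})
    color_of = dict(zip(categories, PALETTE))  # colors chosen automatically
    xs = [row[spec["x"]] for row in rows]
    ys = [row[spec["y"]] for row in rows]
    return {
        "x_domain": (min(xs), max(xs)),        # axis ranges chosen automatically
        "y_domain": (min(ys), max(ys)),
        "legend": color_of,
        "marks": [(x, y, color_of[row[spec["color"]]])
                  for x, y, row in zip(xs, ys, rows)],
    }


plan = execute(spec, rows)
```

Swapping `"y": "sepalLength"` for another column changes the chart without touching `execute`, whereas the imperative matplotlib loop shown earlier in the talk mixes both concerns in one place.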
  56. #JSM2016 Jake VanderPlas Bar Chart: d3

        var margin = {top: 20, right: 20, bottom: 30, left: 40},
            width = 960 - margin.left - margin.right,
            height = 500 - margin.top - margin.bottom;

        var x = d3.scale.ordinal()
            .rangeRoundBands([0, width], .1);

        var y = d3.scale.linear()
            .range([height, 0]);

        var xAxis = d3.svg.axis()
            .scale(x)
            .orient("bottom");

        var yAxis = d3.svg.axis()
            .scale(y)
            .orient("left")
            .ticks(10, "%");

        var svg = d3.select("body").append("svg")
            .attr("width", width + margin.left + margin.right)
            .attr("height", height + margin.top + margin.bottom)
          .append("g")
            .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

        d3.tsv("data.tsv", type, function(error, data) {
          if (error) throw error;

          x.domain(data.map(function(d) { return d.letter; }));
          y.domain([0, d3.max(data, function(d) { return d.frequency; })]);

          svg.append("g")
              .attr("class", "x axis")
              .attr("transform", "translate(0," + height + ")")
              .call(xAxis);

          svg.append("g")
              .attr("class", "y axis")
              .call(yAxis)
            .append("text")
              .attr("transform", "rotate(-90)")
              .attr("y", 6)
              .attr("dy", ".71em")
              .style("text-anchor", "end")
              .text("Frequency");

          svg.selectAll(".bar")
              .data(data)
            .enter().append("rect")
              .attr("class", "bar")
              .attr("x", function(d) { return x(d.letter); })
              .attr("width", x.rangeBand())
              .attr("y", function(d) { return y(d.frequency); })
              .attr("height", function(d) { return height - y(d.frequency); });
        });

        function type(d) {
          d.frequency = +d.frequency;
          return d;
        }

    D3 is a Javascript package that streamlines manipulation of objects on a webpage.
  57. #JSM2016 Jake VanderPlas Bar Chart: Vega

        {
          "width": 400,
          "height": 200,
          "padding": {"top": 10, "left": 30, "bottom": 30, "right": 10},
          "data": [
            {
              "name": "table",
              "values": [
                {"x": 1, "y": 28}, {"x": 2, "y": 55}, {"x": 3, "y": 43}, {"x": 4, "y": 91},
                {"x": 5, "y": 81}, {"x": 6, "y": 53}, {"x": 7, "y": 19}, {"x": 8, "y": 87},
                {"x": 9, "y": 52}, {"x": 10, "y": 48}, {"x": 11, "y": 24}, {"x": 12, "y": 49},
                {"x": 13, "y": 87}, {"x": 14, "y": 66}, {"x": 15, "y": 17}, {"x": 16, "y": 27},
                {"x": 17, "y": 68}, {"x": 18, "y": 16}, {"x": 19, "y": 49}, {"x": 20, "y": 15}
              ]
            }
          ],
          "scales": [
            {
              "name": "x",
              "type": "ordinal",
              "range": "width",
              "domain": {"data": "table", "field": "x"}
            },
            {
              "name": "y",
              "type": "linear",
              "range": "height",
              "domain": {"data": "table", "field": "y"},
              "nice": true
            }
          ],
          "axes": [
            {"type": "x", "scale": "x"},
            {"type": "y", "scale": "y"}
          ],
          "marks": [
            {
              "type": "rect",
              "from": {"data": "table"},
              "properties": {
                "enter": {
                  "x": {"scale": "x", "field": "x"},
                  "width": {"scale": "x", "band": true, "offset": -1},
                  "y": {"scale": "y", "field": "y"},
                  "y2": {"scale": "y", "value": 0}
                },
                "update": {
                  "fill": {"value": "steelblue"}

    Vega is a detailed declarative specification for visualizations, built on D3.
  58. #JSM2016 Jake VanderPlas Bar Chart: Vega-Lite

        {
          "description": "A simple bar chart with embedded data.",
          "data": {
            "values": [
              {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
              {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
              {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
            ]
          },
          "mark": "bar",
          "encoding": {
            "x": {"field": "a", "type": "ordinal"},
            "y": {"field": "b", "type": "quantitative"}
          }
        }

    Vega-Lite is a simpler declarative specification aimed at statistical visualization.
  59. #JSM2016 Jake VanderPlas Bar Chart: Altair

    Altair is a Python API for creating Vega-Lite specifications.
  60. @jakevdp Jake VanderPlas From Declarative API to declarative Grammar

        chart = Chart(data).mark_circle(
            opacity=0.3
        ).encode(
            x='petalLength:Q',
            y='sepalWidth:Q',
            color='species:N',
        )
        chart.display()
  61. @jakevdp Jake VanderPlas From Declarative API to declarative Grammar

        >>> chart.to_dict()
        {'config': {'mark': {'opacity': 0.3}},
         'data': {'url': 'https://vega.github.io/vega-datasets/data/iris.json'},
         'encoding': {'color': {'field': 'species', 'type': 'nominal'},
                      'x': {'field': 'petalLength', 'type': 'quantitative'},
                      'y': {'field': 'sepalWidth', 'type': 'quantitative'}},
         'mark': 'circle'}
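The step from shorthand strings such as `'petalLength:Q'` to the dictionary shown on this slide can be sketched without Altair at all. The helpers `parse_shorthand` and `build_spec` below are hypothetical and written only for illustration; they are not Altair's implementation, and they assume every shorthand string carries a `:` type code.

```python
import json

# 'Q' and 'N' appear on the slides; 'O' and 'T' are two further Vega-Lite types
TYPE_NAMES = {"Q": "quantitative", "N": "nominal", "O": "ordinal", "T": "temporal"}


def parse_shorthand(shorthand):
    """Turn 'petalLength:Q' into {'field': 'petalLength', 'type': 'quantitative'}."""
    field, _, code = shorthand.rpartition(":")
    return {"field": field, "type": TYPE_NAMES[code]}


def build_spec(url, mark, opacity, **channels):
    """Assemble a dictionary shaped like the chart.to_dict() output on the slide."""
    return {
        "config": {"mark": {"opacity": opacity}},
        "data": {"url": url},
        "encoding": {name: parse_shorthand(value)
                     for name, value in channels.items()},
        "mark": mark,
    }


spec = build_spec(
    "https://vega.github.io/vega-datasets/data/iris.json",
    mark="circle",
    opacity=0.3,
    x="petalLength:Q",
    y="sepalWidth:Q",
    color="species:N",
)

# The specification is plain data, so it serializes directly to JSON
spec_json = json.dumps(spec, indent=2, sort_keys=True)
```

Since the result is ordinary JSON, the Python side never has to draw anything; a renderer in the browser can take over from here.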
  62. @jakevdp Jake VanderPlas Try Altair: http://github.com/ellisonbg/altair/

        $ conda install altair --channel conda-forge

    or

        $ pip install altair
        $ jupyter nbextension install --sys-prefix --py vega

    For a Jupyter notebook tutorial, type

        import altair
        altair.tutorial()
  63. @jakevdp Jake VanderPlas Python’s Visualization Landscape

    matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, d3js, mpld3, Altair, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, Lightning, GlueViz, YT, d3po, Vega-Lite, Vega, MayaVi, graphviz, GR framework, PyQTgraph, pygal, chaco, Vaex, graph-tool
  64. @jakevdp Jake VanderPlas Thank You!

    Email: [email protected]
    Twitter: @jakevdp
    Github: jakevdp
    Web: http://vanderplas.com/
    Blog: http://jakevdp.github.io/