
Bespoke Visualizations with a Declarative Twist

Meetup talk at Convoy in Seattle, Feb 15, 2018. Slides cover visualization in Python, and motivate the use of Vega, Vega-Lite, and Altair.

Jake VanderPlas

February 15, 2018

Transcript

  1. @jakevdp
    Jake VanderPlas
    Convoy Tech
    Feb 15, 2018
    Bespoke Visualizations with a Declarative Twist

  2. @jakevdp
    Jake VanderPlas
    Python Viz is a bit Painful...
    “I have been using Matplotlib for a decade now, and I still have to look most things up”
    “I love Python but I switch to R for making plots”
    “I do viz in Python, but switch from matplotlib to seaborn to bokeh depending on what I need to do”

  3. @jakevdp
    Jake VanderPlas
    Python’s Visualization Landscape
    [Diagram of libraries: matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, d3js, mpld3, Altair, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, Lightning, GlueViz, YT, d3po, Vega-Lite, Vega, MayaVi, graphviz, GR framework, PyQTgraph, pygal, chaco, Vaex, graph-tool]

  4. @jakevdp
    Jake VanderPlas
    Problem: where would you tell beginners to start?
    - Matplotlib
    - Bokeh
    - Plotly
    - Seaborn
    - Holoviews
    - VisPy
    - ggplot
    - pandas plot
    - Lightning
    Each library has strengths, but arguably none is yet the “killer viz app” for Data Science.

  5. @jakevdp
    Jake VanderPlas
    Some examples . . .

  6. @jakevdp
    Jake VanderPlas
    http://matplotlib.org/

  7. @jakevdp
    Jake VanderPlas
    import matplotlib.pyplot as plt
    from numpy.random import rand
    for color in ['red', 'green', 'blue']:
        x, y = rand(2, 100)
        size = 200.0 * rand(100)
        plt.scatter(x, y, c=color, s=size, label=color,
                    alpha=0.3, edgecolor='none')
    plt.legend(frameon=True)
    plt.show()
    Plotting with Matplotlib

  8. @jakevdp
    Jake VanderPlas
    Plotting with Matplotlib
    Strengths:
    - Designed like MATLAB: switching was easy

  9. @jakevdp
    Jake VanderPlas
    Plotting with Matplotlib
    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends

  10. @jakevdp
    Jake VanderPlas
    Plotting with Matplotlib
    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends
    - Can reproduce just about any plot (with a bit of effort)

  11. @jakevdp
    Jake VanderPlas
    Plotting with Matplotlib
    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends
    - Can reproduce just about any plot (with a bit of effort)
    - Well-tested, standard tool for over a decade

  12. @jakevdp
    Jake VanderPlas
    Matplotlib Gallery

  13. @jakevdp
    Jake VanderPlas
    Plotting with Matplotlib
    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends
    - Can reproduce just about any plot with a bit of effort
    - Well-tested, standard tool for over a decade
    Weaknesses:
    - API is imperative & often overly verbose
    - Sometimes poor stylistic defaults
    - Poor support for web/interactive graphs
    - Often slow for large & complicated data

  14. @jakevdp
    Jake VanderPlas
    http://bokeh.pydata.org/

  15. @jakevdp
    Jake VanderPlas
    from bokeh.plotting import figure, show
    from bokeh.models import LinearAxis, Range1d
    from numpy.random import rand
    p = figure()
    for color in ['red', 'green', 'blue']:
        x, y = rand(2, 100)
        size = 0.03 * rand(100)
        # legend_label= replaces the legend= keyword of older Bokeh releases
        p.circle(x, y, fill_color=color, radius=size,
                 legend_label=color, fill_alpha=0.3,
                 line_color=None)
    show(p)
    Plotting with Bokeh

  16. @jakevdp
    Jake VanderPlas
    Plotting with Bokeh

  17. @jakevdp
    Jake VanderPlas
    Bokeh Gallery

  18. @jakevdp
    Jake VanderPlas
    Plotting with Bokeh
    Advantages:
    - Web view/interactivity
    - Imperative and Declarative layer
    - Handles large and/or streaming datasets
    - Geographical visualization
    - Fully open source
    Disadvantages:
    - No vector output (need PDF/EPS? Sorry)
    - Newer tool with a smaller user-base than matplotlib

  19. @jakevdp
    Jake VanderPlas
    http://plot.ly/

  20. @jakevdp
    Jake VanderPlas
    Basic Plotting with Plotly

  21. @jakevdp
    Jake VanderPlas
    Plotly Gallery

  22. @jakevdp
    Jake VanderPlas
    Plotting with Plotly
    Advantages:
    - Web view/interactivity
    - Multi-language support
    - 3D plotting capability
    - Animation capability
    - Geographical visualization
    Disadvantages:
    - Some features require a paid plan

  23. @jakevdp
    Jake VanderPlas
    Moving to Statistical
    Visualization

  24. @jakevdp
    Jake VanderPlas
    from altair import load_dataset
    iris = load_dataset('iris')
    iris.head()
    Data in Tidy Format: i.e. rows are samples, columns are features
    Statistical Visualization
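
As an editorial illustration of the tidy layout described on this slide, the sketch below uses plain Python records rather than the DataFrame the talk loads. The measurement values and the helper names `column` and `group_by` are made up for the example; only the column names come from the slides.

```python
from collections import defaultdict

# Made-up measurements in tidy form: one dict per sample (row),
# one key per feature (column).
samples = [
    {"petalLength": 1.4, "sepalWidth": 3.5, "species": "setosa"},
    {"petalLength": 4.7, "sepalWidth": 3.2, "species": "versicolor"},
    {"petalLength": 1.3, "sepalWidth": 3.0, "species": "setosa"},
    {"petalLength": 6.0, "sepalWidth": 3.3, "species": "virginica"},
]


def column(rows, name):
    """Return one feature (column) across all samples (rows)."""
    return [row[name] for row in rows]


def group_by(rows, name):
    """Split rows into groups keyed by the value of one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[name]].append(row)
    return dict(groups)


by_species = group_by(samples, "species")
```

Because every row has the same keys, selecting a feature or grouping by one is a one-line operation, which is what the grouping and faceting slides rely on.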

  25. @jakevdp
    Jake VanderPlas
    color_map = dict(zip(iris.species.unique(),
                         ['blue', 'green', 'red']))
    for species, group in iris.groupby('species'):
        plt.scatter(group['petalLength'], group['sepalWidth'],
                    color=color_map[species],
                    alpha=0.3, edgecolor=None,
                    label=species)
    plt.legend(frameon=True, title='species')
    plt.xlabel('petalLength')
    plt.ylabel('sepalWidth')
    Statistical Visualization: Grouping

  26. @jakevdp
    Jake VanderPlas
    color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))
    n_panels = len(color_map)
    fig, ax = plt.subplots(1, n_panels, figsize=(n_panels * 5, 3),
                           sharex=True, sharey=True)
    for i, (species, group) in enumerate(iris.groupby('species')):
        ax[i].scatter(group['petalLength'], group['sepalWidth'],
                      color=color_map[species],
                      alpha=0.3, edgecolor=None,
                      label=species)
        ax[i].legend(frameon=True, title='species')
    plt.xlabel('petalLength')
    plt.ylabel('sepalWidth')
    Statistical Visualization: Faceting

  27. @jakevdp
    Jake VanderPlas
    color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))
    n_panels = len(color_map)
    fig, ax = plt.subplots(1, n_panels, figsize=(n_panels * 5, 3),
                           sharex=True, sharey=True)
    for i, (species, group) in enumerate(iris.groupby('species')):
        ax[i].scatter(group['petalLength'], group['sepalWidth'],
                      color=color_map[species],
                      alpha=0.3, edgecolor=None,
                      label=species)
        ax[i].legend(frameon=True, title='species')
    plt.xlabel('petalLength')
    plt.ylabel('sepalWidth')
    Statistical Visualization: Faceting
    Problem:
    We’re mixing the what with the how

  28. @jakevdp
    Jake VanderPlas
    Most Useful for Data Science is Declarative Visualization
    Declarative
    - Specify What should be done
    - Details determined automatically
    - Separates Specification from Execution
    Imperative
    - Specify How something should be done.
    - Must manually specify plotting steps
    - Specification & Execution intertwined.
    Declarative visualization lets you think about data and relationships, rather than incidental details.
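
The split between specification and execution can be sketched without any plotting library. In this hypothetical example, `spec` says which columns drive x, y, and color, while `resolve` (an invented helper, not part of any library named in the talk) works out the details, here one color per category. The sample rows and the palette are made up.

```python
# What to show: a mapping from visual channels to column names.
spec = {"x": "petalLength", "y": "sepalWidth", "color": "species"}

# Made-up rows in tidy form, for illustration only.
rows = [
    {"petalLength": 1.4, "sepalWidth": 3.5, "species": "setosa"},
    {"petalLength": 4.7, "sepalWidth": 3.2, "species": "versicolor"},
    {"petalLength": 6.0, "sepalWidth": 3.3, "species": "virginica"},
]

PALETTE = ["blue", "green", "red"]


def build_color_scale(rows, field, palette=PALETTE):
    """Assign one palette color to each distinct value of `field`."""
    categories = sorted({row[field] for row in rows})
    return {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}


def resolve(spec, rows):
    """How to show it: turn the spec plus data into (x, y, color) points."""
    scale = build_color_scale(rows, spec["color"])
    return [
        (row[spec["x"]], row[spec["y"]], scale[row[spec["color"]]])
        for row in rows
    ]


points = resolve(spec, rows)
```

Swapping the x and y columns means editing `spec` only; `resolve` stays untouched, which is the separation the slide argues for.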

  29. @jakevdp
    Jake VanderPlas
    Seaborn: Declarative Visualization
    . . . Almost
    import seaborn as sns
    g = sns.FacetGrid(iris, col="species", hue="species")
    g.map(plt.scatter, "petalLength", "sepalWidth", alpha=0.3)
    g.add_legend();

  30. @jakevdp
    Jake VanderPlas
    http://altair-viz.github.io/

  31. @jakevdp
    Jake VanderPlas
    Altair for Declarative Visualization
    from altair import Chart
    from vega_datasets import data
    iris = data.iris()
    Chart(iris).mark_circle().encode(
        x='petalLength',
        y='sepalWidth',
        color='species'
    )

  32. @jakevdp
    Jake VanderPlas
    Altair for Declarative Visualization
    from altair import Chart
    from vega_datasets import data
    iris = data.iris()
    Chart(iris).mark_circle().encode(
        x='petalLength',
        y='sepalWidth',
        color='species'
    ).interactive()

  33. @jakevdp
    Jake VanderPlas
    Encodings are Flexible:
    from altair import Chart
    from vega_datasets import data
    iris = data.iris()
    Chart(iris).mark_circle().encode(
        x='petalLength',
        y='sepalWidth',
        color='species',
        column='species'
    )

  34. @jakevdp
    Jake VanderPlas
    Altair
    Declarative statistical visualization library for Python, driven by Vega-Lite
    http://github.com/altair-viz/altair
    Collaboration with Brian Granger (Jupyter team), myself, and UW’s Interactive Data Lab

  35. Jake VanderPlas
    So What Is Altair?

  36. Jake VanderPlas
    D3 is Everywhere . . .
    (live version at NYT)

  37. Jake VanderPlas
    But working in D3 can
    be challenging . . .

  38. Jake VanderPlas
    Bar Chart: d3
    var margin = {top: 20, right: 20, bottom: 30, left: 40},
        width = 960 - margin.left - margin.right,
        height = 500 - margin.top - margin.bottom;
    var x = d3.scale.ordinal()
        .rangeRoundBands([0, width], .1);
    var y = d3.scale.linear()
        .range([height, 0]);
    var xAxis = d3.svg.axis()
        .scale(x)
        .orient("bottom");
    var yAxis = d3.svg.axis()
        .scale(y)
        .orient("left")
        .ticks(10, "%");
    var svg = d3.select("body").append("svg")
        .attr("width", width + margin.left + margin.right)
        .attr("height", height + margin.top + margin.bottom)
      .append("g")
        .attr("transform", "translate(" + margin.left + "," + margin.top + ")");
    d3.tsv("data.tsv", type, function(error, data) {
      if (error) throw error;
      x.domain(data.map(function(d) { return d.letter; }));
      y.domain([0, d3.max(data, function(d) { return d.frequency; })]);
      svg.append("g")
          .attr("class", "x axis")
          .attr("transform", "translate(0," + height + ")")
          .call(xAxis);
      svg.append("g")
          .attr("class", "y axis")
          .call(yAxis)
        .append("text")
          .attr("transform", "rotate(-90)")
          .attr("y", 6)
          .attr("dy", ".71em")
          .style("text-anchor", "end")
          .text("Frequency");
      svg.selectAll(".bar")
          .data(data)
        .enter().append("rect")
          .attr("class", "bar")
          .attr("x", function(d) { return x(d.letter); })
          .attr("width", x.rangeBand())
          .attr("y", function(d) { return y(d.frequency); })
          .attr("height", function(d) { return height - y(d.frequency); });
    });
    function type(d) {
      d.frequency = +d.frequency;
      return d;
    }
    D3 is a JavaScript package that streamlines manipulation of objects on a webpage.

  39. Jake VanderPlas
    Bar Chart: Vega
    {
      "width": 400,
      "height": 200,
      "padding": {"top": 10, "left": 30, "bottom": 30, "right": 10},
      "data": [
        {
          "name": "table",
          "values": [
            {"x": 1, "y": 28}, {"x": 2, "y": 55},
            {"x": 3, "y": 43}, {"x": 4, "y": 91},
            {"x": 5, "y": 81}, {"x": 6, "y": 53},
            {"x": 7, "y": 19}, {"x": 8, "y": 87},
            {"x": 9, "y": 52}, {"x": 10, "y": 48},
            {"x": 11, "y": 24}, {"x": 12, "y": 49},
            {"x": 13, "y": 87}, {"x": 14, "y": 66},
            {"x": 15, "y": 17}, {"x": 16, "y": 27},
            {"x": 17, "y": 68}, {"x": 18, "y": 16},
            {"x": 19, "y": 49}, {"x": 20, "y": 15}
          ]
        }
      ],
      "scales": [
        {
          "name": "x",
          "type": "ordinal",
          "range": "width",
          "domain": {"data": "table", "field": "x"}
        },
        {
          "name": "y",
          "type": "linear",
          "range": "height",
          "domain": {"data": "table", "field": "y"},
          "nice": true
        }
      ],
      "axes": [
        {"type": "x", "scale": "x"},
        {"type": "y", "scale": "y"}
      ],
      "marks": [
        {
          "type": "rect",
          "from": {"data": "table"},
          "properties": {
            "enter": {
              "x": {"scale": "x", "field": "x"},
              "width": {"scale": "x", "band": true, "offset": -1},
              "y": {"scale": "y", "field": "y"},
              "y2": {"scale": "y", "value": 0}
            },
            "update": {
              "fill": {"value": "steelblue"}
    Vega is a detailed declarative specification for visualizations, built on D3.

  40. Jake VanderPlas
    Bar Chart: Vega-Lite
    {
      "description": "A simple bar chart with embedded data.",
      "data": {
        "values": [
          {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
          {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
          {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
        ]
      },
      "mark": "bar",
      "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"}
      }
    }
    Vega-Lite is a simpler declarative specification aimed at statistical visualization.
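
Because a Vega-Lite specification is plain JSON, it can be assembled from ordinary Python data structures. The sketch below rebuilds the bar-chart specification from this slide with only the standard library; the variable names are illustrative, and nothing here renders the chart.

```python
import json

# Category labels and values copied from the slide's embedded data.
letters = "ABCDEFGHI"
counts = [28, 55, 43, 91, 81, 53, 19, 87, 52]

spec = {
    "description": "A simple bar chart with embedded data.",
    "data": {"values": [{"a": a, "b": b} for a, b in zip(letters, counts)]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"},
    },
}

# Serialize to the JSON text that a Vega-Lite renderer would consume.
spec_json = json.dumps(spec, indent=2)
```

Keeping the chart as data is what makes it possible to publish the specification alongside the dataset instead of a rendered image.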

  41. Jake VanderPlas
    Bar Chart: Altair
    Altair is a Python API for creating
    Vega-Lite specifications.

  42. @jakevdp
    Jake VanderPlas
    From Declarative API
    to declarative Grammar
    url = load_dataset('iris', url_only=True)
    chart = Chart(url).mark_circle(
        opacity=0.3
    ).encode(
        x='petalLength:Q',
        y='sepalWidth:Q',
        color='species:N',
    )
    chart.display()

  43. @jakevdp
    Jake VanderPlas
    From Declarative API
    to declarative Grammar
    >>> chart.to_dict()
    {'config': {'mark': {'opacity': 0.3}},
     'data': {'url': 'https://vega.github.io/vega-datasets/data/iris.json'},
     'encoding': {'color': {'field': 'species', 'type': 'nominal'},
                  'x': {'field': 'petalLength', 'type': 'quantitative'},
                  'y': {'field': 'sepalWidth', 'type': 'quantitative'}},
     'mark': 'circle'}
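
The `:Q` and `:N` suffixes in the encoding strings are shorthand for the declared types that appear in the dictionary output. The function below is a simplified stand-in written for this transcript, not Altair's actual parser; `expand_shorthand` and `TYPE_CODES` are hypothetical names, and only the `field:code` form is handled.

```python
# One-letter type codes and the Vega-Lite type names they stand for.
TYPE_CODES = {
    "Q": "quantitative",
    "N": "nominal",
    "O": "ordinal",
    "T": "temporal",
}


def expand_shorthand(shorthand):
    """Expand 'field:code' into a {'field': ..., 'type': ...} dict."""
    field, sep, code = shorthand.rpartition(":")
    if not sep or code not in TYPE_CODES:
        raise ValueError(f"expected 'field:code', got {shorthand!r}")
    return {"field": field, "type": TYPE_CODES[code]}


# The three encodings used on the slide.
shorthands = {"x": "petalLength:Q", "y": "sepalWidth:Q", "color": "species:N"}
encoding = {channel: expand_shorthand(s) for channel, s in shorthands.items()}
```

The resulting `encoding` dict has the same shape as the `'encoding'` entry in the output shown on this slide.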

  44. Jake VanderPlas
    Key Features of Altair:
    - Designed with Statistical Visualizations in mind
    - Data specified in Tidy Format & linked to a declared type: Quantitative, Nominal, Ordinal, Temporal
    - Well-defined set of marks to represent data
    - Encoding Channels map data features (i.e. columns) to visual encodings (e.g. x, y, color, size)
    - Simple data transformations supported natively

  45. Jake VanderPlas
    But why another plotting library?
    Teaching: students can learn visualization concepts with minimal syntactic distraction.
    Publishing: Instead of publishing pixels, can publish data + plot specification for greater flexibility & reproducibility.
    Cross-Pollination: Vega-Lite has the potential to provide a cross-platform lingua franca of statistical visualization.
    - Matplotlib
    - Bokeh
    - Plotly
    - Seaborn
    - Holoviews
    - VisPy
    - ggplot
    - pandas plot
    - Lightning

  46. @jakevdp
    Jake VanderPlas
    Altair/Vega-Lite supports many plot types:

  47. @jakevdp
    Jake VanderPlas
    Altair/Vega-Lite supports many plot types:

  48. @jakevdp
    Jake VanderPlas
    Altair/Vega-Lite supports many plot types:

  49. @jakevdp
    Jake VanderPlas
    Altair/Vega-Lite supports many plot types:

  50. @jakevdp
    Jake VanderPlas
    Altair/Vega-Lite supports many plot types:

  51. @jakevdp
    Jake VanderPlas
    Altair/Vega-Lite supports many plot types:

  52. Jake VanderPlas
    (Visualizations from jakevdp/altair-examples).

  53. Jake VanderPlas
    Altair 2.0: a Grammar of Interaction

  54. @jakevdp
    Jake VanderPlas
    Some Live Examples . . .
    See the notebook at
    https://github.com/jakevdp/talks/blob/master/2016-11-9-Altair.ipynb

  55. @jakevdp
    Jake VanderPlas
    Try Altair:
    $ conda install altair --channel conda-forge
    or
    $ pip install altair
    $ jupyter nbextension install --sys-prefix --py vega
    http://github.com/ellisonbg/altair/
    For a Jupyter notebook tutorial, type
    import altair
    altair.tutorial()

  56. @jakevdp
    Jake VanderPlas
    Altair’s Development is Active!
    - More plot types
    - Higher-level Statistical routines
    - Improve layering API
    - Vega-Tooltip interaction
    - Vega-Lite's Grammar of Interaction

  57. @jakevdp
    Jake VanderPlas
    Email: [email protected]
    Twitter: @jakevdp
    Github: jakevdp
    Web: http://vanderplas.com
    Blog: http://jakevdp.github.io
    Thank You!
