Altair Tutorial Intro - PyCon 2018

The intro slides to my tutorial on Altair and Vega-Lite from PyCon 2018.

Full materials and video link available at https://github.com/altair-viz/altair-tutorial

Jake VanderPlas

May 12, 2018

Transcript

  1. Exploratory Data Visualization with Altair
    Jake VanderPlas (@jakevdp)
    PyCon 2018
    Materials at http://github.com/altair-viz/altair-tutorial

  2. Building Blocks of Visualization:
    1. Data
    2. Transformation
    3. Marks
    4. Encoding – mapping from fields to mark properties
    5. Scale – functions that map data values to visual values
    6. Guides – visualizations of scales (axes, legends, etc.)

  3. Key: Visualization concepts should map
    directly to visualization implementation.

  4. Hypothesis: good implementation can
    influence good conceptualization.

  5. http://matplotlib.org/
    ~ familiar tools ~

  6. Plotting with Matplotlib
    import matplotlib.pyplot as plt
    import numpy as np
    # 1000 random (x, y) points, colored by their index
    x = np.random.randn(1000)
    y = np.random.randn(1000)
    color = np.arange(1000)
    plt.scatter(x, y, c=color)
    plt.colorbar()  # legend for the color mapping

  7. Plotting with Matplotlib
    Strengths:
    - Designed like MATLAB: switching was easy
    For more on the historical perspective, see
    https://speakerdeck.com/jakevdp/pydata-101

  8. Plotting with Matplotlib
    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends

  9. Plotting with Matplotlib
    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends
    - Can reproduce just about any plot (with a bit of effort)

  10. Plotting with Matplotlib
    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends
    - Can reproduce just about any plot (with a bit of effort)
    - Well-tested, standard tool for 15 years

  11. Matplotlib Gallery

  12. Plotting with Matplotlib
    Strengths:
    - Designed like MATLAB: switching was easy
    - Many rendering backends
    - Can reproduce just about any plot (with a bit of effort)
    - Well-tested, standard tool for 15 years
    Weaknesses:
    - API is imperative & often overly verbose
    - Poor/no support for interactive/web graphs

  14. Statistical Visualization
    from vega_datasets import data
    iris = data.iris()
    iris.head()
    Data in column-oriented format, i.e. rows are samples
    and columns are features
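To make "column-oriented" concrete, a small pandas sketch: each row is one sample and each column one feature. The three rows follow the layout of the iris data but are only illustrative; this does not load the real dataset.

```python
import pandas as pd

# Tidy / column-oriented layout: one row per sample, one column per feature.
# Illustrative rows only, not the full iris dataset.
iris = pd.DataFrame({
    "sepalLength": [5.1, 7.0, 6.3],
    "sepalWidth": [3.5, 3.2, 3.3],
    "petalLength": [1.4, 4.7, 6.0],
    "petalWidth": [0.2, 1.4, 2.5],
    "species": ["setosa", "versicolor", "virginica"],
})

features = list(iris.columns)  # every column is a feature
n_samples = len(iris)          # every row is a sample

# The same table as a list of row records (one dict per sample)
records = iris.to_dict(orient="records")
```

The list-of-records form is the same shape Vega-Lite uses for inline `"values"` data, as in the Vega-Lite bar chart later in the deck.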

  15. Statistical Visualization: Grouping
    import matplotlib.pyplot as plt
    from vega_datasets import data
    iris = data.iris()
    # one scatter call (and legend entry) per species
    color_map = dict(zip(iris.species.unique(),
                         ['blue', 'green', 'red']))
    for species, group in iris.groupby('species'):
        plt.scatter(group['petalLength'], group['sepalWidth'],
                    color=color_map[species],
                    alpha=0.3, edgecolor=None,
                    label=species)
    plt.legend(frameon=True, title='species')
    plt.xlabel('petalLength')
    plt.ylabel('sepalWidth')

  16. Statistical Visualization: Grouping
    import matplotlib.pyplot as plt
    from vega_datasets import data
    iris = data.iris()
    # one scatter call (and legend entry) per species
    color_map = dict(zip(iris.species.unique(),
                         ['blue', 'green', 'red']))
    for species, group in iris.groupby('species'):
        plt.scatter(group['petalLength'], group['sepalWidth'],
                    color=color_map[species],
                    alpha=0.3, edgecolor=None,
                    label=species)
    plt.legend(frameon=True, title='species')
    plt.xlabel('petalLength')
    plt.ylabel('sepalWidth')
    1. Data?
    2. Transformation?
    3. Marks?
    4. Encoding?
    5. Scale?
    6. Guides?

  17. Statistical Visualization: Faceting
    import matplotlib.pyplot as plt
    from vega_datasets import data
    iris = data.iris()
    color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))
    n_panels = len(color_map)
    fig, ax = plt.subplots(1, n_panels, figsize=(n_panels * 5, 3),
                           sharex=True, sharey=True)
    # one panel per species
    for i, (species, group) in enumerate(iris.groupby('species')):
        ax[i].scatter(group['petalLength'], group['sepalWidth'],
                      color=color_map[species],
                      alpha=0.3, edgecolor=None,
                      label=species)
        ax[i].legend(frameon=True, title='species')
        ax[i].set_xlabel('petalLength')
    ax[0].set_ylabel('sepalWidth')

  18. Statistical Visualization: Faceting
    import matplotlib.pyplot as plt
    from vega_datasets import data
    iris = data.iris()
    color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))
    n_panels = len(color_map)
    fig, ax = plt.subplots(1, n_panels, figsize=(n_panels * 5, 3),
                           sharex=True, sharey=True)
    # one panel per species
    for i, (species, group) in enumerate(iris.groupby('species')):
        ax[i].scatter(group['petalLength'], group['sepalWidth'],
                      color=color_map[species],
                      alpha=0.3, edgecolor=None,
                      label=species)
        ax[i].legend(frameon=True, title='species')
        ax[i].set_xlabel('petalLength')
    ax[0].set_ylabel('sepalWidth')
    Problem:
    We’re mixing the what with the how

  19. Toward a well-motivated Declarative Visualization
    Declarative
    - Specify what should be done.
    - Separates specification from execution.
    - “Map to a position, and to a color”
    Imperative
    - Specify how something should be done.
    - Specification & execution intertwined.
    - “Put a red circle here and a blue circle here”
    Declarative visualization lets you think about data
    and relationships, rather than incidental details.
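A toy sketch of the distinction in plain Python, with no plotting library: the declarative `spec` only says which field maps to which channel, and a separate, hypothetical `render` function decides how to turn that into drawing instructions. The spec format loosely imitates Vega-Lite but is an assumption, as are the rows and the palette.

```python
# Declarative: say *what* maps to which channel, nothing about drawing.
spec = {
    "mark": "circle",
    "encoding": {
        "x": {"field": "a"},
        "y": {"field": "b"},
        "color": {"field": "species"},
    },
}

rows = [
    {"a": 1, "b": 2, "species": "setosa"},
    {"a": 3, "b": 1, "species": "versicolor"},
]


def render(spec, rows, palette=("red", "blue", "green")):
    """Hypothetical executor: turn a declarative spec into draw commands."""
    enc = spec["encoding"]
    color_field = enc["color"]["field"]
    # Color scale: distinct categories, in order of appearance, get successive palette entries.
    categories = list(dict.fromkeys(row[color_field] for row in rows))
    color_of = {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}
    return [
        (spec["mark"], row[enc["x"]["field"]], row[enc["y"]["field"]], color_of[row[color_field]])
        for row in rows
    ]


# Imperative: "put a red circle here and a blue circle here", spelled out by hand.
imperative = [("circle", 1, 2, "red"), ("circle", 3, 1, "blue")]

commands = render(spec, rows)
```

Adding rows or swapping the palette touches only the data or the executor; the spec itself never changes, which is the separation of specification from execution described above.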

  21. Altair
    Declarative Visualization in Python
    http://altair-viz.github.io
    Based on the Vega and Vega-Lite grammars.

  22. Altair for Statistical Visualization
    import altair as alt
    from vega_datasets import data
    iris = data.iris()
    alt.Chart(iris).mark_point().encode(
        x='petalLength',
        y='sepalWidth',
        color='species'
    )

  23. Encodings are Flexible:
    import altair as alt
    from vega_datasets import data
    iris = data.iris()
    alt.Chart(iris).mark_point().encode(
        x='petalLength',
        y='sepalWidth',
        color='species',
        column='species'  # one facet panel per species
    )

  24. Altair is Interactive
    import altair as alt
    from vega_datasets import data
    iris = data.iris()
    alt.Chart(iris).mark_point().encode(
        x='petalLength',
        y='sepalWidth',
        color='species'
    ).interactive()  # pan (drag) and zoom (scroll) on the x/y scales

  25. And so much more . . .

  26. See the rest of the tutorial content at
    http://github.com/altair-viz/altair-tutorial

  28. Extra Content

  29. Basics of an Altair Chart
    import altair as alt
    from vega_datasets import data
    iris = data.iris()
    alt.Chart(iris).mark_point().encode(
        x='petalLength:Q',
        y='sepalWidth:Q',
        color='species:N'
    )

  30. Anatomy of an Altair Chart
    import altair as alt
    from vega_datasets import data
    iris = data.iris()
    alt.Chart(iris).mark_circle().encode(
        x='petalLength:Q',
        y='sepalWidth:Q',
        color='species:N'
    )
    alt.Chart(iris): Chart assumes tabular, column-oriented data.
    Supports pandas DataFrames or CSV/TSV/JSON URLs.

  31. Anatomy of an Altair Chart
    import altair as alt
    from vega_datasets import data
    iris = data.iris()
    alt.Chart(iris).mark_point().encode(
        x='petalLength:Q',
        y='sepalWidth:Q',
        color='species:N'
    )
    mark_point(): Chart uses one of several pre-defined marks:
    - point
    - line
    - bar
    - area
    - rect
    - geoshape
    - text
    - circle
    - square
    - rule
    - tick

  32. Basics of an Altair Chart
    import altair as alt
    from vega_datasets import data
    iris = data.iris()
    alt.Chart(iris).mark_point().encode(
        x='petalLength:Q',
        y='sepalWidth:Q',
        color='species:N'
    )
    encode(x='petalLength:Q', y='sepalWidth:Q', color='species:N'):
    - Encodings map visual channels to data columns.
    - Channels are automatically adjusted based on the data type:
      N (nominal), O (ordinal), Q (quantitative), T (temporal).
    Available channels:
    - Position (x, y)
    - Facet (row, column)
    - color
    - shape
    - size
    - text
    - opacity
    - stroke
    - fill
    - latitude/longitude

  33. Anatomy of an Altair Chart
    { "data": {"values": [...]},
      "encoding": {
        "color": {"field": "species", "type": "nominal"},
        "x": {"field": "petalLength", "type": "quantitative"},
        "y": {"field": "sepalWidth", "type": "quantitative"}
      },
      "mark": "point"
    }
    import altair as alt
    from vega_datasets import data
    iris = data.iris()
    alt.Chart(iris).mark_point().encode(
        x='petalLength:Q',
        y='sepalWidth:Q',
        color='species:N'
    ).to_json()
    Altair produces specifications following the Vega-Lite grammar.
    http://vega.github.io/vega-lite/

  34. Examples:

  41. (Visualizations from jakevdp/altair-examples.)

  42. Altair 2.0: a Grammar of Interaction

  44. ~ From D3 to Vega to Altair ~

  45. So what is Vega-Lite?

  46. D3 is Everywhere . . .
    (live version at NYT)

  47. But working in D3 can be challenging . . .

  48. Bar Chart: d3
    // D3 v3 API (d3.scale.*, d3.svg.axis)
    var margin = {top: 20, right: 20, bottom: 30, left: 40},
        width = 960 - margin.left - margin.right,
        height = 500 - margin.top - margin.bottom;
    var x = d3.scale.ordinal()
        .rangeRoundBands([0, width], .1);
    var y = d3.scale.linear()
        .range([height, 0]);
    var xAxis = d3.svg.axis()
        .scale(x)
        .orient("bottom");
    var yAxis = d3.svg.axis()
        .scale(y)
        .orient("left")
        .ticks(10, "%");
    var svg = d3.select("body").append("svg")
        .attr("width", width + margin.left + margin.right)
        .attr("height", height + margin.top + margin.bottom)
      .append("g")
        .attr("transform", "translate(" + margin.left + "," + margin.top + ")");
    d3.tsv("data.tsv", type, function(error, data) {
      if (error) throw error;
      x.domain(data.map(function(d) { return d.letter; }));
      y.domain([0, d3.max(data, function(d) { return d.frequency; })]);
      svg.append("g")
          .attr("class", "x axis")
          .attr("transform", "translate(0," + height + ")")
          .call(xAxis);
      svg.append("g")
          .attr("class", "y axis")
          .call(yAxis)
        .append("text")
          .attr("transform", "rotate(-90)")
          .attr("y", 6)
          .attr("dy", ".71em")
          .style("text-anchor", "end")
          .text("Frequency");
      svg.selectAll(".bar")
          .data(data)
        .enter().append("rect")
          .attr("class", "bar")
          .attr("x", function(d) { return x(d.letter); })
          .attr("width", x.rangeBand())
          .attr("y", function(d) { return y(d.frequency); })
          .attr("height", function(d) { return height - y(d.frequency); });
    });
    function type(d) {
      d.frequency = +d.frequency;  // coerce the TSV string to a number
      return d;
    }
    D3 is a JavaScript library that streamlines data-driven
    manipulation of elements (the DOM) on a webpage.

  49. Bar Chart: Vega
    {
      "width": 400,
      "height": 200,
      "padding": {"top": 10, "left": 30, "bottom": 30, "right": 10},
      "data": [
        {
          "name": "table",
          "values": [
            {"x": 1, "y": 28}, {"x": 2, "y": 55},
            {"x": 3, "y": 43}, {"x": 4, "y": 91},
            {"x": 5, "y": 81}, {"x": 6, "y": 53},
            {"x": 7, "y": 19}, {"x": 8, "y": 87},
            {"x": 9, "y": 52}, {"x": 10, "y": 48},
            {"x": 11, "y": 24}, {"x": 12, "y": 49},
            {"x": 13, "y": 87}, {"x": 14, "y": 66},
            {"x": 15, "y": 17}, {"x": 16, "y": 27},
            {"x": 17, "y": 68}, {"x": 18, "y": 16},
            {"x": 19, "y": 49}, {"x": 20, "y": 15}
          ]
        }
      ],
      "scales": [
        {
          "name": "x",
          "type": "ordinal",
          "range": "width",
          "domain": {"data": "table", "field": "x"}
        },
        {
          "name": "y",
          "type": "linear",
          "range": "height",
          "domain": {"data": "table", "field": "y"},
          "nice": true
        }
      ],
      "axes": [
        {"type": "x", "scale": "x"},
        {"type": "y", "scale": "y"}
      ],
      "marks": [
        {
          "type": "rect",
          "from": {"data": "table"},
          "properties": {
            "enter": {
              "x": {"scale": "x", "field": "x"},
              "width": {"scale": "x", "band": true, "offset": -1},
              "y": {"scale": "y", "field": "y"},
              "y2": {"scale": "y", "value": 0}
            },
            "update": {
              "fill": {"value": "steelblue"}
              ...
    Vega is a detailed declarative specification
    for visualizations, built on D3.

  50. Bar Chart: Vega-Lite
    {
      "description": "A simple bar chart with embedded data.",
      "data": {
        "values": [
          {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
          {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
          {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
        ]
      },
      "mark": "bar",
      "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"}
      }
    }
    Vega-Lite is a simpler declarative specification
    aimed at statistical visualization.

  51. Bar Chart: Altair
    Altair is a Python API for creating
    Vega-Lite specifications.

  53. ~ Thinking about Visualization ~

  54. Bertin’s Semiology of Graphics (1967)

  55. Bertin’s Semiology of Graphics (1967)
    Visual variables: 2D Position, Size, Color Value, Texture,
    Color Hue, Angle, Shape

  56. Bertin’s Semiology of Graphics (1967)
    Suitable for ordered data (also length, area, volume, etc.)
    Suitable for unordered data (also transparency, blur/focus, etc.)

  58. Bertin’s Semiology of Graphics (1967): Order & Quantity

  59. Bertin’s Semiology of Graphics (1967): Order & Quantity (less so)

  60. Bertin’s Semiology of Graphics (1967): Order... Quantity?

  61. Bertin’s Semiology of Graphics (1967): Order, Quantity

  63. Bertin’s Semiology of Graphics (1967): “Levels of Organization”
    Visual variable   Levels
    Position          N O Q
    Size              N O Q
    Color Value       N O Q
    Texture           N O
    Color Hue         N
    Angle             N
    Shape             N
    N = Nominal (named category)
    O = Ordinal (ordered category)
    Q = Quantitative (ordered continuous)
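The table can be read as a lookup from a data type to the visual variables able to carry it. A small plain-Python sketch that transcribes the table as given on the slide; the helper name `variables_for` is hypothetical.

```python
# Bertin's "levels of organization", transcribed from the table above:
# the levels (N, O, Q) each visual variable can express.
LEVELS = {
    "Position": {"N", "O", "Q"},
    "Size": {"N", "O", "Q"},
    "Color Value": {"N", "O", "Q"},
    "Texture": {"N", "O"},
    "Color Hue": {"N"},
    "Angle": {"N"},
    "Shape": {"N"},
}


def variables_for(level):
    """Visual variables suitable for a level ('N', 'O' or 'Q'), in table order."""
    if level not in {"N", "O", "Q"}:
        raise ValueError(f"unknown level: {level!r}")
    return [var for var, levels in LEVELS.items() if level in levels]
```

For example, a quantitative field such as `petalLength` belongs on position, size, or color value, which is consistent with the Altair examples mapping it to `x`.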

  64. Key: Visualization concepts should map
    directly to visualization implementation.
    A great resource is Jeff Heer’s visualization course (UW CSE 512):
    https://courses.cs.washington.edu/courses/cse512/16sp/
