purrr workshop

Slides to complement a hands-on workshop on the R package purrr (https://purrr.tidyverse.org)

Jennifer (Jenny) Bryan

September 03, 2018

Transcript

  1. Jennifer Bryan 

    RStudio
     @JennyBryan  @jennybc
    How to repeat
    yourself with purrr

  2. This work is licensed under a
    Creative Commons
    Attribution-ShareAlike 4.0
    International License.
    To view a copy of this license, visit 

    http://creativecommons.org/licenses/by-sa/4.0/

  3. R installed? Pretty recent?
    • Current version: 3.5.1
    RStudio installed?
    • Current Preview: 1.2.907
    Have these packages?
    • tidyverse (includes purrr)
    • repurrrsive
    Get some help NOW
    if you need/want to
    do some setup
    during the intro!

  4. rstd.io/purrr-latinr

  5. bit.ly/jenny-live-code

  6. Resources
    My purrr materials:
    https://jennybc.github.io/purrr-tutorial/
    Charlotte Wickham's purrr materials:
    https://github.com/cwickham/purrr-tutorial
    My "row-oriented workflows" materials:
    rstd.io/row-work
    "Functionals" chapter of 2nd ed of Advanced R by Wickham
    https://adv-r.hadley.nz/functionals.html

  7. 1. What is the harm with copy/paste and
    repetitive code?
    2. What should I do instead?
    - write functions (R-Ladies Thursday)
    - use formal tools to iterate the R way
    3. Hands-on practice with the purrr
    package for iteration

  8. library(gapminder)
    library(tidyverse)
    gapminder
    #> # A tibble: 1,704 x 6
    #> country continent year lifeExp pop gdpPercap
    #>
    #> 1 Afghanistan Asia 1952 28.8 8425333 779.
    #> 2 Afghanistan Asia 1957 30.3 9240934 821.
    #> 3 Afghanistan Asia 1962 32.0 10267083 853.
    #> 4 Afghanistan Asia 1967 34.0 11537966 836.
    #> 5 Afghanistan Asia 1972 36.1 13079460 740.
    #> 6 Afghanistan Asia 1977 38.4 14880372 786.
    #> 7 Afghanistan Asia 1982 39.9 12881816 978.
    #> 8 Afghanistan Asia 1987 40.8 13867957 852.
    #> 9 Afghanistan Asia 1992 41.7 16317921 649.
    #> 10 Afghanistan Asia 1997 41.8 22227415 635.
    #> # ... with 1,694 more rows

  9. gapminder %>%
    count(continent)
    #> # A tibble: 5 x 2
    #> continent n
    #>
    #> 1 Africa 624
    #> 2 Americas 300
    #> 3 Asia 396
    #> 4 Europe 360
    #> 5 Oceania 24

  10. africa <- gapminder[gapminder$continent == "Africa", ]
    africa_mm <- max(africa$lifeExp) - min(africa$lifeExp)
    americas <- gapminder[gapminder$continent == "Americas", ]
    americas_mm <- max(americas$lifeExp) - min(americas$lifeExp)
    asia <- gapminder[gapminder$continent == "Asia", ]
    asia_mm <- max(asia$lifeExp) - min(africa$lifeExp)
    europe <- gapminder[gapminder$continent == "Europe", ]
    europe_mm <- max(europe$lifeExp) - min(europe$lifeExp)
    oceania <- gapminder[gapminder$continent == "Oceania", ]
    oceania_mm <- max(europe$lifeExp) - min(oceania$lifeExp)
    cbind(
    continent = c("Africa", "Asias", "Europe", "Oceania"),
    max_minus_min = c(africa_mm, americas_mm, asia_mm,
    europe_mm, oceania_mm)
    )

  11. What am I trying to do?
    Have I even done it?*
    * Can you find my mistakes?

  12. How would you compute this?
    for each continent
    max life exp - min life exp
    put result in a data frame

  13. gapminder %>%
    group_by(continent) %>%
    summarize(max_minus_min = max(lifeExp) - min(lifeExp))
    #> # A tibble: 5 x 2
    #> continent max_minus_min
    #>
    #> 1 Africa 52.8
    #> 2 Americas 43.1
    #> 3 Asia 53.8
    #> 4 Europe 38.2
    #> 5 Oceania 12.1
    Here's how I would do it.
    Conclusion: there are many ways to
    write a for loop in R!
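
For comparison, the same per-group computation can be written as an explicit loop in base R. This is only a sketch: `toy` is a small made-up data frame standing in for gapminder so the block runs without extra packages, and its numbers are invented.

```r
# Hypothetical stand-in for gapminder; the values are invented for illustration.
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia", "Asia"),
  lifeExp   = c(40, 55, 50, 62, 70)
)

continents <- unique(toy$continent)
max_minus_min <- numeric(length(continents))

# One pass per continent: subset, then take max - min of that subset only.
for (i in seq_along(continents)) {
  x <- toy$lifeExp[toy$continent == continents[i]]
  max_minus_min[i] <- max(x) - min(x)
}

result <- data.frame(continent = continents, max_minus_min = max_minus_min)
result
```

Each iteration refers to its subset by a single name, which leaves less room for copy/paste slips than repeating the code once per continent.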

  14. sidebar on %>%

  15. child <- c("Reed", "Wesley", "Eli", "Toby")
    age <- c( 14, 12, 12, 1)
    s <- rep_len("", length(child))
    for (i in seq_along(s)) {
    s[i] <- paste(child[i], "is", age[i], "years old")
    }
    s
    #> [1] "Reed is 14 years old" "Wesley is 12 years old"
    #> [3] "Eli is 12 years old" "Toby is 1 years old"
    New example: making strings

  16. child <- c("Reed", "Wesley", "Eli", "Toby")
    age <- c( 14, 12, 12, 1)
    paste(child, "is", age, "years old")
    #> [1] "Reed is 14 years old" "Wesley is 12 years old"
    #> [3] "Eli is 12 years old" "Toby is 1 years old"
    glue::glue("{child} is {age} years old")
    #> Reed is 14 years old
    #> Wesley is 12 years old
    #> Eli is 12 years old
    #> Toby is 1 years old
    Here's how I would do it.
    Conclusion: maybe someone already
    wrote that for loop for you!

  17. But what if you really do
    need to iterate?

  18. https://purrr.tidyverse.org
    Part of the tidyverse
    A "core" package in the tidyverse meta-package
    install.packages("tidyverse") # <-- installs purrr + much more
    install.packages("purrr") # <-- installs only purrr
    library(tidyverse) # <-- loads purrr + much more
    library(purrr) # <-- loads only purrr

  19. purrr is an alternative to "apply" functions
    purrr::map() ≈ base::lapply()
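
A quick way to see the correspondence, sketched on a made-up list `x` (not from the workshop data): for a plain function and a plain list, both calls should give the same result.

```r
library(purrr)

# Hypothetical input, chosen only for illustration.
x <- list(a = 1:3, b = 4:6)

# Both apply the function to each element and return a list of the same length.
from_map    <- map(x, sum)
from_lapply <- lapply(x, sum)

identical(from_map, from_lapply)
```

The differences show up in the conveniences covered later: shortcuts for `.f` and the type-specific variants.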

  20. library(purrr)
    library(repurrrsive)
    help(package = "repurrrsive")

  21. Get comfortable with lists!
    atomic vectors are familiar:
    logical, integer, double, character, etc
    a list = a generalized vector
    a list can hold almost anything

  22. "working with lists"

  23. How many elements are in got_chars? 

    Who is the 9th person listed in got_chars?
    What information is given for this person? 

    What is the difference between got_chars[9]
    and got_chars[[9]]? 

    Or ... do same for sw_people or the n-th person

  24. List exploration
    str(x, list.len = ?, max.level = ?)
    x[i]
    x[[i]]
    str(x[[i]], ...)
    View(x), in RStudio

  25. If list x is a train carrying objects:
    x[[5]] is the object in car 5
    x[4:6] is a train of cars 4-6.
    -- Tweet by @RLangTip
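
The train analogy can be checked on a small list. The list `x` below is made up for illustration; the same pattern applies to got_chars[9] versus got_chars[[9]].

```r
# Hypothetical list with three "cars" of different types.
x <- list(a = 1:3, b = "hello", c = list(TRUE, 2))

one_car  <- x[2]    # a list of length 1: still a "train", with one car
contents <- x[[2]]  # the object inside car 2: a character vector

str(one_car)
str(contents)
```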

  26. from Subsetting chapter of 2nd ed Advanced R

  27. from Subsetting chapter of 2nd ed Advanced R

  28. x[[i]]
    x[i]
    x
    from
    http://r4ds.had.co.nz/vectors.html#lists-of-condiments

  29. map(.x, .f, ...)
    purrr::

  30. map(.x, .f, ...)
    purrr::
    for every element of .x
    do .f

  31. map(minis, antennate)

  32. from Functionals chapter of 2nd ed Advanced R

  33. map(.x, .f)
    purrr::
    .x <- SOME VECTOR OR LIST
    out <- vector(mode = "list", length = length(.x))
    for (i in seq_along(out)) {
    out[[i]] <- .f(.x[[i]])
    }
    out

  34. map(.x, .f)
    purrr::
    .x <- SOME VECTOR OR LIST
    out <- vector(mode = "list", length = length(.x))
    for (i in seq_along(out)) {
    out[[i]] <- .f(.x[[i]])
    }
    out
    purrr::map() is a nice way to
    write a for loop.
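
A runnable version of the loop, with concrete stand-ins filled in. The particular `.x` and `.f` are arbitrary choices made for this sketch.

```r
library(purrr)

.x <- list(1:3, letters, c(TRUE, FALSE))  # any vector or list would do
.f <- length                              # any function of one element

# Pre-allocate the output list, then fill it one element at a time.
out <- vector(mode = "list", length = length(.x))
for (i in seq_along(out)) {
  out[[i]] <- .f(.x[[i]])
}

# The one-line map() call should produce the same list.
identical(out, map(.x, .f))
```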

  35. How many aliases does each GoT
    character have?

  36. map(got_chars, .f = )
    map(sw_people, .f = )
    or

  37. Workflow:
    1. Do it for one element.
    2. Find the general recipe.
    3. Drop into map() to do for all.

  38. Step 1: Do it for one element
    daenerys <- got_chars[[9]]
    ## View(daenerys)
    daenerys[["aliases"]]
    #> [1] "Dany" "Daenerys Stormborn"
    #> [3] "The Unburnt" "Mother of Dragons"
    #> [5] "Mother" "Mhysa"
    #> [7] "The Silver Queen" "Silver Lady"
    #> [9] "Dragonmother" "The Dragon Queen"
    #> [11] "The Mad King's daughter"
    length(daenerys[["aliases"]])
    #> [1] 11

  39. Step 1: Do it for one element
    asha <- got_chars[[13]]
    ## View(asha)
    asha[["aliases"]]
    #> [1] "Esgred" "The Kraken's Daughter"
    length(asha[["aliases"]])
    #> [1] 2

  40. Step 2: Find the general recipe
    .x <- got_chars[[?]]
    length(.x[["aliases"]])

  41. Step 2: Find the general recipe
    .x <- got_chars[[?]]
    length(.x[["aliases"]])
    .x is a pronoun, like "it"
    means "the current element"

  42. Step 3: Drop into map() to do for all
    map(got_chars, ~ length(.x[["aliases"]]))
    #> [[1]]
    #> [1] 4
    #>
    #> [[2]]
    #> [1] 11
    #>
    #> [[3]]
    #> [1] 1
    #>
    #> [[4]]
    #> [1] 1
    #> ...

  43. Step 3: Drop into map() to do for all
    map(got_chars, ~ length(.x[["aliases"]]))
    #> [[1]]
    #> [1] 4
    #>
    #> [[2]]
    #> [1] 11
    #>
    #> [[3]]
    #> [1] 1
    #>
    #> [[4]]
    #> [1] 1
    #> ...
    formula method of specifying .f
    .x means "the current element"
    concise syntax for anonymous functions
    a.k.a. lambda functions
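
To make the shorthand concrete, the formula can be set beside the ordinary anonymous function it stands for. The list `chars` is a made-up miniature of got_chars, used so the block runs without repurrrsive.

```r
library(purrr)

# Hypothetical miniature of got_chars: each element is a list with "aliases".
chars <- list(
  list(name = "A", aliases = c("x", "y")),
  list(name = "B", aliases = character(0))
)

# Formula shorthand: .x is the current element.
with_formula <- map(chars, ~ length(.x[["aliases"]]))

# The same thing written as a regular anonymous function.
with_function <- map(chars, function(x) length(x[["aliases"]]))

identical(with_formula, with_function)
```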

  44. Challenge (pick one or more!)
    How many x does each (GoT or SW) character
    have? (x = titles, allegiances, vehicles, starships)
    map(got_chars, ~ length(.x[["aliases"]]))

  45. map_int(got_chars, ~ length(.x[["aliases"]]))
    #> [1] 4 11 1 1 1 1 1 1 11 5 16
    #> [12] 1 2 5 3 3 3 5 0 3 4 1
    #> [23] 8 2 1 5 1 4 7 3
    Oh, would you prefer an integer vector?
    map()
    map_lgl()
    map_int()
    map_dbl()
    map_chr()
    type-specific
    variants of map()
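
A small sketch of three of the variants side by side, again on a made-up list (`chars` and its fields are invented here, loosely modelled on got_chars).

```r
library(purrr)

# Hypothetical records with one character, one logical and one vector field.
chars <- list(
  list(name = "A", alive = TRUE,  aliases = c("x", "y")),
  list(name = "B", alive = FALSE, aliases = character(0))
)

n_aliases <- map_int(chars, ~ length(.x[["aliases"]]))  # integer vector
names_chr <- map_chr(chars, ~ .x[["name"]])             # character vector
alive_lgl <- map_lgl(chars, ~ .x[["alive"]])            # logical vector
```

Each variant insists on its type: asking map_int() for the names would raise an error rather than quietly return something else, which is a large part of the appeal.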

  46. Challenge:
    Replace map() with type-specific map()
    # What's each character's name?
    map(got_chars, ~.x[["name"]])
    map(sw_people, ~.x[["name"]])
    # What color is each SW character's hair?
    map(sw_people, ~ .x[["hair_color"]])
    # Is the GoT character alive?
    map(got_chars, ~ .x[["alive"]])
    # Is the SW character female?
    map(sw_people, ~ .x[["gender"]] == "female")
    # How heavy is each SW character?
    map(sw_people, ~ .x[["mass"]])

  47. Lists can be awkward
    Lists are necessary
    Get to know your list

  48. map(.x, .f, ...)
    purrr::
    for every element of .x
    do .f

  49. map(.x, .f)
    purrr::
    map(got_chars, ~ length(.x[["aliases"]]))
    quick anonymous functions
    via formula

  50. map_lgl(sw_people, ~ .x[["gender"]] == "female")
    map_int(got_chars, ~ length(.x[["aliases"]]))
    map_chr(got_chars, ~ .x[["name"]])

  51. Notice:
    We extract by name a lot
    # What's each character's name?
    map(got_chars, ~.x[["name"]])
    # What color is each SW character's hair?
    map(sw_people, ~ .x[["hair_color"]])
    # Is the GoT character alive?
    map(got_chars, ~ .x[["alive"]])
    # How heavy is each SW character?
    map(sw_people, ~ .x[["mass"]])

  52. map_chr(got_chars, ~ .x[["name"]])
    map_chr(got_chars, "name")
    Shortcut!
    .f accepts a name or position
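
The three spellings can be compared directly. This sketch uses a made-up two-element list in which "name" happens to sit in position 1.

```r
library(purrr)

# Hypothetical records; "name" is the first element of each.
chars <- list(
  list(name = "A", alive = TRUE),
  list(name = "B", alive = FALSE)
)

by_formula  <- map_chr(chars, ~ .x[["name"]])
by_name     <- map_chr(chars, "name")  # string: extract the element with that name
by_position <- map_chr(chars, 1)       # number: extract the element at that position

identical(by_formula, by_name) && identical(by_name, by_position)
```

Extracting by name is usually the safer habit, since positions can shift between elements.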

  53. map(minis, "pants")

  54. Challenge:
    Explore a GoT or SW list and find a new element to look at
    Extract it across the whole list with name and position
    shortcuts for .f
    Use map_TYPE() to get an atomic vector as output
    map_??(got_??, ??)
    map_??( sw_??, ??)

  55. Common problem
    I'm using map_TYPE() but some
    individual elements aren't of length 1.
    They are absent or have length > 1.

  56. Solutions
    Missing elements?
    Specify a .default value.
    Elements of length > 1?
    You can't make an atomic vector.*
    Get happy with a list or list-column.
    Or pick one element, e.g., the first.
    * You can, if you are willing to flatten() or squash().

  57. map(sw_vehicles, "pilots", .default = NA)
    #> [[1]]
    #> [1] NA
    #>
    #> ...
    #>
    #> [[19]]
    #> [1] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/32/"
    #>
    #> [[20]]
    #> [1] "http://swapi.co/api/people/44/"
    #>
    #> ...
    #>
    #> [[37]]
    #> [1] "http://swapi.co/api/people/67/"
    #>
    #> [[38]]
    #> [1] NA
    #>
    #> [[39]]
    #> [1] NA

  58. map_chr(sw_vehicles, list("pilots", 1), .default = NA)
    #> [1] NA NA
    #> [3] NA NA
    #> [5] "http://swapi.co/api/people/1/" NA
    #> [7] NA "http://swapi.co/api/people/13/"
    #> [9] NA NA
    #> [11] NA NA
    #> [13] "http://swapi.co/api/people/1/" NA
    #> [15] NA NA
    #> [17] NA NA
    #> [19] "http://swapi.co/api/people/10/" "http://swapi.co/api/people/44/"
    #> [21] "http://swapi.co/api/people/11/" "http://swapi.co/api/people/70/"
    #> [23] "http://swapi.co/api/people/11/" NA
    #> [25] NA "http://swapi.co/api/people/79/"
    #> [27] NA NA
    #> [29] NA NA
    #> [31] NA NA
    #> [33] NA NA
    #> [35] NA NA
    #> [37] "http://swapi.co/api/people/67/" NA
    #> [39] NA

  59. Shortcut!
    .f accepts a name or position, a vector of
    names or positions, or a list of names
    and positions
    map(got_chars, c(14, 1))
    map(sw_vehicles, list("pilots", 1))
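
A self-contained sketch of nested extraction with a fallback. `vehicles` is a made-up miniature of sw_vehicles with invented values; the typed missing value NA_character_ is used here instead of plain NA so the character output type is explicit.

```r
library(purrr)

# Hypothetical miniature of sw_vehicles; names and values are invented.
vehicles <- list(
  list(name = "v1", pilots = c("p1", "p2")),
  list(name = "v2"),                          # no pilots element at all
  list(name = "v3", pilots = "p3")
)

# list("pilots", 1): go to "pilots", then take its first element.
first_pilot <- map_chr(vehicles, list("pilots", 1), .default = NA_character_)

# c(2, 1): second element, then its first element (positions only).
same_by_position <- map_chr(vehicles, c(2, 1), .default = NA_character_)

first_pilot
```

The positional form only agrees with the named form because "pilots" sits second in every element of this toy list that has it.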

  60. Names make life nicer!
    map_chr(got_chars, "name")
    #> [1] "Theon Greyjoy" "Tyrion Lannister" "Victarion Greyjoy"
    #> ...
    got_chars_named <- set_names(got_chars, map_chr(got_chars, "name"))
    got_chars_named %>%
    map_lgl("alive")
    #> Theon Greyjoy Tyrion Lannister Victarion Greyjoy
    #> TRUE TRUE TRUE
    #> ...
    Names propagate in purrr pipelines.
    Set them early and enjoy!

  61. allegiances <- map(got_chars_named, "allegiances")
    tibble::enframe(allegiances, value = "allegiances")
    #> # A tibble: 30 x 2
    #> name allegiances
    #>
    #> 1 Theon Greyjoy
    #> 2 Tyrion Lannister
    #> 3 Victarion Greyjoy
    #> 4 Will
    #> 5 Areo Hotah
    #> 6 Chett
    #> 7 Cressen
    #> 8 Arianne Martell
    #> 9 Daenerys Targaryen
    #> 10 Davos Seaworth
    #> # ... with 20 more rows
    tibble::enframe() does this:
    named list → df w/ names & list-column
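
The same conversion on a made-up named list, so the shape of the result is easy to inspect. The character and house names are placeholders.

```r
library(tibble)

# Hypothetical named list; elements differ in length, as allegiances do.
allegiances <- list(
  "Character A" = c("House X", "House Y"),
  "Character B" = character(0),
  "Character C" = "House Z"
)

# One row per list element: names go in `name`, the elements stay
# intact inside a list-column called `allegiances`.
df <- enframe(allegiances, value = "allegiances")
df
```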

  62. Set list names for a happier life.
    There are many ways to specify .f.
    .default is useful for missing things.
    got_chars_named <- set_names(got_chars, map_chr(got_chars, "name"))
    map(got_chars, ~ length(.x[["aliases"]]))
    map_chr(got_chars, "name")
    map(sw_vehicles, list("pilots", 1))
    map(sw_vehicles, "pilots", .default = NA)
    map_chr(sw_vehicles, list("pilots", 1), .default = NA)

  63. Challenge:
    Create a named copy of a GoT or SW list with set_names().
    Find an element with tricky presence/absence or length.
    Extract it many ways:
    - by name
    - by position
    - by list("name", pos) or c(pos, pos)
    - use .default for missing data
    - use map_TYPE() to coerce output to atomic vector

  64. Challenge (pick one or more):
    Which SW film has the most characters?
    Which SW species has the most possible eye colors?
    Which GoT character has the most allegiances? Aliases?
    Titles?
    Which GoT character has been played by multiple actors?

  65. Inspiration for your
    future purrr work

  66. map(.x, .f, ...)
    books <- map(got_chars_named, "books")
    map_chr(books[1:2], paste, collapse = ", ")
    #> Theon Greyjoy
    #> "A Game of Thrones, A Storm of Swords, A Feast for Crows"
    #> Tyrion Lannister
    #> "A Feast for Crows, The World of Ice and Fire"
    map_chr(books[1:2], ~ paste(.x, collapse = ", "))
    #> Theon Greyjoy
    #> "A Game of Thrones, A Storm of Swords, A Feast for Crows"
    #> Tyrion Lannister
    #> "A Feast for Crows, The World of Ice and Fire"

  67. from Functionals chapter of 2nd ed Advanced R
    map(.x, .f, ...)

  68. map(.x, .f, ...)
    books <- map(got_chars_named, "books")
    map_chr(books[1:2], paste, collapse = ", ")
    #> Theon Greyjoy
    #> "A Game of Thrones, A Storm of Swords, A Feast for Crows"
    #> Tyrion Lannister
    #> "A Feast for Crows, The World of Ice and Fire"
    map_chr(books[1:2], ~ paste(.x, collapse = ", "))
    #> Theon Greyjoy
    #> "A Game of Thrones, A Storm of Swords, A Feast for Crows"
    #> Tyrion Lannister
    #> "A Feast for Crows, The World of Ice and Fire"

  69. So, yes,
    there are many ways to specify .f.
    map(got_chars, ~ length(.x[["aliases"]]))
    map_chr(got_chars, "name")
    map_chr(books[1:2], paste, collapse = ", ")
    map(sw_vehicles, list("pilots", 1))

  70. library(tidyverse)
    library(gapminder)
    countries <- c("Argentina", "Brazil", "Canada")
    gap_small <- gapminder %>%
    filter(country %in% countries, year > 1996)
    gap_small
    #> # A tibble: 9 x 6
    #> country continent year lifeExp pop gdpPercap
    #>
    #> 1 Argentina Americas 1997 73.3 36203463 10967.
    #> 2 Argentina Americas 2002 74.3 38331121 8798.
    #> 3 Argentina Americas 2007 75.3 40301927 12779.
    #> 4 Brazil Americas 1997 69.4 168546719 7958.
    #> 5 Brazil Americas 2002 71.0 179914212 8131.
    #> 6 Brazil Americas 2007 72.4 190010647 9066.
    #> 7 Canada Americas 1997 78.6 30305843 28955.
    #> 8 Canada Americas 2002 79.8 31902268 33329.
    #> 9 Canada Americas 2007 80.7 33390141 36319.
    write_one <- function(x) {
    filename <- paste0(x, ".csv")
    dataset <- filter(gap_small, country == x)
    write_csv(dataset, filename)
    }
    walk(countries, write_one)
    list.files(pattern = "\\.csv$")
    #> [1] "Argentina.csv" "Brazil.csv" "Canada.csv"
    walk() is map() for side effects;
    it returns .x invisibly
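
A small check of what walk() hands back, written against a temporary directory so nothing lands in the working directory. `write_stub()` and `out_dir` are hypothetical stand-ins for write_one() and the project folder, and base write.csv() replaces write_csv() to keep dependencies down.

```r
library(purrr)

out_dir <- tempfile("walk-demo-")  # hypothetical scratch directory
dir.create(out_dir)

countries <- c("Argentina", "Brazil", "Canada")

# Stand-in for write_one(): writes a tiny CSV per country as a side effect.
write_stub <- function(x) {
  write.csv(data.frame(country = x),
            file.path(out_dir, paste0(x, ".csv")),
            row.names = FALSE)
}

returned <- walk(countries, write_stub)

identical(returned, countries)            # the input comes back unchanged
list.files(out_dir, pattern = "\\.csv$")
```

Getting the input back is what lets walk() sit in the middle of a pipeline without interrupting it.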

  71. library(tidyverse)
    csv_files <- list.files(pattern = "\\.csv$")
    csv_files
    #> [1] "Argentina.csv" "Brazil.csv" "Canada.csv"
    map_dfr(csv_files, ~ read_csv(.x))
    #> # A tibble: 9 x 6
    #> country continent year lifeExp pop gdpPercap
    #>
    #> 1 Argentina Americas 1997 73.3 36203463 10967.
    #> 2 Argentina Americas 2002 74.3 38331121 8798.
    #> 3 Argentina Americas 2007 75.3 40301927 12779.
    #> 4 Brazil Americas 1997 69.4 168546719 7958.
    #> 5 Brazil Americas 2002 71.0 179914212 8131.
    #> 6 Brazil Americas 2007 72.4 190010647 9066.
    #> 7 Canada Americas 1997 78.6 30305843 28955.
    #> 8 Canada Americas 2002 79.8 31902268 33329.
    #> 9 Canada Americas 2007 80.7 33390141 36319.
    map_dfr() rowbinds a
    list of data frames

  72. mapping over 2 or
    more things in parallel

  73. .y = hair
    .x = minis

  74. map2(minis, hair, enhair)

  75. .y = weapons
    .x = minis

  76. map2(minis, weapons, arm)

  77. minis %>%
    map2(hair, enhair) %>%
    map2(weapons, arm)
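
Without the figures, map2() may be easier to picture on the child/age vectors from the string-making example. This is a sketch for illustration; for this particular task the vectorised paste() shown earlier remains the simpler choice.

```r
library(purrr)

child <- c("Reed", "Wesley", "Eli", "Toby")
age   <- c(14, 12, 12, 1)

# .x walks along child and .y along age, in step.
s <- map2_chr(child, age, ~ paste(.x, "is", .y, "years old"))
s
```

map2() earns its keep when the function being applied is not already vectorised over both inputs.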

  78. from Functionals chapter of 2nd ed Advanced R

  79. df <- tibble(pants, torso, head)
    embody <- function(pants, torso, head)
    insert(insert(pants, torso), head)

  80. pmap(df, embody)
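
A runnable analogue of the pattern, with a made-up data frame and a hypothetical function `describe()` in place of the figures and embody(). A data frame is a list of columns, so pmap() steps through the rows and matches columns to arguments by name.

```r
library(purrr)

# Hypothetical two-row data frame for illustration.
df <- data.frame(
  child = c("Reed", "Wesley"),
  age   = c(14, 12),
  stringsAsFactors = FALSE
)

# Argument names match the column names, so each row is passed as
# describe(child = ..., age = ...).
describe <- function(child, age) paste(child, "is", age, "years old")

rows <- pmap_chr(df, describe)
rows
```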

  81. from Functionals chapter of 2nd ed Advanced R

  82. map_dfr(minis, `[`,
    c("pants", "torso", "head"))

  83. For much more on this:
    rstd.io/row-work

  84. from Functionals chapter of 2nd ed Advanced R
    You have the basis for exploring
    the world of purrr now!
