Row-oriented workflows in R with the tidyverse

Slides for RStudio webinar
Jenny Bryan
Code and more resources at:
https://rstd.io/row-work

Jennifer (Jenny) Bryan

April 11, 2018

Transcript

  1. Jennifer Bryan 

    RStudio, University of British Columbia
     @JennyBryan  @jennybc
    Row-oriented workflows in R with the tidyverse

  2. rstd.io/row-work
    GitHub repo has all code.
    Link to slides on SpeakerDeck.
    Get the .R files to play along.
    Or follow via rendered .md.

  3. This work is licensed under a
    Creative Commons
    Attribution-ShareAlike 4.0
    International License.
    To view a copy of this license, visit 

    http://creativecommons.org/licenses/by-sa/4.0/

  4. download materials: rstd.io/row-work
    I assume you know or want to know:
    the tidyverse packages
    the pipe operator, %>%
    list = core data structure
    "apply" or "map" functions,
    e.g. base::lapply() and purrr::map()

  5. tidyverse.org

  6. r4ds.had.co.nz

  7. https://twitter.com/daattali/status/761058049859518464
    https://twitter.com/daattali/status/761233607822221312

  8. How to do this?
    > str(i_want)
    List of 2
     $ :List of 2
      ..$ x: num 1
      ..$ y: chr "one"
     $ :List of 2
      ..$ x: num 2
      ..$ y: chr "two"
    > i_have
    # A tibble: 2 x 2
          x y
      <dbl> <chr>
    1    1. one
    2    2. two

  9. https://rpubs.com/wch/200398
    Winston compiled,
    I updated.

  10. for loop
    df <- SOME_DATA_FRAME  # placeholder: any data frame
    out <- vector(mode = "list", length = nrow(df))
    for (i in seq_along(out)) {
      # row i as a named list, one element per column
      out[[i]] <- as.list(df[i, , drop = FALSE])
    }
    out
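
A minimal runnable sketch of the loop on this slide. The two-row data frame is a hypothetical stand-in for the placeholder, chosen to mirror the i_have example from earlier in the talk.

```r
# Hypothetical stand-in for the placeholder data frame (mirrors i_have)
df <- data.frame(x = c(1, 2), y = c("one", "two"), stringsAsFactors = FALSE)

# Pre-allocate one slot per row, then fill each slot with that row as a list
out <- vector(mode = "list", length = nrow(df))
for (i in seq_along(out)) {
  out[[i]] <- as.list(df[i, , drop = FALSE])
}

str(out)
```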

  11. split by row then lapply
    df <- SOME_DATA_FRAME  # placeholder: any data frame
    df <- split(df, seq_len(nrow(df)))  # one single-row data frame per row
    lapply(df, function(row) as.list(row))
    lapply over row numbers
    df <- SOME_DATA_FRAME  # placeholder: any data frame
    lapply(
      seq_len(nrow(df)),
      function(i) as.list(df[i, , drop = FALSE])
    )
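
A short sketch comparing the two base R approaches on this slide, again with a hypothetical two-row data frame. The variable names by_split and by_index are illustrative only.

```r
# Hypothetical stand-in for the placeholder data frame
df <- data.frame(x = c(1, 2), y = c("one", "two"), stringsAsFactors = FALSE)

# Approach 1: split into one-row data frames, then convert each to a list
by_split <- lapply(split(df, seq_len(nrow(df))), as.list)

# Approach 2: iterate over row numbers directly
by_index <- lapply(seq_len(nrow(df)), function(i) as.list(df[i, , drop = FALSE]))

# split() names its pieces "1", "2", ...; once those names are dropped
# the two results are the same list of rows
identical(unname(by_split), by_index)
```

The only visible difference is that the split() version carries the row labels as list names.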

  12. purrr::pmap()
    df <- SOME_DATA_FRAME  # placeholder: any data frame
    pmap(df, list)
    purrr::transpose()*
    df <- SOME_DATA_FRAME  # placeholder: any data frame
    transpose(df)
    * Happens to be exactly what's needed in this specific example.
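
A small sketch of both purrr calls on a hypothetical tibble shaped like i_have. Note that transpose() has been superseded by list_transpose() in purrr 1.0.0 and later; it is used here only because the slide uses it.

```r
library(purrr)

# Hypothetical stand-in for the placeholder data frame
df <- tibble::tibble(x = c(1, 2), y = c("one", "two"))

# pmap() calls list(x = <value>, y = <value>) once per row
via_pmap <- pmap(df, list)

# transpose() turns a list of columns into a list of rows
via_transpose <- transpose(df)

str(via_pmap)
```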

  13. Why so many ways to do
    THING for each row?
    Because there is no way.

  14. Why so many ways to do
    THING for each row?
    Columns are very special in R.
    This is fantastic for data analysis.
    Tradeoff: row-oriented work is harder.

  15. How to choose?
    Speed and ease of:
    • Writing the code
    • Reading the code
    • Executing the code

  16. Of course someone has to write loops.
    It doesn't have to be you.

  17. Pro tip #1
    Use vectorized functions.
    Let other people write loop-y
    code for you.

  18. paste() example
    ex03_row-wise-iteration-are-you-sure.R
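
A sketch of the idea behind this tip, not the contents of the referenced .R file. The data frame and its column names are hypothetical; the point is that paste() is already vectorized, so no row-wise loop is needed.

```r
# Hypothetical data, invented for illustration
df <- data.frame(name = c("Ann", "Bob"), age = c(31, 42), stringsAsFactors = FALSE)

# Row-by-row version: an explicit loop over row numbers
loop_out <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  loop_out[i] <- paste(df$name[i], "is", df$age[i], "years old")
}

# Vectorized version: paste() works element-wise on whole columns
vec_out <- paste(df$name, "is", df$age, "years old")

identical(loop_out, vec_out)
```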

  19. Pro tip #2
    Use purrr::map()* and friends.
    Let other people write loop-y
    code for you.
    * Like base::lapply(), but anchors a large, coherent family of map functions.

  20. purrr::map(.x, .f, ...)

  21. map(.x, .f, ...)
    for every element of .x
    apply .f

  22. map(minis, antennate)

  23. map(.x, .f, ...)
    .x <- SOME_VECTOR_OR_LIST  # placeholder: any vector or list
    out <- vector(mode = "list", length = length(.x))
    for (i in seq_along(out)) {
      out[[i]] <- .f(.x[[i]])  # apply .f to the i-th element
    }
    out

  24. map(.x, .f, ...)
    purrr::map() implements a for loop!
    But with less code clutter.
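
To make the equivalence concrete, here is a minimal sketch that runs the explicit loop and map() side by side. The input x and the function f are hypothetical choices for illustration.

```r
library(purrr)

x <- list(1:3, 4:9)  # hypothetical input
f <- length          # hypothetical function to apply

# Explicit loop: pre-allocate, then fill element by element
out <- vector(mode = "list", length = length(x))
for (i in seq_along(out)) {
  out[[i]] <- f(x[[i]])
}

# Same computation via map(), without the bookkeeping
identical(out, map(x, f))
```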

  25. purrr::map() example
    ex04_map-example.R

  26. No, I really do
    need to do THING
    for each row.

  27. How to do this?
    > str(i_want)
    List of 2
     $ :List of 2
      ..$ x: num 1
      ..$ y: chr "one"
     $ :List of 2
      ..$ x: num 2
      ..$ y: chr "two"
    > i_have
    # A tibble: 2 x 2
          x y
      <dbl> <chr>
    1    1. one
    2    2. two

  28. pmap(.l, .f, ...)
    for every tuple in .l
    apply .f

  29. pmap(.l, embody)

  30. pmap(.l, embody)

  31. pmap(.l, .f, ...)
    # pseudocode: N is the common length; "..." stands for any remaining vectors
    .l <- LIST_OF_LENGTH_N_VECTORS
    out <- vector(mode = "list", length = N)
    for (i in seq_along(out)) {
      out[[i]] <- .f(.l[[1]][[i]], .l[[2]][[i]], ...)
    }
    out

  32. pmap(.l, .f, ...)
    # pseudocode: N is the common length; "..." stands for any remaining vectors
    .l <- LIST_OF_LENGTH_N_VECTORS
    out <- vector(mode = "list", length = N)
    for (i in seq_along(out)) {
      out[[i]] <- .f(.l[[1]][[i]], .l[[2]][[i]], ...)
    }
    out
    A data frame works!
    row i

  33. pmap(.l, .f, ...)
    # pseudocode: N is the common length; "..." stands for any remaining vectors
    .l <- LIST_OF_LENGTH_N_VECTORS
    out <- vector(mode = "list", length = N)
    for (i in seq_along(out)) {
      out[[i]] <- .f(.l[[1]][[i]], .l[[2]][[i]], ...)
    }
    out
    pmap() is a for loop!
    it applies .f to each row
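
A runnable version of the pseudocode above for the two-vector case. The list l and the function f are hypothetical; with more vectors in the list, the loop body would pass one extra argument per vector.

```r
library(purrr)

# Hypothetical inputs: two length-3 vectors and a two-argument function
l <- list(a = c(1, 2, 3), b = c(10, 20, 30))
f <- function(a, b) a + b

# Explicit loop over positions 1..n, taking the i-th element of each vector
n <- length(l[[1]])
out <- vector(mode = "list", length = n)
for (i in seq_along(out)) {
  out[[i]] <- f(l[[1]][[i]], l[[2]][[i]])
}

# pmap() does the same, matching list names to argument names
identical(out, pmap(l, f))
```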

  34. purrr::pmap() example
    ex06_runif-via-pmap.R
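
A sketch in the spirit of the file name, not a copy of that file. It assumes a hypothetical parameter table whose column names match the arguments of runif(), so that pmap() can call runif() once per row.

```r
library(purrr)

# Hypothetical parameter table; column names match runif()'s arguments
df <- tibble::tibble(
  n   = 1:3,
  min = c(0, 10, 100),
  max = c(1, 100, 1000)
)

set.seed(1)
# One call per row: runif(n = 1, min = 0, max = 1), and so on
draws <- pmap(df, runif)

# Each element has as many draws as that row's n
lengths(draws)
```

Because arguments are matched by name, the column order in the table does not matter here.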

  35. How to choose?
    Speed and ease of:
    • Writing the code
    • Reading the code
    • Executing the code

  36. map()
    map_lgl(), map_int(), map_dbl(), map_chr()
    map_if(), map_at()
    map_dfr(), map_dfc()
    map2()
    map2_lgl(), map2_int(), map2_dbl(), map2_chr()
    map2_dfr(), map2_dfc()
    pmap()
    pmap_lgl(), pmap_int(), pmap_dbl(), pmap_chr()
    pmap_dfr(), pmap_dfc()
    imap()
    imap_lgl(), imap_chr(), imap_int(), imap_dbl()
    imap_dfr(), imap_dfc()

  37. map()
    map_lgl(), map_int(), map_dbl(), map_chr()
    map_if(), map_at()
    map_dfr(), map_dfc()
    map2()
    map2_lgl(), map2_int(), map2_dbl(), map2_chr()
    map2_dfr(), map2_dfc()
    pmap()
    pmap_lgl(), pmap_int(), pmap_dbl(), pmap_chr()
    pmap_dfr(), pmap_dfc()
    imap()
    imap_lgl(), imap_chr(), imap_int(), imap_dbl()
    imap_dfr(), imap_dfc()
    purrr's map functions have
    a common interface:
    learn it once,
    use it everywhere
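
A brief sketch of that common interface, using a few members of the family on hypothetical inputs. The suffix picks the output type, while the prefix (map, map2, imap) picks how many inputs are iterated over.

```r
library(purrr)

x <- list(1:3, 4:9)  # hypothetical input

as_list <- map(x, length)      # always a list
as_int  <- map_int(x, length)  # integer vector
as_dbl  <- map_dbl(x, mean)    # double vector
as_chr  <- map_chr(x, ~ paste(.x, collapse = "-"))  # character vector

# map2_*(): iterate over two inputs in parallel
sums <- map2_dbl(c(1, 2), c(10, 20), ~ .x + .y)

# imap_*(): iterate over values (.x) together with their names (.y)
labels <- imap_chr(c(a = "x", b = "y"), ~ paste0(.y, "=", .x))
```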

  38. for loop
    df <- SOME_DATA_FRAME  # placeholder: any data frame
    out <- vector(mode = "list", length = nrow(df))
    for (i in seq_along(out)) {
      out[[i]] <- as.list(df[i, , drop = FALSE])
    }
    out
    split by row then lapply
    df <- SOME_DATA_FRAME
    df <- split(df, seq_len(nrow(df)))
    lapply(df, function(row) as.list(row))
    lapply over row numbers
    df <- SOME_DATA_FRAME
    lapply(
      seq_len(nrow(df)),
      function(i) as.list(df[i, , drop = FALSE])
    )
    purrr::pmap()
    df <- SOME_DATA_FRAME
    pmap(df, list)
    purrr::transpose()
    df <- SOME_DATA_FRAME
    transpose(df)

  39. [Slide image not captured in transcript]

  40. code for that study:
    iterate-over-rows.R

  41. purrr::pmap(df, .f)
    for each row of df
    do this

  42. What if I need to work
    on groups of rows?

  43. Pro tip #3
    Use dplyr::group_by() +
    summarize().
    Let other people write loop-y
    code for you.

  44. group_by() + summarize() example
    ex07_group-by-summarise.R
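
A minimal sketch of the group_by() + summarise() pattern, not the contents of the referenced file. The tibble and its columns g and v are hypothetical.

```r
library(dplyr)

# Hypothetical data: a grouping column and a numeric value
df <- tibble::tibble(g = c("a", "a", "b"), v = c(1, 3, 10))

# One output row per group; dplyr handles the per-group iteration
res <- df %>%
  group_by(g) %>%
  summarise(n = n(), mean_v = mean(v))

res
```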

  45. No, I really must work
    on groups of rows.

  46. Use nesting
    to restate as
    "do THING for each row"

  47. Use nesting
    to restate as
    "do THING for each row"
    DONE*
    * See everything up 'til now in this talk.

  48. dplyr::group_by() + tidyr::nest()
    ex08_nesting-is-good.R
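
A small sketch of the nesting idea, not the contents of the referenced file. It assumes a hypothetical tibble with a grouping column g; after nest(), each group is a single row holding a data frame in the list-column data, so group-wise work becomes row-wise work with map().

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical data: group "a" has two rows, group "b" has one
df <- tibble::tibble(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(2, 4, 7))

# One row per group; the other columns are packed into a list-column
nested <- df %>%
  group_by(g) %>%
  nest() %>%
  ungroup()

# "Do THING for each row": map over the nested data frames
result <- nested %>%
  mutate(
    n_rows = map_int(data, nrow),
    mean_y = map_dbl(data, ~ mean(.x$y))
  )

result
```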

  49. Tips for row-oriented workflows
    embrace the data frame
    esp. the tibble = tidyverse data frame
    embrace lists
    embrace lists as variables in a tibble
    "list-columns", may come from nesting
    embrace purrr::map() & friends
