TokyoR#104_DataProcessing

Slides from my talk at the 104th Tokyo.R meetup.

kilometer

March 04, 2023

Transcript

  1. #104
    @kilometer00
    2023.03.04
    BeginneR Session
    Data processing &
    visualization

  2. Who!?
    Who?

  3. Who!?
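    Name: Mimura (@kilometer)
    Occupation: postdoc (Doctor of Engineering)
    Field: behavioral neuroscience (primates)
    brain imaging
    medical systems engineering
    R experience: about 10 years
    Latest craze: got a new chair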

  4. Plug!! (I took part in translating a book.)
    On sale now!

  5. BeginneR Session

  6. BeginneR

  7. Before After
    BeginneR Session
    BeginneR BeginneR

  8. BeginneR Advanced Hoxo_m
    If I have seen further it is by standing on the
    shoulders of Giants.
    -- Sir Isaac Newton, 1676

  9. #104
    @kilometer00
    2023.03.04
    BeginneR Session
    Data processing &
    visualization

  10. Data Science
    Import → Tidy → Transform / Visualize / Model → Communicate
    Modified from “R for Data Science”, H. Wickham, 2017

  11. Data
    Information that is reusable and suitable for communication,
    interpretation, and processing
    Definition by the International Electrotechnical Commission (IEC)

  12. Data
    Information that is reusable and suitable for communication,
    interpretation, and processing
    Information: a representation that encodes an entity

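  13. Data
    Information that is reusable and suitable for communication,
    interpretation, and processing
    Information: a representation that encodes an entity
    Entity
    The thing itself, which exists regardless of
    whether it is observed
    Mapping (encoding)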

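  14. Set $X$ → set $Y$
    Element $x$ → element $y$
    Mapping $f: X \to Y$, or $f: x \mapsto y$
    ($X$: source set, domain) ($Y$: target set, codomain)
    [Mapping]
    A rule that assigns each element of one set
    to exactly one element of another set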

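  15. [Figure: an entity mapped to information in several spaces: map space,
    biological species-name space, name space, monetary-value space (yen),
    monetary-value space (dollars); example values: coffee, ¥290, $2.53,
    [latitude, longitude], Homo sapiens]
    [Mapping]
    A rule that assigns each element of one set to exactly one element of another set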

  16. Mapping
    Apple (entity) → apple (information)

  17. Amount of information
    Entity → information → data (apple)
    Encoding

  18. Amount of information
    Entity → information → data (apple)
    Encoding
    Loss of information

  19. Experiment
    hypothesis observation
    principle phenotype
    model data
    Truth
    Knowledge f X
    (unknown)

  20. Data Processing
    Table Data: read_csv() / write_csv()
    Wide form → Long form: pivot_longer()
    Long form → Wide form: pivot_wider()
    Nested form: group_nest() / unnest()
    Plot: {ggplot2}, {patchwork}
    Image Files: ggsave()

  21. Data Processing
    data.frame / tibble
    Table Data: read_csv() / write_csv()
    Wide form → Long form: pivot_longer()
    Long form → Wide form: pivot_wider()
    Nested form: group_nest() / unnest()
    Plot: {ggplot2}, {patchwork}
    Image Files: ggsave()

  22. data.frame

  23. vector
    in Excel

  24. vector
    in R
    in Excel
    pre <- c(1, 2, 3, 4, 5)
    post <- pre * 5
    > pre
    [1] 1 2 3 4 5
    > post
    [1] 5 10 15 20 25

  25. vector
    vec1 <- c(1, 2, 3, 4, 5)
    vec2 <- 1:5
    vec3 <- seq(from = 1, to = 5, by = 1)
    > vec1
    [1] 1 2 3 4 5
    > vec2
    [1] 1 2 3 4 5
    > vec3
    [1] 1 2 3 4 5

  26. vector
    vec1 <- seq(from = 1, to = 5, by = 1)
    vec2 <- seq(1, 5, 1)
    > vec1
    [1] 1 2 3 4 5
    > vec2
    [1] 1 2 3 4 5

  27. > ?seq
    vector
    seq{base}
    Sequence Generation
    Description
    Generate regular sequences. seq is a standard
    generic with a default method. …
    Usage
    seq(...)
    ## Default S3 method:
    seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
    length.out = NULL, along.with = NULL, ...)

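The `?seq` usage above lists `length.out` and `along.with` besides `by`. A minimal base-R sketch of how the three differ (the values are chosen only for illustration):

```r
# by: fixed step size between elements
seq(from = 0, to = 1, by = 0.25)        # [1] 0.00 0.25 0.50 0.75 1.00
# length.out: fixed number of elements; the step is computed for you
seq(from = 0, to = 1, length.out = 5)   # [1] 0.00 0.25 0.50 0.75 1.00
# along.with: one element per element of the supplied vector
seq(from = 10, by = 10, along.with = c("a", "b", "c"))  # [1] 10 20 30
# seq_len() and seq_along() are the usual shortcuts for 1, 2, ..., n
seq_len(3)                              # [1] 1 2 3
seq_along(c("a", "b", "c"))             # [1] 1 2 3
```

When only the number of points matters (for example, evenly spaced grid values), `length.out` saves you from working out the step by hand.
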
  28. vector
    vec1 <- rep(1:3, times = 2)
    vec2 <- rep(1:3, each = 2)
    vec3 <- rep(1:3, times = 2, each = 2)
    > vec1
    [1] 1 2 3 1 2 3
    > vec2
    [1] 1 1 2 2 3 3
    > vec3
    [1] 1 1 2 2 3 3 1 1 2 2 3 3

  29. vector
    vec1 <- 11:15
    > vec1
    [1] 11 12 13 14 15
    > vec1[1]
    [1] 11
    > vec1[3:5]
    [1] 13 14 15
    > vec1[c(1:2, 5)]
    [1] 11 12 15

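Besides positive positions, `[ ]` also accepts negative positions (drop those elements) and logical vectors (keep the positions that are TRUE). A short base-R sketch reusing `vec1` from the slide:

```r
vec1 <- 11:15
# Negative positions drop elements
vec1[-1]           # [1] 12 13 14 15
vec1[-c(1, 5)]     # [1] 12 13 14
# A logical vector of the same length keeps the positions that are TRUE
vec1 > 12          # [1] FALSE FALSE  TRUE  TRUE  TRUE
vec1[vec1 > 12]    # [1] 13 14 15
# which() turns the logical vector back into positions
which(vec1 > 12)   # [1] 3 4 5
```

Logical indexing is the same idea that `filter()` in {dplyr} applies to the rows of a data.frame.
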
  30. list
    list1 <- list(1:6, 11:15, c("a", "b", "c"))
    > list1
    [[1]]
    [1] 1 2 3 4 5 6
    [[2]]
    [1] 11 12 13 14 15
    [[3]]
    [1] "a" "b" "c"

  31. list
    list1 <- list(1:6, 11:15, c("a", "b", "c"))
    > list1[[1]]
    [1] 1 2 3 4 5 6
    > list1[[3]][2:3]
    [1] "b" "c"
    > list1[[2]] * 3
    [1] 33 36 39 42 45

  32. named list
    list2 <- list(A = 1:6, B = 11:15, C = c("a", "b", "c"))
    > list2
    $A
    [1] 1 2 3 4 5 6
    $B
    [1] 11 12 13 14 15
    $C
    [1] "a" "b" "c"

  33. named list
    list2 <- list(A = 1:6, B = 11:15, C = c("a", "b", "c"))
    > list2$A
    [1] 1 2 3 4 5 6
    > list2$C[2:3]
    [1] "b" "c"
    > list2$B * 3
    [1] 33 36 39 42 45

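`$` is a shortcut for `[[ ]]` with a fixed name. Two related points, sketched in base R with the slide's `list2`: `[[ ]]` also accepts a name stored in a variable, and single brackets `[ ]` return a smaller list rather than the element itself.

```r
list2 <- list(A = 1:6, B = 11:15, C = c("a", "b", "c"))
# $ and [[ ]] both return the element itself
identical(list2$A, list2[["A"]])  # [1] TRUE
# [[ ]] also works with a name held in a variable
# (list2$col would look for an element literally named "col")
col <- "B"
list2[[col]]                      # [1] 11 12 13 14 15
# [ ] keeps the list wrapper: the result is a list of length 1
class(list2["A"])                 # [1] "list"
class(list2[["A"]])               # [1] "integer"
```
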
  34. list vs. named list
    list1 <- list(1:6, 11:15, c("a", "b", "c"))
    > class(list1)
    [1] "list"
    > names(list1)
    NULL
    list2 <- list(A = 1:6, B = 11:15, C = c("a", "b", "c"))
    > class(list2)
    [1] "list"
    > names(list2)
    [1] "A" "B" "C"

  35. named list & data.frame
    list3 <- list(A = 1:3, B = 11:13)
    > class(list3)
    [1] "list"
    > names(list3)
    [1] "A" "B"
    df1 <- data.frame(A = 1:3, B = 11:13)
    > class(df1)
    [1] "data.frame"
    > names(df1)
    [1] "A" "B"

  36. named list & data.frame
    list3 <- list(A = 1:3, B = 11:13)
    df1 <- data.frame(A = 1:3, B = 11:13)
    > str(list3)
    List of 2
    $ A: int [1:3] 1 2 3
    $ B: int [1:3] 11 12 13
    > str(df1)
    'data.frame': 3 obs. of 2 variables:
    $ A: int 1 2 3
    $ B: int 11 12 13

  37. named list & data.frame
    > list3
    $A
    [1] 1 2 3
    $B
    [1] 11 12 13
    > df1
    A B
    1 1 11
    2 2 12
    3 3 13

  38. data.frame vs. matrix
    df1 <- data.frame(A = 1:3, B = 11:13)
    mat1 <- matrix(c(1:3, 11:13), nrow = 3, ncol = 2)
    > df1
    A B
    1 1 11
    2 2 12
    3 3 13
    > mat1
    [,1] [,2]
    [1,] 1 11
    [2,] 2 12
    [3,] 3 13
    > str(df1)
    'data.frame': 3 obs. of 2 variables:
    $ A: int 1 2 3
    $ B: int 11 12 13
    > str(mat1)
    int [1:3, 1:2] 1 2 3 11 12 13

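One practical difference behind this slide: a data.frame is a list of columns, so each column keeps its own type, while a matrix holds a single type. A small sketch, assuming R 4.0.0 or later (where `data.frame()` keeps strings as character by default); `df2` and `mat2` are illustrative names:

```r
df2 <- data.frame(A = 1:3, B = c("a", "b", "c"))
# A data.frame is a list of columns...
is.list(df2)             # [1] TRUE
# ...so each column keeps its own class (A: "integer", B: "character")
sapply(df2, class)
# Forcing the same values into a matrix coerces everything to one type
mat2 <- as.matrix(df2)
typeof(mat2)             # [1] "character"
# The numbers in column A have become strings
is.numeric(mat2[, "A"])  # [1] FALSE
```
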
  39. data.frame
    variables
    observation

  40. Data Processing
    data.frame / tibble
    Table Data: read_csv() / write_csv()
    Wide form → Long form: pivot_longer()
    Long form → Wide form: pivot_wider()
    Nested form: group_nest() / unnest()
    Plot: {ggplot2}, {patchwork}
    Image Files: ggsave()

  41. > anscombe
    x1 x2 x3 x4 y1 y2 y3 y4
    1 10 10 10 8 8.04 9.14 7.46 6.58
    2 8 8 8 8 6.95 8.14 6.77 5.76
    3 13 13 13 8 7.58 8.74 12.74 7.71
    4 9 9 9 8 8.81 8.77 7.11 8.84
    5 11 11 11 8 8.33 9.26 7.81 8.47
    6 14 14 14 8 9.96 8.10 8.84 7.04
    7 6 6 6 8 7.24 6.13 6.08 5.25
    8 4 4 4 19 4.26 3.10 5.39 12.50
    9 12 12 12 8 10.84 9.13 8.15 5.56
    10 7 7 7 8 4.82 7.26 6.42 7.91
    11 5 5 5 8 5.68 4.74 5.73 6.89
    Wide form data

  42. Wide form data
    df <- rownames_to_column(anscombe, var = "tag")
    > df
    tag x1 x2 x3 x4 y1 y2 y3 y4
    1 1 10 10 10 8 8.04 9.14 7.46 6.58
    2 2 8 8 8 8 6.95 8.14 6.77 5.76
    3 3 13 13 13 8 7.58 8.74 12.74 7.71
    4 4 9 9 9 8 8.81 8.77 7.11 8.84
    5 5 11 11 11 8 8.33 9.26 7.81 8.47
    6 6 14 14 14 8 9.96 8.10 8.84 7.04

  43. Wide form → Long form data
    df_long_1 <-
      pivot_longer(
        data = df,
        cols = !tag
      )
    df_long_2 <-
      pivot_longer(
        data = df,
        cols = !tag,
        names_to = c(".value", "key"),
        names_pattern = "(.)(.)"
      )

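To see what the two calls produce, here is a sketch that rebuilds `df` from the built-in `anscombe` data and checks the shapes (assuming {tidyverse} is installed; it loads {tibble} for `rownames_to_column()` and {tidyr} for `pivot_longer()`):

```r
library(tidyverse)
df <- rownames_to_column(anscombe, var = "tag")
# One row per (tag, original column): columns tag, name, value
df_long_1 <- pivot_longer(data = df, cols = !tag)
# ".value" turns the first captured letter (x or y) into column names;
# the second captured character goes into the new "key" column
df_long_2 <- pivot_longer(data = df, cols = !tag,
                          names_to = c(".value", "key"),
                          names_pattern = "(.)(.)")
dim(df_long_1)    # 88 rows (11 tags x 8 columns), 3 columns
dim(df_long_2)    # 44 rows (11 tags x 4 keys), 4 columns
names(df_long_2)  # "tag" "key" "x" "y"
```
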
  44. Long form → Wide form data
    pivot_wider(
      data = df_long_1,
      values_from = value,
      names_from = name
    )
    pivot_wider(
      data = df_long_2,
      values_from = c(x, y),
      names_from = tag
    )

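The second call above spreads `tag` into column names, which gives one row per key. To get back the original `anscombe` layout instead, spread `key`. A sketch under that assumption; `names_sep = ""` is my choice here, so that the rebuilt names read x1 rather than the default x_1:

```r
library(tidyverse)
df <- rownames_to_column(anscombe, var = "tag")
df_long_2 <- pivot_longer(data = df, cols = !tag,
                          names_to = c(".value", "key"),
                          names_pattern = "(.)(.)")
# Spread key back into columns, one column per (x or y, key) pair
df_wide <- pivot_wider(data = df_long_2,
                       names_from = key,
                       values_from = c(x, y),
                       names_sep = "")
names(df_wide)
# "tag" "x1" "x2" "x3" "x4" "y1" "y2" "y3" "y4"
```
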
  45. Data Processing
    data.frame / tibble
    Table Data: read_csv() / write_csv()
    Wide form → Long form: pivot_longer()
    Long form → Wide form: pivot_wider()
    Plot: {ggplot2}
    Image Files: ggsave()

  46. Data Processing
    data.frame / tibble
    Table Data: read_csv() / write_csv()
    Wide form → Long form: pivot_longer()
    Long form → Wide form: pivot_wider()
    Plot: {ggplot2}
    Image Files: ggsave()
    [Figure: “Long form” labels repeated across the diagram]

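The diagram's remaining arrows (Table Data via `read_csv()` / `write_csv()`, and the Nested form via `group_nest()` / `unnest()` from slide 21) can be sketched on long-form data as follows. Assumptions: {tidyverse} is installed, readr is version 2.0 or later (for `show_col_types`), and the file goes to a temporary path rather than a real project folder.

```r
library(tidyverse)
# Long form of the built-in anscombe data (same reshaping as in the slides)
anscombe_long <- pivot_longer(anscombe, cols = everything(),
                              names_to = c(".value", "key"),
                              names_pattern = "(.)(.)")
# Table Data: write to a temporary CSV file and read it back
path <- tempfile(fileext = ".csv")
write_csv(anscombe_long, path)
# read_csv() guesses column types, so key comes back as a number
anscombe_read <- read_csv(path, show_col_types = FALSE)
# Nested form: one row per key; each group's rows sit in the list-column "data"
nested <- group_nest(anscombe_read, key)
nested
# unnest() expands the list-column back into ordinary rows (long form again)
unnested <- unnest(nested, cols = data)
```
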
  47. vignette("dplyr")

  48. It (dplyr) provides simple “verbs” to help
    you translate your thoughts into code.
    functions that correspond to the most
    common data manipulation tasks
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    verbs {dplyr}

  49. dplyr provides “verbs” for translating
    your thoughts into code:
    a set of functions that carry out the basics
    of data manipulation simply.
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    verbs {dplyr}
    ※ A rather free translation

  50. Grammar of data manipulation
    By constraining your options,
    it helps you think about your data
    manipulation challenges.
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html

  51. Grammar of data manipulation
    By limiting your options,
    you can think through the steps of
    data analysis simply, you know.
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    ※ A very free translation

  52. Data.frame manipulation
    1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()

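Verbs 4 to 7 are not demonstrated in the slides that follow, so here is a sketch on the long-form `anscombe` data. The `labels` table is hypothetical and exists only to give `left_join()` something to join; assumes {tidyverse} (dplyr 1.0 or later for `.groups`).

```r
library(tidyverse)
anscombe_long <- pivot_longer(anscombe, cols = everything(),
                              names_to = c(".value", "key"),
                              names_pattern = "(.)(.)")
# group_by() + summarize(): one row of statistics per key
stats <- anscombe_long %>%
  group_by(key) %>%
  summarize(mean_x = mean(x), mean_y = mean(y), r = cor(x, y),
            .groups = "drop")
# Hypothetical lookup table, used only to demonstrate left_join()
labels <- tibble(key = c("1", "2", "3", "4"),
                 label = c("I", "II", "III", "IV"))
# left_join() keeps every row of stats and adds the matching label;
# arrange() then sorts the rows by correlation, largest first
result <- stats %>%
  left_join(labels, by = "key") %>%
  arrange(desc(r))
result
```

All four keys share almost the same means (x = 9, y ≈ 7.50) and correlation (≈ 0.82), which is why Anscombe's quartet is a classic argument for plotting data rather than only summarizing it.
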
  53. Data.frame manipulation
    0. %>%
    1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()

  54. Pipe algebra
    X %>% f          = f(X)
    X %>% f(y)       = f(X, y)
    X %>% f %>% g    = g(f(X))
    X %>% f(y, .)    = f(y, X)
    %>% {magrittr}
    「dplyr再入門（基本編）」(Re-introduction to dplyr: basics), yutannihilation
    https://speakerdeck.com/yutannihilation/dplyrzai-ru-men-ji-ben-bian

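Each line of the pipe algebra can be checked directly in R. A sketch using base functions (`sqrt`, `round`, `sum`, `rep`) as stand-ins for the abstract `f` and `g`:

```r
library(magrittr)  # %>% comes from {magrittr}; {dplyr} and {tidyverse} re-export it
x <- c(1, 4, 9)
# X %>% f        is f(X)
identical(x %>% sqrt, sqrt(x))               # [1] TRUE
# X %>% f(y)     is f(X, y)
identical(x %>% round(1), round(x, 1))       # [1] TRUE
# X %>% f %>% g  is g(f(X))
identical(x %>% sqrt %>% sum, sum(sqrt(x)))  # [1] TRUE
# X %>% f(y, .)  is f(y, X): the dot marks where X goes
identical(3 %>% rep("a", .), rep("a", 3))    # [1] TRUE
```
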
  55. lift
    take
    pour
    put
    Bring milk from the kitchen!

  56. Bring milk from the kitchen!
    lift(Robot, glass, table) -> Robot'
    take(Robot', fridge, milk) -> Robot''

  57. Bring milk from the kitchen!
    Robot' <- lift(Robot, glass, table)      # ①
    Robot'' <- take(Robot', fridge, milk)    # ②
    Robot''' <- pour(Robot'', milk, glass)   # ③
    result <- put(Robot''', glass, table)    # ④
    By using a pipe:
    result <- Robot %>%
      lift(glass, table) %>%   # ①
      take(fridge, milk) %>%   # ②
      pour(milk, glass) %>%    # ③
      put(glass, table)        # ④

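The robot code on this slide is pseudocode (`Robot'` is not a valid R name, and `lift()` and friends do not exist). A runnable sketch in which the robot is a hypothetical list that logs its actions, and `lift`, `take`, `pour` and `put` are made-up functions written only for this example, shows that both versions give the same result:

```r
library(magrittr)
# Hypothetical helper: record an action in the robot's log and return the robot
act <- function(robot, verb, ...) {
  robot$log <- c(robot$log, paste(verb, paste(c(...), collapse = ", ")))
  robot
}
# Hypothetical verbs; the robot always comes first, so they chain with %>%
lift <- function(robot, what, from) act(robot, "lift", what, from)
take <- function(robot, from, what) act(robot, "take", from, what)
pour <- function(robot, what, into) act(robot, "pour", what, into)
put  <- function(robot, what, onto) act(robot, "put", what, onto)
robot <- list(log = character(0))
# Without a pipe: every intermediate state needs its own name
robot1 <- lift(robot, "glass", "table")
robot2 <- take(robot1, "fridge", "milk")
robot3 <- pour(robot2, "milk", "glass")
result_a <- put(robot3, "glass", "table")
# With a pipe: the same four steps, read top to bottom
result_b <- robot %>%
  lift("glass", "table") %>%
  take("fridge", "milk") %>%
  pour("milk", "glass") %>%
  put("glass", "table")
identical(result_a, result_b)  # [1] TRUE
result_b$log
```

Putting the object being transformed first in every function is the design choice that makes the pipe work; the {dplyr} verbs follow the same convention with the data.frame as their first argument.
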
  58. The tidyverse style guide
    https://style.tidyverse.org/syntax.html#object-names
    "There are only two hard things in Computer Science:
    cache invalidation and naming things"

  59. Bring milk from the kitchen!
    Robot' <- lift(Robot, glass, table)      # ①
    Robot'' <- take(Robot', fridge, milk)    # ②
    Robot''' <- pour(Robot'', milk, glass)   # ③
    result <- put(Robot''', glass, table)    # ④
    By using a pipe:
    result <- Robot %>%
      lift(glass, table) %>%   # ①
      take(fridge, milk) %>%   # ②
      pour(milk, glass) %>%    # ③
      put(glass, table)        # ④

  60. Bring milk from the kitchen!
    Robot' <- lift(Robot, glass, table)      # ①
    Robot'' <- take(Robot', fridge, milk)    # ②
    Robot''' <- pour(Robot'', milk, glass)   # ③
    result <- put(Robot''', glass, table)    # ④
    By using a pipe:
    result <- Robot %>%
      lift(glass, table) %>%   # ①
      take(fridge, milk) %>%   # ②
      pour(milk, glass) %>%    # ③
      put(glass, table)        # ④
    Thinking Reading

  61. Programming
    Write
    Run
    Read
    Think
    Write
    Run
    Read
    Think
    Communicate
    Share

  62. Pipe algebra
    X %>% f          = f(X)
    X %>% f(y)       = f(X, y)
    X %>% f %>% g    = g(f(X))
    X %>% f(y, .)    = f(y, X)
    %>% {magrittr}
    「dplyr再入門（基本編）」(Re-introduction to dplyr: basics), yutannihilation
    https://speakerdeck.com/yutannihilation/dplyrzai-ru-men-ji-ben-bian

  63. Data.frame manipulation
    0. %>%
    1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()

  64. verbs {dplyr}
    mutate # add a column
    mutate(dat, C = fun(A, B))

  65. verbs {dplyr}
    mutate # add a column
    dat %>% mutate(C = fun(A, B))

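The slides do not show `dat` or `fun`, so this sketch invents both (they are hypothetical): `dat` has numeric columns A and B, and `fun` simply adds its two arguments.

```r
library(dplyr)
# Hypothetical example data standing in for the slide's dat
dat <- data.frame(tag = 1:5, A = 1:5, B = c(10, 20, 30, 40, 50))
# Hypothetical stand-in for the slide's placeholder fun(A, B)
fun <- function(a, b) a + b
dat %>% mutate(C = fun(A, B))
# Several columns at once; D can already use the new column C
dat %>% mutate(C = A + B, D = C * 2)
```
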
  66. verbs {dplyr}
    filter # narrow down rows
    dat %>% filter(tag %in% c(1, 3, 5))

  67. Boolean operators (Boolean algebra)
    A == B    # equal to
    A != B    # not equal to
    A | B     # or
    A & B     # and
    A %in% B  # is A in B?
    George Boole
    1815 - 1864
    (image: Wikipedia)

  68. Boolean operators (Boolean algebra)
    "a" != "b"       # not equal to
    [1] TRUE
    1 %in% 10:100    # is A in B?
    [1] FALSE

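These operators are what `filter()` takes as conditions. A sketch on a small hypothetical `dat` (the columns A and B and their values are invented for illustration):

```r
library(dplyr)
# Hypothetical example data
dat <- data.frame(tag = 1:5,
                  A = c(2, 8, 5, 1, 9),
                  B = c("x", "y", "x", "y", "x"))
# %in%: keep rows whose tag is one of 1, 3, 5
dat %>% filter(tag %in% c(1, 3, 5))
# & (and): both conditions must be TRUE -> tags 3 and 5
dat %>% filter(A > 3 & B == "x")
# | (or): at least one condition must be TRUE -> tags 2, 4 and 5
dat %>% filter(A > 8 | B == "y")
# ! negates the whole %in% test -> tags 2 and 4
dat %>% filter(!tag %in% c(1, 3, 5))
# Comma-separated conditions are combined with &
dat %>% filter(A > 3, B == "x")
```
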
  69. George Boole
    1815 - 1864
    Mathematician & Philosopher
    A Class-Room Introduction to Logic
    https://niyamaklogic.wordpress.com/category/laws-of-thoughts/

  70. verbs {dplyr}
    select # select columns
    dat %>% select(tag, B)

  71. verbs {dplyr}
    select # select columns
    dat %>% select("tag", "B")

  72. verbs {dplyr}
    select # select columns
    dat %>% select("tag", "B")
    dat %>% select(tag, B)

  73. verbs {dplyr}
    # Select helper functions
    starts_with("s")   ends_with("s")
    contains("se")     matches("^.e")
    one_of(c("tag", "B"))
    everything()
    「dplyr::selectの活用例メモ」(Notes on using dplyr::select), kazutan
    https://kazutan.github.io/blog/2017/04/dplyr-select-memo/

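A sketch of the helpers on the `df` built earlier from `anscombe`. It also uses `all_of()`, which current {tidyselect} documentation recommends over the superseded `one_of()`; assumes dplyr 1.0 or later.

```r
library(dplyr)
df <- tibble::rownames_to_column(anscombe, var = "tag")
# Drop a column with -
df %>% select(-tag) %>% names()
# starts_with() / ends_with() match the start / end of the name
df %>% select(tag, starts_with("y")) %>% names()   # tag y1 y2 y3 y4
df %>% select(ends_with("1")) %>% names()          # x1 y1
# matches() takes a regular expression: x or y followed by 3 or 4
df %>% select(matches("^[xy][34]$")) %>% names()   # x3 x4 y3 y4
# all_of() takes a character vector of column names
cols <- c("tag", "x2", "y2")
df %>% select(all_of(cols)) %>% names()
# everything() is handy for moving a column to the front
df %>% select(y1, everything()) %>% names()
```
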
  74. Data.frame manipulation
    0. %>%
    1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()

  75. Grammar of data manipulation
    By limiting your options,
    you can think through the steps of
    data analysis simply, you know.
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    ※ A very free translation

  76. The more constraints one imposes,
    the more one frees one's self of the
    chains that shackle the spirit.
    Igor Stravinsky
    (Игорь Фёдорович Стравинский)
    1882 - 1971
    ※ The Japanese rendering on the slide is a fairly free translation

  77. Data Science
    Import → Tidy → Transform / Visualize / Model → Communicate
    Modified from “R for Data Science”, H. Wickham, 2017

  78. Programming
    Write
    Run
    Read
    Think
    Write
    Run
    Read
    Think
    Communicate
    Share

  79. Text Image
    Information
    Intention
    Data
    decode
    encode
    Data analysis
    feedback

  80. Text
    Image
    First, A. Next, B.
    Then C. Finally D.
    time
    Intention
    encode
    "Frozen" structure
    A B C D time
    value
    α
    β

  81. Mapping
    Apple (entity) → apple (information)

  82. Apple (entity) → information
    Mapping through different channels:
    "fruit", "red", an image

  83. Data visualization
    [Figure: a data table whose variables are mapped onto the axes of a graph]
    mapping

  84. Data visualization
    [Figure: a data table whose variables are mapped onto the axes of a graph]
    mapping
    x axis, y axis, color, fill,
    shape, linetype, alpha…
    aesthetic channels

  85. Data visualization
    [Figure: a data table whose variables are mapped onto the axes of a graph]
    mapping
    x axis, y axis, color, fill,
    shape, linetype, alpha…
    aesthetic channels
    Plotting with ggplot:
    ggplot(data = my_data) +
      aes(x = X, y = Y) +
      geom_point()

  86. Data visualization
    Entity → mapping (observation) → data → mapping (data visualization) → graph
    [Figure: a data table whose variables are mapped onto the aesthetic channels of a graph]
    mapping
    aesthetic channels

  87. Your first ggplot
    library(tidyverse)
    dat <-
      data.frame(tag = rep(c("a", "b"), each = 2),
                 X = c(1, 3, 5, 7),
                 Y = c(3, 9, 4, 2))
    ggplot() +
      geom_point(data = dat,
                 mapping = aes(x = X, y = Y))

  88. Your first ggplot

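  89. Your first ggplot
    library(tidyverse)
    dat <-
      data.frame(tag = rep(c("a", "b"), each = 2),
                 X = c(1, 3, 5, 7),
                 Y = c(3, 9, 4, 2))
    ggplot() +
      geom_point(data = dat,
                 mapping = aes(x = X, y = Y))
    ggplot() +: declare the start of the plot; connect the parts with the + sign
    data = dat: specify the data.frame
    aes(): inside this function, specify which variable goes to which channel (the aesthetic mapping)
    x, y: argument names of aes()
    X, Y: variable names in dat
    geom_point(): use the geom_*() function that matches the type of graph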

  90. Your second ggplot
    library(tidyverse)
    dat <-
      data.frame(tag = rep(c("a", "b"), each = 2),
                 X = c(1, 3, 5, 7),
                 Y = c(3, 9, 4, 2))
    ggplot() +
      geom_point(data = dat,
                 mapping = aes(x = X, y = Y)) +
      geom_path(data = dat,
                mapping = aes(x = X, y = Y))

  91. Your second ggplot

  92. Various ways to write ggplot code
    ggplot() +
      geom_point(data = dat,
                 mapping = aes(x = X, y = Y)) +
      geom_path(data = dat,
                mapping = aes(x = X, y = Y))
    ggplot(data = dat,
           mapping = aes(x = X, y = Y)) +
      geom_point() +
      geom_path()
    ggplot(data = dat) +
      aes(x = X, y = Y) +
      geom_point() +
      geom_path()
    Settings shared by every layer can be given once in ggplot() and omitted from the layers
    The aes() call holding the mapping can also be placed outside ggplot()

  93. Various ways to write ggplot code
    ggplot() +
      geom_point(data = dat,
                 mapping = aes(x = X, y = Y, color = tag)) +
      geom_path(data = dat,
                mapping = aes(x = X, y = Y))
    ggplot(data = dat) +
      aes(x = X, y = Y) + # factor out only what the layers share
      geom_point(mapping = aes(color = tag)) +
      geom_path()
    Specify the mapping for the point color

  94. Various ways to write ggplot code
    ggplot(data = dat) +
      aes(x = X, y = Y) +
      geom_point(aes(color = tag)) +
      geom_path()
    ggplot(data = dat) +
      aes(x = X, y = Y) +
      geom_path() +
      geom_point(aes(color = tag))
    Layers added later are drawn on top

  95. Saving a ggplot image
    library(tidyverse)
    dat <-
      data.frame(tag = rep(c("a", "b"), each = 2),
                 X = c(1, 3, 5, 7),
                 Y = c(3, 9, 4, 2))
    g <-
      ggplot(data = dat) +
      aes(x = X, y = Y) +
      geom_path() +
      geom_point(mapping = aes(color = tag))
    ggsave(filename = "fig/demo01.png",
           plot = g,
           width = 4, height = 3, dpi = 150)

  96. Saving a ggplot image
    library(tidyverse)
    dat <-
      data.frame(tag = rep(c("a", "b"), each = 2),
                 X = c(1, 3, 5, 7),
                 Y = c(3, 9, 4, 2))
    g <-
      ggplot(data = dat) +
      aes(x = X, y = Y) +
      geom_path() +
      geom_point(mapping = aes(color = tag))
    ggsave(filename = "fig/demo01.png",
           plot = g,
           width = 4, height = 3, dpi = 150)
    By default, width and height are given in inches

  97. Saving a ggplot image
    library(tidyverse)
    dat <-
      data.frame(tag = rep(c("a", "b"), each = 2),
                 X = c(1, 3, 5, 7),
                 Y = c(3, 9, 4, 2))
    g <-
      ggplot(data = dat) +
      aes(x = X, y = Y) +
      geom_path() +
      geom_point(mapping = aes(color = tag))
    ggsave(filename = "fig/demo01.png",
           plot = g,
           width = 10, height = 7.5, dpi = 150,
           units = "cm") # "cm", "mm", or "in" can be given

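The slide writes to `fig/demo01.png`, which assumes a `fig/` folder already exists; `ggsave()` may fail if it does not. A sketch that creates the folder first, written into a temporary directory (an assumption, so the example does not touch your project):

```r
library(ggplot2)
dat <- data.frame(tag = rep(c("a", "b"), each = 2),
                  X = c(1, 3, 5, 7),
                  Y = c(3, 9, 4, 2))
g <- ggplot(data = dat) +
  aes(x = X, y = Y) +
  geom_path() +
  geom_point(mapping = aes(color = tag))
# Assumption: save under a temporary folder instead of the project's fig/
out_dir <- file.path(tempdir(), "fig")
# Create the folder first (no warning if it already exists)
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "demo01.png")
# The .png extension selects the PNG device; size given in centimetres
ggsave(filename = out_file, plot = g,
       width = 10, height = 7.5, units = "cm", dpi = 150)
file.exists(out_file)  # TRUE if the PNG was written
```
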
  98. The geom_*() family of functions
    cf. https://www.rstudio.com/resources/cheatsheets

  99. Plotting several series
    > head(anscombe)
    x1 x2 x3 x4 y1 y2 y3 y4
    1 10 10 10 8 8.04 9.14 7.46 6.58
    2 8 8 8 8 6.95 8.14 6.77 5.76
    3 13 13 13 8 7.58 8.74 12.74 7.71
    4 9 9 9 8 8.81 8.77 7.11 8.84
    5 11 11 11 8 8.33 9.26 7.81 8.47
    6 14 14 14 8 9.96 8.10 8.84 7.04
    ggplot(data = anscombe) +
      geom_point(aes(x = x1, y = y1)) +
      geom_point(aes(x = x2, y = y2), color = "Red") +
      geom_point(aes(x = x3, y = y3), color = "Blue") +
      geom_point(aes(x = x4, y = y4), color = "Green")
    Doing our best with what we know so far gives this

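  100. Data visualization with ggplot
    Entity → mapping (observation) → data → mapping (data visualization) → graph
    raw data → transform → data format suited to visualization → mapping → aesthetic channels
    Each aesthetic channel of the graph corresponds to one variable in the data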

  101. > head(anscombe)
    x1 x2 x3 x4 y1 y2 y3 y4
    1 10 10 10 8 8.04 9.14 7.46 6.58
    2 8 8 8 8 6.95 8.14 6.77 5.76
    3 13 13 13 8 7.58 8.74 12.74 7.71
    4 9 9 9 8 8.81 8.77 7.11 8.84
    5 11 11 11 8 8.33 9.26 7.81 8.47
    6 14 14 14 8 9.96 8.10 8.84 7.04
    > head(anscombe_long)
    key x y
    1 1 10 8.04
    2 2 10 9.14
    3 3 10 7.46
    4 4 8 6.58
    5 1 8 6.95
    6 2 8 8.14
    ggplot(data = anscombe_long) +
      aes(x = x, y = y, color = key) +
      geom_point()
    Reshape the data so that each aesthetic channel (x axis, y axis, color)
    has a matching variable
    → the data can then be visualized simply and clearly

  102. Wide data → long data
    anscombe_long <-
      pivot_longer(data = anscombe,
                   cols = everything(),
                   names_to = c(".value", "key"),
                   names_pattern = "(.)(.)")
    > head(anscombe)   # wide data
    x1 x2 x3 x4 y1 y2 y3 y4
    1 10 10 10 8 8.04 9.14 7.46 6.58
    2 8 8 8 8 6.95 8.14 6.77 5.76
    3 13 13 13 8 7.58 8.74 12.74 7.71
    4 9 9 9 8 8.81 8.77 7.11 8.84
    5 11 11 11 8 8.33 9.26 7.81 8.47
    6 14 14 14 8 9.96 8.10 8.84 7.04
    > head(anscombe_long)   # long data
    key x y
    1 1 10 8.04
    2 2 10 9.14
    3 3 10 7.46
    4 4 8 6.58
    5 1 8 6.95
    6 2 8 8.14
    Reshape the data so that each aesthetic channel (x axis, y axis, color)
    has a matching variable

  103. Splitting the figure into panels by level of key
    ggplot(data = anscombe_long) +
      aes(x = x, y = y, color = key) +
      geom_point()
    ggplot(data = anscombe_long) +
      aes(x = x, y = y, color = key) +
      geom_point() +
      facet_wrap(facets = . ~ key, nrow = 1)

  104. Summary

  105. Data Science
    Import → Tidy → Transform / Visualize / Model → Communicate
    Modified from “R for Data Science”, H. Wickham, 2017

  106. Data science
    Import → Tidy → Transform / Visualize / Model → Communicate
    Modified from “R for Data Science”, H. Wickham, 2017
    preprocessing
    Data
    Observation Hypothesis
    Narrative of data
    feedback
    Data processing

  107. Data Processing
    data.frame / tibble
    Table Data: read_csv() / write_csv()
    Wide form → Long form: pivot_longer()
    Long form → Wide form: pivot_wider()
    Plot: {ggplot2}
    Image Files: ggsave()

  108. Data Processing
    data.frame / tibble
    Table Data: read_csv() / write_csv()
    Wide form → Long form: pivot_longer()
    Long form → Wide form: pivot_wider()
    Plot: {ggplot2}
    Image Files: ggsave()
    [Figure: “Long form” labels repeated across the diagram]

  109. It (dplyr) provides simple “verbs” to help
    you translate your thoughts into code.
    functions that correspond to the most
    common data manipulation tasks
    Introduction to dplyr
    https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
    verbs {dplyr}

  110. Data.frame manipulation
    1. mutate()
    2. filter()
    3. select()
    4. group_by()
    5. summarize()
    6. left_join()
    7. arrange()

  111. Data Science
    Import → Tidy → Transform / Visualize / Model → Communicate
    Modified from “R for Data Science”, H. Wickham, 2017

  112. ggplot2 package
    [Figure: a data table whose variables are mapped onto the aesthetic channels of a graph]
    data
    mapping
    x axis, y axis, color, fill,
    shape, linetype, alpha…
    aesthetic channels

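  113. Data visualization with ggplot
    Entity → mapping (observation) → data → mapping (data visualization) → graph
    raw data → transform → data format suited to visualization → mapping → aesthetic channels
    Each aesthetic channel of the graph corresponds to one variable in the data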

  114. Enjoy!!