前処理勉強会_発表資料_MITTI_20210724.pdf
mitti1210
July 21, 2021
Slides for the study session on preprocessing open data with R (medical data).
Study session page:
https://connpass.com/event/219249/
Transcript
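Slide 1: Trying preprocessing in R! Taking on DPC data! (2021/07/24, MITTI)
Slide 2: Preparing the folders
In MITTI's presentation, the folders are laid out as listed below.
This can also be done in R, but it is skipped for lack of time, so creating the folders in advance makes it easier to follow along.
Folders: project folder, input, summary, MDC_…, output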
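The slide notes that the folders can also be created from R. A minimal sketch of that step, assuming `summary` sits under `input` (the nesting is a guess, not confirmed by the slide) and using a temporary directory as a hypothetical project root. The MDC data folder is left out because its exact name is not given here.

```r
# Hypothetical project root; replace with your own project folder.
project <- file.path(tempdir(), "dpc_project")

# Assumed layout: input/summary for the facility overview tables, output for results.
folders <- file.path(project, c("input/summary", "output"))

for (f in folders) {
  dir.create(f, recursive = TRUE, showWarnings = FALSE)
}

list.dirs(project, full.names = FALSE)
```

`recursive = TRUE` creates the parent `input` folder along the way, so the order of the paths does not matter.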
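Slide 3: Download
Download these four years of data:
・Facility overview table (施設概要表)
・(8) Tabulation by disease and by surgery (疾患別手術別集計), MDC07
* 2016 and 2017 are split into two.
MDC07: orthopedics
Slide 4: Downloads and saved file names
We proceed with these folder and file names.
Folders: project folder, input, summary, MDC_…, output
summary: facility overview table (施設概要表)
MDC_…: (8) Tabulation by disease and by surgery (疾患別手術別集計), MDC07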
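Slide 5: Preparation
# This session uses the tidyverse package.
# If you have never installed it, install it first.
install.packages("tidyverse")
# It is used in both presentations.
# Load the installed packages.
library(tidyverse)  # mainly dplyr, tidyr, stringr and purrr this time
library(readxl)     # used to read Excel files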
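Slide 6: Discharged Patient Survey (退院患者調査)
https://www.mhlw.go.jp/stf/shingi2/0000196043_00003.html
Slide 7: Can this be done by hand?
・Multiple column names
・Length of stay: not needed
・Hyphens: not needed
・When the fiscal year changes, columns are added or removed
・When the fiscal year changes, hospital names can change
・When the fiscal year changes, disease names can change too
・… columns wide, … rows long × … years
・The total is what remains after excluding "(re-listed, other than transfusion)"
・When the fiscal year changes, the numbers can differ too
・The serial number (通番) holds the previous fiscal year's notification number (告示番号)
Slide 8: In this form the data can be analyzed and plotted
・Numbers that change by fiscal year were aligned
・Municipality names were added
・Multiple fiscal years were joined together
・Searchable by disease name
・Reduced to patient counts only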
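Slide 9: Preprocessing covered this time
●Make the data tidy
●String processing
●Handle repeated processing
●Wrap the steps in functions
Slide 10: What is tidy data?
From Wikipedia: https://ja.wikipedia.org/wiki/Tidy_data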
Slide 11: Reshaping data: tidyr
messy_data %>%
  pivot_longer(cols = c("6時", "12時", "18時"),
               names_to = "時刻",
               values_to = "天気")
# 6時, 12時, 18時 = 6:00, 12:00, 18:00; 時刻 = time; 天気 = weather
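To make the reshaping step concrete, here is a small runnable sketch of the same `pivot_longer()` call on a toy table. The table, its column names (`06:00` and so on) and the city values are invented for illustration and are not from the slides.

```r
library(tidyr)
library(tibble)

# Hypothetical messy table: one column per time of day.
messy <- tibble(
  city = c("Fukuoka", "Tokyo"),
  `06:00` = c("sunny", "rain"),
  `12:00` = c("cloudy", "rain"),
  `18:00` = c("sunny", "cloudy")
)

tidy <- messy %>%
  pivot_longer(
    cols = c("06:00", "12:00", "18:00"),  # columns to stack
    names_to = "time",                    # old column names go here
    values_to = "weather"                 # old cell values go here
  )

tidy  # one row per city and time
```

Each city now contributes three rows instead of three columns, so adding a fourth time of day would only add rows.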
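Slide 12: The battle to make data tidy
Tidy / Not Tidy
Slide 13: Preprocessing covered this time
●Make the data tidy: tidyr
●String processing: stringr
●Handle repeated processing: purrr
●Wrap the steps in functions: function
●Stick tables together: join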
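Slide 14: String processing: stringr
Representative stringr functions and what they do:
・Search: str_detect (TRUE if the string contains a given string)
・Extract: str_extract, str_extract_all (extract the text matching a pattern); str_sub (extract the first or last n characters)
・Replace: str_replace, str_replace_all (replace the parts that match a pattern)
・Append: str_c (concatenate strings)
https://kazutan.github.io/kazutanR/stringr-intro.html explains this in detail!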
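Slide 15: stringr and regular expressions
Example) Pull various parts out of the file name "MDC_02_8_07_2019.xlsx"
str_extract("string", "pattern")
The pattern accepts many kinds of specification:
[ ]  any of the characters or symbols inside the brackets
\\d  a digit; \\D  a non-digit character
.    anything
*    zero or more repetitions
{ }  repeat the number of times given in the braces
^    start of the string
$    end of the string
file <- "MDC_02_8_07_2019.xlsx"
str_extract(file, "[\\d]{4}")       # four digits
str_extract(file, "MDC.*[\\d]{4}")  # MDC, any repetition, four digits
str_extract(file, "[.]x.*")         # . (a literal dot), x, any repetition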
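The three patterns on the slide can be run as they stand. The sketch below uses the slide's file name; the variable names `year`, `stem` and `suffix` are mine, added only to label the results.

```r
library(stringr)

file <- "MDC_02_8_07_2019.xlsx"

year   <- str_extract(file, "[\\d]{4}")       # first run of four digits
stem   <- str_extract(file, "MDC.*[\\d]{4}")  # from "MDC" through the last four digits
suffix <- str_extract(file, "[.]x.*")         # literal dot, "x", then the rest

c(year = year, stem = stem, suffix = suffix)
```

Because `.*` is greedy, the second pattern runs as far as it can before backing off to the final four digits, which is why it captures the whole name without the extension.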
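Slide 16: Preprocessing covered this time
●Make the data tidy
●String processing
●Handle repeated processing
●Wrap the steps in functions
Slide 17: Repetition: purrr
Example) Read several Excel tabs in one go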
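One common way to read every tab of a workbook with purrr is sketched below. It uses the example workbook that ships with readxl rather than the DPC files, and it is an illustration of the idea, not the code shown in the demo.

```r
library(readxl)
library(purrr)

# Example workbook bundled with readxl (not the DPC data).
path <- readxl_example("datasets.xlsx")

sheets <- excel_sheets(path)

# One data frame per sheet, each list element named after its sheet.
all_sheets <- map(set_names(sheets), ~ read_excel(path, sheet = .x))

names(all_sheets)
```

Naming the input vector with `set_names()` carries the sheet names through to the result, so a sheet can be picked out later with `all_sheets[["sheet name"]]`.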
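Slide 18: Sticking tables together: dplyr::join
https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf
Slide 19: Preprocessing covered this time
●Make the data tidy
●Handle repeated processing
●String processing
●Wrap the steps in functions
Slide 20: Benefits and key points of wrapping code in functions
●Fewer unneeded variables: variables used inside a function do not remain afterwards
●The code can be reused on other files, and map becomes easier to use: the files in a folder can be read all at once
Slide 21: Benefits and key points of wrapping code in functions
●Code that worked for one fiscal year will not necessarily work for another, so this has to be checked
 ・Write code that still works when xls changes to xlsx
 ・When an error appears, extend the function each time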
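Slide 22: A concrete example
Slide 23: Preprocessing to align the notification numbers (告示番号)
Slide 24: Check the data
・Only this part of the data is wanted
・The ※ marks are not needed
・A fiscal-year column should be added
・Line breaks are not needed
▶ This much looks feasible to code
Slide 25: Check that the column names have not changed between fiscal years
Only columns 1 to 3 and the "施設名" (facility name) column are wanted!
select(1:3, "施設名")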
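Slide 26: Reading with read_excel
read_excel(
  path = path to the file,
  sheet = sheet name or sheet position,  # the first sheet if not specified
  skip = set this if there are unwanted rows at the top,
  col_names = set this to supply column names (FALSE means column names are not read),
  n_max = how many rows to read
)
Continued in the demo!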
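The arguments listed above can be tried on the example workbook bundled with readxl. This is a sketch on stand-in data: the `mtcars` sheet and the values chosen for `skip` and `n_max` are assumptions for illustration, not the settings used for the DPC files.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")  # bundled example, not the DPC file

top <- read_excel(
  path,
  sheet = "mtcars",   # sheet chosen by name
  skip = 1,           # skip the first row (here, the header row)
  col_names = FALSE,  # do not treat any row as column names
  n_max = 5           # read at most 5 data rows
)

dim(top)
```

With `col_names = FALSE`, readxl assigns placeholder names such as `...1`, which is convenient when the real header spans several rows and is handled separately.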
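Slide 27: Turn what we built into a function
Continued in the demo!
Slide 28: The relationship between the notification number (告示番号) and the serial number (通番)
Join so that the new fiscal year's 通番 lines up with the old fiscal year's 告示番号:
new_year %>% left_join(old_year, by = c("通番" = "告示番号"))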
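A toy version of this join, to show what happens when the two key columns have different names. The tables and the English column names (`serial_no`, `notification_no`, `old_serial_no`) are hypothetical stand-ins for 通番 and 告示番号.

```r
library(dplyr)
library(tibble)

# Hypothetical facility tables for two fiscal years.
new_year <- tibble(serial_no = c(1, 2, 3), facility = c("A", "B", "C"))
old_year <- tibble(notification_no = c(1, 2, 4), old_serial_no = c(10, 20, 40))

# Left side keeps every row; the named vector maps the left key to the right key.
joined <- new_year %>%
  left_join(old_year, by = c("serial_no" = "notification_no"))

joined  # facility C has no match, so old_serial_no is NA
```

A left join keeps every facility from the newer year, which makes unmatched facilities easy to spot as `NA` rows.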
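Slide 29: …!
Continued in the demo!
Slide 30: Municipality codes
The municipality code file can be downloaded from the Ministry of Internal Affairs and Communications (総務省) data.
https://www.soumu.go.jp/denshijiti/code.html
Slide 31: Discharged Patient Survey (退院患者調査)
https://www.mhlw.go.jp/stf/shingi2/0000196043_00003.html
Slide 32: Start with the column names
・Multiple column names
・Length of stay: not needed
・Hyphens: not needed
・When the fiscal year changes, columns are added or removed
・When the fiscal year changes, hospital names can change
・When the fiscal year changes, disease names can change too
・… columns wide, … rows long × … years
・The total is what remains after excluding "(re-listed, other than transfusion)"
・When the fiscal year changes, the numbers can differ too
・The serial number (通番) holds the previous fiscal year's notification number (告示番号)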
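Slide 33: Strategy
●Read only the column names and turn them vertical
Slide 34: Strategy
●Fill NA with the value above using tidyr::fill()
Slide 35: Strategy
●Merge several columns into one with tidyr::unite()
Arguments: the new column name, then the columns to join (here, all of them)
To convert the result to a vector, pull() is fine
Slide 36: Strategy
●Remove NA and symbols with str_replace_all
These symbols and spaces can also be removed in one combined pattern
This time "。" and spaces were removed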
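The four strategy steps (transpose, fill, unite, clean up) can be chained as in the sketch below. The two-row header is invented, and the pattern passed to `str_replace_all()` is my own choice for this toy data, not the one used in the demo.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Hypothetical two-row header, as it looks when read with col_names = FALSE.
header <- tibble(
  a = c("id", NA),
  b = c("knee", "cases"),
  c = c(NA, "days")
)

m <- t(header)                     # one row per spreadsheet column
colnames(m) <- c("top", "bottom")

col_names <- as_tibble(m) %>%
  fill(top) %>%                    # fill merged-cell gaps with the value above
  unite("name", top, bottom, sep = "_") %>%
  mutate(name = str_replace_all(name, "_NA|\\s", "")) %>%  # drop "_NA" and spaces
  pull(name)

col_names
```

The resulting character vector can then be passed to `read_excel(col_names = ...)` when reading the data rows.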
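Slide 37: Turn it into a function
Continued in the demo!
Slide 38: Checking for notation inconsistencies
・Length of stay: not needed
・Hyphens: not needed
・When the fiscal year changes, columns are added or removed
・When the fiscal year changes, hospital names can change
・When the fiscal year changes, disease names can change too
・… columns wide, … rows long × … years
・The total is what remains after excluding "(re-listed, other than transfusion)"
・When the fiscal year changes, the numbers can differ too
・The serial number (通番) holds the previous fiscal year's notification number (告示番号)
Slide 39: Finding notation inconsistencies
●Get the column names of each fiscal year
●Use setdiff() to find names that exist on one side only
setdiff(A, B) ▶ extracts what exists only in A
setdiff(B, A) ▶ extracts what exists only in B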
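A short illustration of the two `setdiff()` calls, using made-up column names for two fiscal years.

```r
# Hypothetical column names from two fiscal years.
names_a <- c("facility", "knee fracture", "hip fracture")
names_b <- c("facility", "knee fracture", "hip fractures")

only_in_a <- setdiff(names_a, names_b)  # names present only in A
only_in_b <- setdiff(names_b, names_a)  # names present only in B

list(only_in_a = only_in_a, only_in_b = only_in_b)
```

A near-identical pair turning up on both sides, as here, is the typical sign of a spelling variant rather than a genuinely new column.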
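Slide 40:
… and …, and … and …, are the same
If … and … differ, it may be a notation inconsistency
Or it may not be an inconsistency at all: the item may have existed only in that fiscal year
Slide 41: Now the data can be read (almost there)
Continued in the demo!
Slide 42: Unneeded columns
・Length of stay: not needed
・Hyphens: not needed
・When the fiscal year changes, hospital names can change
・… columns wide, … rows long × … years
・The total is what remains after excluding "(re-listed, other than transfusion)"
・When the fiscal year changes, the numbers can differ too
・The serial number (通番) holds the previous fiscal year's notification number (告示番号)
・When the fiscal year changes, disease names can change too
・When the fiscal year changes, columns are added or removed
Slide 43: The length-of-stay and "(re-listed)" columns are not needed
●They can be filtered out with select() and contains()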
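A minimal sketch of dropping columns by partial name with `select()` and `contains()`. The column names are English placeholders for the length-of-stay and re-listed columns, chosen only for this example.

```r
library(dplyr)
library(tibble)

# Hypothetical table with one wanted and two unwanted measure columns.
df <- tibble(
  facility = "A",
  knee_cases = 10,
  knee_stay_days = 12.5,
  knee_relisted = 3
)

# A minus sign in front of contains() drops every column whose name matches.
kept <- df %>%
  select(-contains("stay_days"), -contains("relisted"))

names(kept)
```

Because `contains()` matches on substrings, the same call keeps working when a new fiscal year adds more columns of the same kind.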
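Slide 44: Remove the hyphens ▶ convert to numeric data
・Hyphens: not needed
・When the fiscal year changes, hospital names can change
・… columns wide, … rows long × … years
・When the fiscal year changes, the numbers can differ too
・The serial number (通番) holds the previous fiscal year's notification number (告示番号)
・When the fiscal year changes, columns are added or removed
・When the fiscal year changes, disease names can change too
Slide 45: readr::parse_number()
●readr::parse_number(): converts the given column to numeric data
●mutate(across(columns to select, function)) processes several columns at once
Slide 46: Convert NA to 0
●replace_na(column, replacement value)
●When the function takes arguments, or several functions are needed, use mutate(across(columns to select, ~ function))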
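The two steps above (parse to numbers, then replace NA with 0) might look like this on a toy table. Passing `"-"` through the `na` argument of `parse_number()` is my own choice to treat hyphens as missing without a parsing warning; the demo may handle hyphens differently.

```r
library(dplyr)
library(readr)
library(tidyr)
library(tibble)

# Hypothetical counts stored as text, with a hyphen and an NA for "no data".
raw <- tibble(
  facility = c("A", "B"),
  knee = c("12", "-"),
  hip = c("1,234", NA)
)

cleaned <- raw %>%
  # Treat "-" as missing while parsing; "1,234" becomes 1234.
  mutate(across(c(knee, hip), ~ parse_number(.x, na = c("", "NA", "-")))) %>%
  # Then turn every missing value into 0.
  mutate(across(c(knee, hip), ~ replace_na(.x, 0)))

cleaned
```

The `~` form is needed here because both functions are called with an extra argument, which matches the point made on slide 46.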
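Slide 47: Turn everything so far into a function
Continued in the demo!
Slide 48: The notification numbers (告示番号) differ between fiscal years
・When the fiscal year changes, hospital names can change
・… columns wide, … rows long × … years
・When the fiscal year changes, the numbers can differ too
・The serial number (通番) holds the previous fiscal year's notification number (告示番号)
・When the fiscal year changes, columns are added or removed
・When the fiscal year changes, disease names can change too
Slide 49: Join with the facility overview table! Save as a CSV file
Continued in the demo!
Slide 50: Once this much works, it can be repeated
・… columns wide, … rows long × … years
・When the fiscal year changes, columns are added or removed
・When the fiscal year changes, disease names can change too
Slide 51: Repetition: purrr
Example) Read several Excel tabs in one go
Slide 52:
●Read all the Excel files in a folder at once
list.files(folder name), then repeat with the map functions
pattern = narrows down the files
Continued in the demo!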
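The folder-wide read can be sketched as follows. The folder is a throwaway one filled with readxl's bundled example workbooks, and `read_one()` is a hypothetical helper written for this illustration, not the function from the demo.

```r
library(readxl)
library(purrr)
library(dplyr)

# Build a throwaway folder from readxl's bundled example workbooks.
folder <- file.path(tempdir(), "input_demo")
dir.create(folder, showWarnings = FALSE)
file.copy(readxl_example("datasets.xlsx"), folder, overwrite = TRUE)
file.copy(readxl_example("datasets.xls"), folder, overwrite = TRUE)

# pattern= narrows the listing; this regex accepts both .xls and .xlsx.
files <- list.files(folder, pattern = "\\.xlsx?$", full.names = TRUE)

# Hypothetical helper: read the first sheet and record the source file.
read_one <- function(path) {
  read_excel(path, sheet = 1) %>%
    mutate(source = basename(path))
}

all_data <- map(files, read_one) %>% bind_rows()

count(all_data, source)
```

`read_excel()` detects the format from the file itself, so the same helper covers the xls-to-xlsx change mentioned on slide 21.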
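Slide 53: The columns differ too
・When the fiscal year changes, columns are added or removed
・When the fiscal year changes, disease names can change too
Slide 54: Wide data
The data are laid out horizontally
Slide 55: With long data this does not matter!
●With tidy data, columns are simply added
Slide 56: Convert to long-format data with tidyr::pivot_longer()
pivot_longer(
 cols = the columns to gather
 names_to = name of the column that will hold the old column names
 values_to = name of the column that will hold the values
)
Slide 57: names_sep in pivot_longer()
names_sep can split the column names into several columns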
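A toy example of `names_sep`, assuming wide column names of the form `disease_year`. The table and its names are invented for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one column per disease and fiscal year.
wide <- tibble(
  facility = c("A", "B"),
  knee_2018 = c(10, 20),
  knee_2019 = c(12, 18),
  hip_2018 = c(5, 7),
  hip_2019 = c(6, 9)
)

long <- wide %>%
  pivot_longer(
    cols = -facility,                 # gather everything except facility
    names_to = c("disease", "year"),  # one output column per name piece
    names_sep = "_",                  # split the old names at the underscore
    values_to = "patients"
  )

long
```

Each old column name is split at `_` into a disease and a year, so the result has one row per facility, disease and year.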
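Slide 58: Tidy
Continued in the demo!
Slide 59: Fixing notation inconsistencies
・When the fiscal year changes, disease names can change too
Slide 60: Fixing notation inconsistencies
●Now that the data are tidy, they can be fixed right away with mutate and ifelse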
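Once the disease names sit in a single column, a spelling variant can be corrected in one step. A sketch with invented disease names:

```r
library(dplyr)
library(tibble)

# Hypothetical long table where one disease name has a spelling variant.
long <- tibble(
  disease = c("hip fracture", "hip fractures", "knee fracture"),
  patients = c(5, 6, 10)
)

# Rewrite the variant; every other value is left as it is.
fixed <- long %>%
  mutate(disease = ifelse(disease == "hip fractures", "hip fracture", disease))

distinct(fixed, disease)
```

In wide data the same fix would mean renaming and merging columns, which is why it is left until after the reshape.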
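Slide 61: Repetition: purrr
Example) Read several Excel tabs in one go
Slide 62: Finally, read all the files
Continued in the demo!
Slide 63: Well done!!!
Data with … columns and … rows!!
Slide 64: Open issues
This time the hospital names are aligned to those of the latest fiscal year
 ・Hospitals missing from the latest fiscal year are left out
 ・Merged hospitals are not handled
▶ (This causes no trouble within the scope of my own work, but) stricter handling may be needed if the data are used for research
▶ I would like to hear the views of people who work with DPC data.
Slide 65: Once preprocessing is done, you can move on
Aggregation, web apps (shiny)
Slide 66: With shiny!!!
Slide 67: Closing
●Things that cannot be done by hand in Excel can be done with R or Python
●For medical statistics, if the Excel file is built well, preprocessing is almost unnecessary
●But once you want to analyze all kinds of data, preprocessing becomes necessary
 (rumor has it that …0% or more of the budget goes to preprocessing…)
●Much of my code is still clumsy.
 How good is it? (Hopeless, minimum bar cleared, or ready for real work, or not?)
 Advice and feedback would be much appreciated!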