前処理勉強会_発表資料_MITTI_20210724.pdf
Slides from the study session on preprocessing open data with R (medical data).
Study session page:
https://connpass.com/event/219249/
mitti1210
July 21, 2021
Transcript
1. I tried preprocessing in R! Taking on DPC data! 2021/07/24 MITTI
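2. Preparing the folders
MITTI's talk uses the folder structure below. The folders can also be created from R, but that step is skipped for lack of time, so creating them beforehand makes it easier to follow along.
Project folder: input, summary, MDC_[…]_[…], output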
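The slide mentions that the folders can also be created from R. A minimal sketch of that idea, assuming a temporary directory as a hypothetical project root and an assumed layout of `input`, `input/summary` and `output`; the MDC sub-folder is left out because its full name is not recoverable from the slide:

```r
# Hypothetical project root for illustration; point this at your own project folder.
project_dir <- file.path(tempdir(), "dpc_project")

# Assumed layout: input/summary for downloaded files and output for results.
folders <- file.path(project_dir, c("input", "input/summary", "output"))

# recursive = TRUE also creates missing parent folders;
# showWarnings = FALSE keeps reruns quiet when a folder already exists.
for (folder in folders) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}

created <- dir.exists(folders)
created
```

Running the block twice is harmless, which is convenient when the script is rerun during a session.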
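3. Download
These 4 […]: the facility overview table (施設概要表) and "(8) Tabulation by disease and by surgery" for MDC07.
Note: 2016 and 2017 are each split into two.
MDC07: orthopedics
4. Downloads and saved file names
The session proceeds with these folder and file names.
Project folder: input, summary, MDC_[…]_[…], output, summary, MDC_[…]_[…]
Files: "(8) Tabulation by disease and by surgery" (MDC07) and the facility overview table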
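5. Preparation
# This session uses the tidyverse package.
# If you have never installed it, install it first.
install.packages("tidyverse")  # used in both talks
# Load the installed packages.
library(tidyverse)  # this time mainly dplyr, tidyr, stringr and purrr
library(readxl)     # used to read Excel files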
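6. Survey of discharged patients (退院患者調査)
https://www.mhlw.go.jp/stf/shingi2/0000196043_00003.html
7. Can this be done by hand?
Annotations on the spreadsheet: column names span multiple rows; the length-of-stay columns are not needed; the hyphens are not needed; columns are added or removed when the fiscal year changes; hospital names can change between fiscal years; disease names can also change between fiscal years; […] columns across, […] rows down, times […] years; the total is the count excluding the "(re-listed, other than transfusion)" columns; numbers can differ between fiscal years, with the serial number (通番) holding the previous fiscal year's notification number (告示番号).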
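8. With this, the data can be analyzed and plotted
● Numbers that change between fiscal years were aligned
● Municipality names were added
● Multiple fiscal years were joined together
● Records can be searched by disease name
● Only patient counts were kept
9. Preprocessing covered this time
● Make the data tidy (整然データ)
● String processing
● Handling repeated processing
● Wrapping steps in functions
10. What is tidy data (整然データ)?
From Wikipedia: https://ja.wikipedia.org/wiki/Tidy_data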
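11. Reshaping data: tidyr
雑然データ %>% pivot_longer(cols = c("6時", "12時", "18時"), names_to = "時刻", values_to = "天気")
Here 雑然データ is the messy table, "6時", "12時" and "18時" are the 6:00, 12:00 and 18:00 columns, 時刻 means time of day and 天気 means weather.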
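To make the reshaping step concrete, here is a small runnable sketch of the same `pivot_longer()` call. The table contents and the ASCII column names (`h06`, `h12`, `h18`, `time`, `weather`) are invented for illustration and stand in for the Japanese names on the slide:

```r
library(tidyr)
library(tibble)

# Hypothetical messy table: one row per day, one column per observation time.
messy <- tribble(
  ~day,  ~h06,    ~h12,     ~h18,
  "Mon", "sunny", "cloudy", "rain",
  "Tue", "rain",  "rain",   "sunny"
)

# Gather the three time columns into a name column and a value column.
tidy <- messy %>%
  pivot_longer(
    cols      = c("h06", "h12", "h18"),
    names_to  = "time",
    values_to = "weather"
  )

tidy
```

Each day now occupies three rows, one per observation time, which is the shape most tidyverse functions expect.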
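12. The battle to make data tidy
Tidy / Not tidy
13. Preprocessing covered this time
● Make the data tidy (整然データ): tidyr
● String processing: stringr
● Handling repeated processing: purrr
● Wrapping steps in functions: function
● Sticking tables together: join
14. String processing: stringr
Representative stringr functions and what they do:
Search: str_detect. Returns TRUE if the string contains the given pattern.
Extract: str_extract, str_extract_all. Extract the text that matches a pattern.
Extract: str_sub. Extract the first (or last) n characters.
Replace: str_replace, str_replace_all. Replace the parts that match a pattern.
Append: str_c. Stick strings together.
For details, see https://kazutan.github.io/kazutanR/stringr-intro.html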
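The functions in the table can be tried on a few file names. This is only an illustrative sketch: the second and third file names and the `input/` prefix are made up for the example.

```r
library(stringr)

# The first name comes from the deck; the other two are hypothetical.
files <- c("MDC_02_8_07_2019.xlsx", "summary_2019.xls", "notes.txt")

has_mdc  <- str_detect(files, "MDC")            # which names contain "MDC"
first3   <- str_sub(files[1], 1, 3)             # first three characters
last4    <- str_sub(files[1], -4, -1)           # last four characters
dashed   <- str_replace_all(files[1], "_", "-") # replace every underscore
combined <- str_c("input/", files[1])           # concatenate two strings

has_mdc
first3
last4
dashed
combined
```

Negative positions in `str_sub()` count from the end of the string, which is how the "last n characters" case in the table is handled.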
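15. stringr and regular expressions
Example: pulling various pieces out of the file name "MDC_02_8_07_2019.xlsx".
str_extract("string", "pattern")
The pattern accepts many kinds of specification:
[ ]  any of the digits, letters or symbols listed inside
\\d  a digit
\\D  a non-digit character
.    any character
*    zero or more repetitions
{ }  repeat the number of times given in the braces
^    start of the string
$    end of the string
file <- "MDC_02_8_07_2019.xlsx"
str_extract(file, "[\\d]{4}")       # four digits
str_extract(file, "MDC.*[\\d]{4}")  # MDC, any run of characters, then four digits
str_extract(file, "[.]x.*")         # a literal dot, x, then any run of characters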
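As a worked check of the three patterns, the sketch below applies them to the file name from the slide and adds one `str_extract_all()` call, which is an extra illustration rather than something shown in the deck. The variable names are arbitrary.

```r
library(stringr)

file <- "MDC_02_8_07_2019.xlsx"

# First run of exactly four consecutive digits: the fiscal year.
year <- str_extract(file, "\\d{4}")

# "MDC", then as many characters as possible, ending in four digits.
stem <- str_extract(file, "MDC.*\\d{4}")

# A literal dot followed by "x" and everything after it: the extension.
ext <- str_extract(file, "[.]x.*")

# Every run of digits in the name, returned as a character vector.
numbers <- str_extract_all(file, "\\d+")[[1]]

year     # "2019"
stem     # "MDC_02_8_07_2019"
ext      # ".xlsx"
numbers  # "02" "8" "07" "2019"
```

`\\d{4}` skips `02`, `8` and `07` because none of them is four digits long, so the first match is the year.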
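16. ● Make the data tidy (整然データ)
● String processing
● Handling repeated processing
● Wrapping steps in functions
17. Iteration: purrr
Example: read several tabs of an Excel workbook in one go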
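One way to read every tab of a workbook with purrr is sketched below. It is not the code from the demo: it assumes the example workbook `datasets.xlsx` that ships with readxl in place of the DPC files, and the variable names are arbitrary.

```r
library(readxl)
library(purrr)

# Example workbook bundled with readxl, used here instead of the DPC data.
path <- readxl_example("datasets.xlsx")

# Names of all sheets (tabs) in the workbook.
sheets <- excel_sheets(path)

# Read each sheet; set_names() makes the result a list named by sheet.
tables <- map(set_names(sheets), ~ read_excel(path, sheet = .x))

names(tables)
```

Because the list is named, a single sheet can then be picked out with `tables[["iris"]]` or similar.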
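18. Sticking tables together: dplyr::join
https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf
19. ● Make the data tidy (整然データ)
● Handling repeated processing
● String processing
● Wrapping steps in functions
20. Benefits and key points of wrapping code in functions
● Fewer unneeded variables: variables used inside a function do not linger afterwards
● The code can be reused for other files and becomes easier to use with map: all the files in a folder can be read together
21. Benefits and key points of wrapping code in functions
● Code that worked for one fiscal year still has to be checked against other fiscal years
  ・Write code that copes when xls changes to xlsx
  ・When an error appears, extend the function to handle it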
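22. Worked example
23. Preprocessing to align the notification numbers (告示番号)
24. Inspect the data
・Only this part of the data is wanted
・The ※ marks are not needed
・Want to […] the […] column
・The line breaks are not needed
▶︎ For just this much, the code looks writable
25. Check that the column names have not changed between fiscal years
Only columns 1 to 3 and the "施設名" (facility name) column are wanted!
select(1:3, "施設名")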
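26. Reading with read_excel
read_excel(
  path = path to the file,
  sheet = sheet name or sheet position,  # the first sheet if omitted
  skip = number of unneeded leading rows, if any,
  col_names = column names, if you want to set them (FALSE reads no column names),
  n_max = maximum number of rows to read
)
Continued in the demo!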
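The arguments listed above can be exercised on a small file. This sketch assumes the example workbook bundled with readxl and its `mtcars` sheet rather than the DPC spreadsheets, so the specific values are illustrative only:

```r
library(readxl)

# Example workbook bundled with readxl, standing in for a DPC file.
path <- readxl_example("datasets.xlsx")

# Read only the first five data rows of one sheet, using its header row.
first_rows <- read_excel(path, sheet = "mtcars", n_max = 5)

# Skip the header row and read three rows without column names;
# readxl then generates placeholder names for the columns.
no_header <- read_excel(
  path,
  sheet = "mtcars",
  skip = 1,
  col_names = FALSE,
  n_max = 3
)

dim(first_rows)
dim(no_header)
```

Reading without column names is useful when the real header spans several rows and has to be rebuilt separately.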
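27. Turn what was built into a function
Continued in the demo!
28. Relationship between the notification number (告示番号) and the serial number (通番)
Join so that the serial number of the newer fiscal year lines up with the notification number of the older fiscal year:
新しい年度 %>% left_join(古い年度, by = c("通番" = "告示番号"))
Here 新しい年度 is the newer fiscal year's table and 古い年度 is the older one.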
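A join on differently named key columns can be sketched as follows. The two small tables, their contents and the ASCII column names (`serial_no` for 通番, `notice_no` for 告示番号) are hypothetical stand-ins for the real data:

```r
library(dplyr)
library(tibble)

# Hypothetical newer-year table keyed by serial number.
new_year <- tibble(
  serial_no = c(1, 2, 3),
  hospital  = c("A", "B", "C")
)

# Hypothetical older-year table keyed by notification number.
old_year <- tibble(
  notice_no    = c(1, 3),
  old_hospital = c("A (old name)", "C")
)

# by = c("left key" = "right key") matches columns with different names.
joined <- new_year %>%
  left_join(old_year, by = c("serial_no" = "notice_no"))

joined
```

Rows of the newer table with no match in the older one are kept and get `NA` in the columns that come from the older table.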
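29. […]!
Continued in the demo!
30. Municipality codes
A file of municipality codes can be downloaded from the data published by the Ministry of Internal Affairs and Communications (総務省).
https://www.soumu.go.jp/denshijiti/code.html
31. Survey of discharged patients (退院患者調査)
https://www.mhlw.go.jp/stf/shingi2/0000196043_00003.html
32. First, the column names
Annotations on the spreadsheet: column names span multiple rows; the length-of-stay columns are not needed; the hyphens are not needed; columns are added or removed when the fiscal year changes; hospital names can change between fiscal years; disease names can also change between fiscal years; […] columns across, […] rows down, times […] years; the total is the count excluding the "(re-listed, other than transfusion)" columns; numbers can differ between fiscal years, with the serial number (通番) holding the previous fiscal year's notification number (告示番号).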
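33. Strategy
● Read only the column-name rows and turn them so they run vertically
34. Strategy
● Use tidyr::fill() to fill each NA with the value above it
35. Strategy
● Use tidyr::unite() to merge several columns into one
Annotations on the code: the new column name; the columns to join (here, all of them); conversion to a vector (pull() also works)
36. Strategy
● Use str_replace_all() to remove "NA" and symbols
[…] can also remove these symbols and spaces. This time "。" and spaces were removed.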
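The four strategy steps can be chained into one short pipeline. The sketch below is an assumption-laden illustration, not the demo code: it starts from a hypothetical two-level header that has already been turned vertical (one row per spreadsheet column), and the names `level1`, `level2` and `name` are invented.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Hypothetical header, one row per spreadsheet column. level1 comes from a
# merged top cell, so it is NA wherever the merge continues.
header <- tibble(
  level1 = c("facility", "surgery A", NA, "surgery B", NA),
  level2 = c(NA, "cases", "days", "cases", "days")
)

col_names <- header %>%
  fill(level1) %>%                                        # fill NA with the value above
  unite("name", level1, level2, sep = "_") %>%            # NA is pasted as the text "NA"
  mutate(name = str_replace_all(name, "_NA| ", "")) %>%   # drop "_NA" and spaces
  pull(name)                                              # convert to a character vector

col_names
```

The resulting vector can be passed to `read_excel(col_names = ...)` when the data rows are read separately from the header rows.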
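37. Turn it into a function
Continued in the demo!
38. Checking for inconsistent notation (表記揺れ)
Annotations on the spreadsheet: the length-of-stay columns are not needed; the hyphens are not needed; columns are added or removed when the fiscal year changes; hospital names can change between fiscal years; disease names can also change between fiscal years; […] columns across, […] rows down, times […] years; the total is the count excluding the "(re-listed, other than transfusion)" columns; numbers can differ between fiscal years, with the serial number (通番) holding the previous fiscal year's notification number (告示番号).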
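39. Finding inconsistent notation
● Get the column names for each fiscal year
● Use setdiff() to find names that exist on only one side
setdiff(A, B) ▶︎ extracts what is only in A
setdiff(B, A) ▶︎ extracts what is only in B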
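A small sketch of the `setdiff()` comparison, using two hypothetical vectors of column names in which one label differs only by a space and one label exists in a single year:

```r
# Hypothetical column names from two fiscal years.
cols_a <- c("facility", "knee surgery", "hip surgery(other)")
cols_b <- c("facility", "knee surgery", "hip surgery (other)", "spine surgery")

# Names that appear only in A, then names that appear only in B.
only_a <- setdiff(cols_a, cols_b)
only_b <- setdiff(cols_b, cols_a)

only_a  # "hip surgery(other)"
only_b  # "hip surgery (other)" "spine surgery"
```

Comparing the two results side by side shows which differences are spelling variants of the same item and which are genuinely new columns.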
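40. […] and […], and […] and […], are the same.
If […] and […] differ, inconsistent notation is one possibility.
The other possibility is that it is not inconsistent notation at all and the item existed only in that fiscal year.
41. Now the data can be read in ([…] comes later)
Continued in the demo!
42. Unneeded columns
Annotations on the spreadsheet: the length-of-stay columns are not needed; the hyphens are not needed; hospital names can change between fiscal years; […] columns across, […] rows down, times […] years; the total is the count excluding the "(re-listed, other than transfusion)" columns; numbers can differ between fiscal years, with the serial number (通番) holding the previous fiscal year's notification number (告示番号); disease names can also change between fiscal years; columns are added or removed when the fiscal year changes.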
43. The length-of-stay and "(再掲)" (re-listed) columns are not needed
● Narrow the columns down with select() and contains()
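Dropping columns by a fragment of their name might look like the sketch below. The table and its ASCII column names are hypothetical; in the real data the fragments would be the Japanese labels for length of stay and "(再掲)".

```r
library(dplyr)
library(tibble)

# Hypothetical table with one wanted count column and two unwanted ones.
df <- tibble(
  facility            = c("A", "B"),
  cases_knee          = c(10, 20),
  stay_knee           = c(5.2, 6.1),
  cases_knee_relisted = c(3, 4)
)

# A minus sign in front of contains() removes every column whose name
# includes the given text.
kept <- df %>%
  select(-contains("stay"), -contains("relisted"))

names(kept)
```

Selecting by name fragment keeps working when the number of surgery columns changes from year to year.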
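44. Remove the hyphens ▶︎ convert to numeric data
Annotations on the spreadsheet: the hyphens are not needed; hospital names can change between fiscal years; […] columns across, […] rows down, times […] years; numbers can differ between fiscal years, with the serial number (通番) holding the previous fiscal year's notification number (告示番号); columns are added or removed when the fiscal year changes; disease names can also change between fiscal years.
45. readr::parse_number()
● readr::parse_number() turns the specified column into numeric data
● mutate(across(columns to select, function)) processes several columns at once
46. Converting NA to […]
● replace_na(specified column, replacement value)
● When the function takes arguments, or when several functions are needed, use mutate(across(columns to select, ~ function))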
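The hyphen removal, numeric conversion and NA replacement can be combined as in this sketch. The table is hypothetical, and replacing NA with 0 is an assumption made for the example, since the replacement value on the slide is not recoverable:

```r
library(dplyr)
library(tidyr)
library(readr)
library(tibble)

# Hypothetical raw table: counts stored as text, with "-" for missing cells.
raw <- tibble(
  facility = c("A", "B"),
  knee     = c("12", "-"),
  hip      = c("1,234", "-")
)

clean <- raw %>%
  # The ~ form lets parse_number() take an extra argument: "-" becomes NA.
  mutate(across(c(knee, hip), ~ parse_number(.x, na = "-"))) %>%
  # Assumed replacement value: 0.
  mutate(across(c(knee, hip), ~ replace_na(.x, 0)))

clean
```

`parse_number()` also drops the thousands separator, so "1,234" becomes 1234.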
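47. Turn everything so far into a function
Continued in the demo!
48. Notification numbers differ between fiscal years
Annotations on the spreadsheet: hospital names can change between fiscal years; […] columns across, […] rows down, times […] years; numbers can differ between fiscal years, with the serial number (通番) holding the previous fiscal year's notification number (告示番号); columns are added or removed when the fiscal year changes; disease names can also change between fiscal years.
49. Join with the facility overview table! Write it out as a CSV file
Continued in the demo!
50. Once this much works, it can be repeated
Annotations on the spreadsheet: […] columns across, […] rows down, times […] years; columns are added or removed when the fiscal year changes; disease names can also change between fiscal years.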
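51. Iteration: purrr
Example: read several tabs of an Excel workbook in one go
52. ● Read all the Excel files in a folder together
list.files(folder name), with pattern = to narrow down the files
Repeat with the map functions
Continued in the demo!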
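A minimal sketch of reading every Excel file in a folder, assuming a temporary folder filled with two example workbooks that ship with readxl. The folder name and the helper `read_one()` are hypothetical; the demo's own function is not shown in the slides.

```r
library(readxl)
library(purrr)

# Build a hypothetical input folder from readxl's bundled example workbooks.
folder <- file.path(tempdir(), "excel_demo")
dir.create(folder, showWarnings = FALSE)
examples <- map_chr(c("datasets.xls", "datasets.xlsx"), readxl_example)
file.copy(examples, folder, overwrite = TRUE)

# pattern = narrows the listing; "\\.xlsx?$" matches both .xls and .xlsx.
paths <- list.files(folder, pattern = "\\.xlsx?$", full.names = TRUE)

# Hypothetical reader function; in practice this would hold the cleaning steps.
read_one <- function(path) {
  read_excel(path, sheet = 1)
}

# Apply the reader to every file and name the results by file name.
tables <- map(set_names(paths, basename(paths)), read_one)

names(tables)
```

Keeping the per-file logic in one function means a fix for a problematic year only has to be made in one place.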
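53. The columns differ too
Annotations on the spreadsheet: columns are added or removed when the fiscal year changes; disease names can also change between fiscal years.
54. Wide data
The data is laid out side by side
55. With long data it does not matter!
● When the data is tidy, added columns are simply absorbed
56. Converting to long format with tidyr::pivot_longer()
pivot_longer(
  cols = columns to gather,
  names_to = name of the new column that holds the old column names,
  values_to = name of the new column that holds the values
)
57. names_sep in pivot_longer
names_sep can split the column names into several columns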
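How `names_sep` splits column names while pivoting can be shown on a small table. The data and the `surgery_year` naming pattern are invented for this sketch:

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: each count column encodes surgery and year.
wide <- tibble(
  facility  = c("A", "B"),
  knee_2018 = c(10, 20),
  knee_2019 = c(12, 18),
  hip_2018  = c(5, 7),
  hip_2019  = c(6, 9)
)

# names_sep = "_" splits each column name into the two names_to columns.
long <- wide %>%
  pivot_longer(
    cols      = -facility,
    names_to  = c("surgery", "year"),
    names_sep = "_",
    values_to = "cases"
  )

long
```

The `year` column comes out as text; convert it with `as.integer()` if it is needed as a number.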
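58. Tidy
Continued in the demo!
59. Fixing inconsistent notation
Annotation on the spreadsheet: disease names can also change between fiscal years.
60. Fixing inconsistent notation
● Now that the data is tidy, it can be fixed quickly with mutate and ifelse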
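Once the labels sit in a single column, one `mutate()` call can unify a variant spelling. The table and labels below are hypothetical:

```r
library(dplyr)
library(tibble)

# Hypothetical long table in which one label appears with and without a space.
long <- tibble(
  surgery = c("hip surgery(other)", "hip surgery (other)", "knee surgery"),
  cases   = c(3, 4, 5)
)

# Rewrite the variant spelling and leave every other label unchanged.
fixed <- long %>%
  mutate(surgery = ifelse(
    surgery == "hip surgery(other)",
    "hip surgery (other)",
    surgery
  ))

fixed
```

In wide format the same fix would mean renaming columns in every yearly file, which is why the deck fixes notation only after the data is tidy.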
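61. Iteration: purrr
Example: read several tabs of an Excel workbook in one go
62. Finally, read in all the files
Continued in the demo!
63. Well done!!! A dataset of […] columns and […] rows!!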
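64. Open issues
This time, hospital names are matched to those of the latest fiscal year.
  ・Hospitals that do not appear in the latest fiscal year are dropped
  ・Merged hospitals are not handled
▶︎ (This causes no trouble within the scope of my own work, but) research use may require stricter handling
▶︎ I would like to hear the views of people who work with DPC data.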
65. Once preprocessing is done, you can move on
Aggregation; web apps (shiny)
66. With shiny!!!
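67. Closing
● Things that cannot be done by hand in Excel can be done with R or Python
● For medical statistics, if the Excel file is well constructed, almost no preprocessing is needed
● Once you want to analyze many kinds of data, however, preprocessing becomes necessary
  (rumor has it that more than […] of the budget goes to preprocessing…)
● Much of my code is still clumsy.
  I would like to know what level my skills are at (hopeless / clears the minimum / job-ready or not)
  Advice and feedback would be very welcome!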