Notes on working through the Language Processing 100 Knocks (言語処理100本ノック) in Ruby
himkt
August 06, 2016
Transcript
Working through the Language Processing 100 Knocks (言語処理100本ノック) in Ruby (sciruby-jp issue #2)
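Self-introduction and what I did
• B4 (fourth-year undergraduate) at the University of Tsukuba (natural language processing? machine learning?)
• Research: information extraction (probabilistic models)
• Task: try solving the Language Processing 100 Knocks in Ruby
• Package user https://github.com/himkt/nlp-100knock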
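Language Processing 100 Knocks
• A set of natural language processing drills published by the Inui-Okazaki Laboratory at Tohoku University
• The assumed language is Python
• Chapters 8 to 10 are the scientific-computing (machine learning) part
(Image: http://www.cl.ecei.tohoku.ac.jp/nlp100/)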
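Language Processing 100 Knocks in Ruby
• Searching GitHub and similar sites shows that…
• Some people have tried to do it in Ruby
• But their updates stop at around chapter 4
• Assumed language: Python
• Can it be done in Ruby? (Presumably yes) -> actually try solving it
• Result: a few things turned out not to be possible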
Main topics
• 72: Feature extraction
• 73: Logistic regression
• 78: Cross-validation
• 85: Principal component analysis
• 90: word2vec
• 97: k-means
• 98: Ward's method: could not do it…
• 99: t-SNE
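Feature extraction
• In natural language processing, the features are words (in most cases)
• The number of distinct words that occur is very large (tens of thousands to hundreds of thousands)
• Using every word as a feature makes training go poorly
• Efficient feature extraction is needed
• Python: scikit-learn::feature_extraction
• Ruby: no definitive library exists
• This time: hand-made (https://github.com/himkt/rblearn)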
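
The idea of keeping only useful words as features can be sketched roughly as follows. This is a minimal bag-of-words example with a document-frequency cutoff, not the rblearn implementation; `build_vocabulary`, `vectorize`, and `min_df` are hypothetical names, and the documents are toy data.

```ruby
# Minimal bag-of-words feature extraction with a document-frequency cutoff.
# All names (build_vocabulary, vectorize, min_df) are hypothetical.

# Keep only words that appear in at least min_df documents.
def build_vocabulary(docs, min_df: 2)
  doc_freq = Hash.new(0)
  docs.each { |tokens| tokens.uniq.each { |word| doc_freq[word] += 1 } }
  kept = doc_freq.select { |_, df| df >= min_df }.keys.sort
  kept.each_with_index.to_h # word => column index
end

# Turn one tokenized document into a count vector over the vocabulary.
def vectorize(tokens, vocab)
  vec = Array.new(vocab.size, 0)
  tokens.each { |word| vec[vocab[word]] += 1 if vocab.key?(word) }
  vec
end

# Toy documents, already tokenized.
docs = [
  %w[good movie good cast],
  %w[bad movie],
  %w[good plot]
]
vocab = build_vocabulary(docs, min_df: 2)
features = docs.map { |tokens| vectorize(tokens, vocab) }
```

Raising `min_df` shrinks the vocabulary, which is one simple way to keep the feature count manageable.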
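Logistic regression
• Libraries
• Statsample-glm: apparently intended to be used together with Daru?
• Liblinear-Ruby: does not support NMatrix or NArray
• Data frames: not suited to data with many columns?*
• * This may just be my assumption (the data this time is about 10000 * 10000)
• Implemented it with NArray
• What is needed: the cost function and the gradient
• Both can be expressed as matrix products (implementable with NArray features alone)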
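
As a rough illustration of "cost function and gradient as matrix products", the sketch below uses plain Ruby arrays in place of NArray so that it runs without extra gems. It assumes the standard cross-entropy cost and batch gradient descent; the function names, learning rate, and toy data are all hypothetical, not taken from the slides.

```ruby
# Logistic regression cost and gradient with plain Ruby arrays.
# Plain arrays stand in for NArray; all names here are hypothetical.

def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# h = sigmoid(X * w) for a row-major matrix x and weight vector w.
def predict(x, w)
  x.map { |row| sigmoid(row.zip(w).sum { |a, b| a * b }) }
end

# Mean cross-entropy cost.
def cost(x, y, w)
  h = predict(x, w)
  total = h.zip(y).sum { |hi, yi| yi * Math.log(hi) + (1 - yi) * Math.log(1 - hi) }
  -total / y.size
end

# Gradient: X^T * (h - y) / m.
def gradient(x, y, w)
  errors = predict(x, w).zip(y).map { |hi, yi| hi - yi }
  w.each_index.map do |j|
    (0...x.size).sum { |i| x[i][j] * errors[i] } / y.size
  end
end

# Plain batch gradient descent (learning rate and epochs are assumptions).
def fit(x, y, lr: 0.5, epochs: 500)
  w = Array.new(x.first.size, 0.0)
  epochs.times do
    g = gradient(x, y, w)
    w = w.zip(g).map { |wi, gi| wi - lr * gi }
  end
  w
end

# Toy data: the first column is a constant bias term.
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
w = fit(x, y)
```

With an array library, `predict` and `gradient` each collapse to a single matrix product, which is the point the slide makes.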
Cross-validation
• Split the dataset and train multiple times to measure how well the predictive model generalizes
• Python: sklearn::cross_validation
• It only returns array indices
• Integer array indexing (masking ?)
• NArray has it; NMatrix does not
Image: https://pydata.tokyo/ipynb/tutorial-1/ml.html
Reference: http://watanabe-www.math.dis.titech.ac.jp/users/swatanab/cross-val.html
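Cross-validation
• Ruby: currently it tends to be implemented inside each library
• e.g. Liblinear.cross_validation (liblinear-ruby)
• Python: scikit-learn::cross_validation
• The model (Logistic Regression) only receives training data and learns from it
Made a library that does cross-validation (https://github.com/himkt/rblearn)
Cross-validation and the training logic are …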
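
A splitter that only returns indices, kept apart from the model, might look like the sketch below. It is not the rblearn API: `k_fold_indices` and `cross_validate` are hypothetical names, and the "model" in the usage part is a trivial majority-class predictor chosen only to keep the example short. `Array#values_at` plays the role of integer array indexing here.

```ruby
# k-fold splitting that only returns index arrays, independent of any model.
# Names (k_fold_indices, cross_validate) are hypothetical.

# Returns k pairs of [train_indices, test_indices].
def k_fold_indices(n_samples, k, seed: 0)
  indices = (0...n_samples).to_a.shuffle(random: Random.new(seed))
  folds = Array.new(k) { [] }
  indices.each_with_index { |idx, pos| folds[pos % k] << idx }
  folds.map { |test_idx| [indices - test_idx, test_idx] }
end

# Yields the train/test subsets of each fold to the block and
# averages the scores the block returns.
def cross_validate(x, y, k)
  scores = k_fold_indices(x.size, k).map do |train_idx, test_idx|
    yield x.values_at(*train_idx), y.values_at(*train_idx),
          x.values_at(*test_idx), y.values_at(*test_idx)
  end
  scores.sum / scores.size.to_f
end

# Usage with a toy majority-class "model".
x = (0...10).map { |i| [i.to_f] }
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
accuracy = cross_validate(x, y, 5) do |_x_train, y_train, _x_test, y_test|
  majority = y_train.sum * 2 >= y_train.size ? 1 : 0
  y_test.count { |label| label == majority } / y_test.size.to_f
end
```

Because the splitter knows nothing about the model, any learner that accepts training data can be plugged into the block.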
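Principal component analysis
• Libraries
• Ruby: statsample
• The data is large, so it has to be handled as a sparse matrix
• Need to build a DataFrame?
• Solve it as an eigenvalue/eigenvector computation
• NArray, NMatrix (s.t. sparse matrices)
• NArray: sparse matrices are not supported yet
• NMatrix: eigenvalue/eigenvector computation for sparse matrices is not implemented -> put on hold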
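
For a small dense matrix, treating PCA as an eigenvector problem can be sketched with power iteration, as below. This is a swapped-in illustration, not what the slides implemented (the sparse case was put on hold), and it does not address sparse matrices at all. The function names, iteration count, and toy data are assumptions.

```ruby
# First principal component via power iteration on the covariance matrix.
# Dense, plain-Ruby sketch; names and data are hypothetical.

# Subtract the column means from every row.
def center(data)
  n = data.size.to_f
  means = data.transpose.map { |col| col.sum / n }
  data.map { |row| row.zip(means).map { |v, m| v - m } }
end

# Sample covariance matrix of already-centered data.
def covariance(centered)
  n = centered.size
  cols = centered.transpose
  cols.map do |a|
    cols.map { |b| a.zip(b).sum { |p, q| p * q } / (n - 1).to_f }
  end
end

# Repeatedly multiply by the matrix and renormalize; the vector converges
# to the eigenvector with the largest eigenvalue.
def top_component(cov, iterations: 200)
  v = Array.new(cov.size) { |i| 1.0 / (i + 1) }
  iterations.times do
    w = cov.map { |row| row.zip(v).sum { |a, b| a * b } }
    norm = Math.sqrt(w.sum { |e| e * e })
    v = w.map { |e| e / norm }
  end
  v
end

# Toy data lying roughly along the line y = x.
data = [[2.0, 1.9], [0.0, 0.1], [1.0, 1.1], [3.0, 2.9], [-1.0, -1.0]]
pc1 = top_component(covariance(center(data)))
```

For the toy data the resulting direction is close to the diagonal, as expected for points scattered along y = x.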
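word2vec
• Libraries
• Python: gensim
• Ruby: none (probably)
• Implemented with NArray
• Once the word2vec model has been trained, all that is needed are the word vectors
• What is actually required is just the cosine similarity between vectors (the features of NArray or NMatrix are sufficient)
• NArray was faster, so I used NArray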
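
The cosine similarity step can be sketched as follows, again with plain Ruby arrays standing in for NArray. The word vectors are made-up toy values rather than output from a trained word2vec model, and `most_similar` is a hypothetical helper name.

```ruby
# Cosine similarity between word vectors, using plain Ruby arrays.
# The vectors below are toy values, not real word2vec output.

def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Returns the top_n [word, score] pairs most similar to the given word.
def most_similar(word, vectors, top_n: 3)
  target = vectors.fetch(word)
  vectors.reject { |other, _| other == word }
         .map { |other, vec| [other, cosine_similarity(target, vec)] }
         .max_by(top_n) { |_, score| score }
end

vectors = {
  "tokyo" => [0.9, 0.1, 0.0],
  "osaka" => [0.8, 0.2, 0.1],
  "apple" => [0.0, 0.1, 0.9]
}
nearest = most_similar("tokyo", vectors, top_n: 1)
```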
k-means, t-SNE
• Libraries
• Python: sklearn.cluster
• Ruby: AI4R (http://ai4r.org/)
• Does not support NArray or NMatrix
• No longer updated?
• Implemented with NArray alone (NArray is faster)
• Can be implemented without getting particularly stuck
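
A bare-bones k-means (Lloyd's algorithm) is short enough to sketch in plain Ruby, which may help show why it can be written without a dedicated library. This is not the code from the slides; the names, the fixed iteration count, the random initialization, and the toy points are all assumptions.

```ruby
# Minimal k-means (Lloyd's algorithm) with plain Ruby arrays.
# Names, defaults, and data are hypothetical.

def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Returns [centroids, assignments] after a fixed number of iterations.
def k_means(points, k, iterations: 20, seed: 0)
  centroids = points.sample(k, random: Random.new(seed))
  assignments = []
  iterations.times do
    # Assignment step: each point goes to its nearest centroid.
    assignments = points.map do |point|
      (0...k).min_by { |c| squared_distance(point, centroids[c]) }
    end
    # Update step: each centroid moves to the mean of its members.
    centroids = (0...k).map do |c|
      members = points.each_index.select { |i| assignments[i] == c }.map { |i| points[i] }
      next centroids[c] if members.empty? # keep an empty cluster's centroid as is
      members.transpose.map { |col| col.sum / members.size.to_f }
    end
  end
  [centroids, assignments]
end

# Two well-separated toy clusters.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centroids, assignments = k_means(points, 2)
```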
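Summary
• Tried solving the Language Processing 100 Knocks
• Most of it can be solved with NArray and NMatrix
• Things like principal component analysis on large-scale data cannot be done
• Is a library like scikit-learn needed?
• Looked at the past logs (last year)
• It would be nice to have one (I think Ruby is well suited to natural language processing)
• Libraries: whether it is NArray, NMatrix, or Daru's Vector?, I want some agreed-upon data structure that can be used uniformly
• Cross-validation, feature extraction, and so on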
Wish list
• NArray: sparse matrix support
• NMatrix: sparse matrix support in linalg
• NArray, NMatrix: object serialization
• NMatrix: Integer Array indexing
• Feature Extractor, Feature Vectorizer