Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
言語処理100本ノックをRubyでやったメモ
Search
himkt
August 06, 2016
11
2.6k
言語処理100本ノックをRubyでやったメモ
himkt
August 06, 2016
Tweet
Share
More Decks by himkt
See All by himkt
Linformer: paper reading
himkt
0
540
RoBERTa: paper reading
himkt
1
350
NLP SoTA 勉強会 / ner_2019
himkt
2
1.4k
自然言語処理 @ クックパッド / nlp at cookpad
himkt
1
530
Interpretable Machine Learning 6.3 - Prototypes and Criticisms
himkt
2
170
ニューラル固有表現抽出 / Neural Named Entity Recognition
himkt
3
750
ニューラル固有表現抽出器を実装してみる / PyNER
himkt
6
2.1k
Spacyでお手軽NLP / NLP with spacy
himkt
0
1k
Deep Learning Book 10その2 / deep learning book 10 vol2
himkt
2
200
Featured
See All Featured
Done Done
chrislema
185
16k
Documentation Writing (for coders)
carmenintech
75
5k
Bash Introduction
62gerente
615
210k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
45
2.5k
Context Engineering - Making Every Token Count
addyosmani
4
170
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
The Cost Of JavaScript in 2023
addyosmani
53
9k
Unsuck your backbone
ammeep
671
58k
RailsConf 2023
tenderlove
30
1.2k
How STYLIGHT went responsive
nonsquared
100
5.8k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.5k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Transcript
ݴޠॲཧ100ຊϊοΫΛRubyͰΔ ʢsciruby-jp issue #2ʣ
ࣗݾհͱͬͨ͜ͱ • B4 at ஜେֶ ʢࣗવݴޠॲཧ? ػցֶश? ʣ • ݚڀɿใநग़ʢ֬Ϟσϧʣ
• ୲ɿݴޠॲཧ100ຊϊοΫΛRubyͰղ͍ͯΈΔ • ύοέʔδϢʔβ https://github.com/himkt/nlp-100knock
ݴޠॲཧ100ຊϊοΫ • ౦େֶ סɾԬ࡚ݚ͕ެ։͍ͯ͠ΔࣗવݴޠॲཧυϦϧ • ఆ͞ΕΔݴޠPython • ୈ8ষʙୈ10ষ͕Պֶܭࢉతʁʢػցֶशʣͳ ʢը૾: http://www.cl.ecei.tohoku.ac.jp/nlp100/ʣ
RubyͰݴޠॲཧ100ຊϊοΫ • GitHubͳͲͰݕࡧ͢Δͱ… • RubyͰΖ͏ͱ͍ͯ͠Δਓ͍Δ • ͕ɼ4ষ͘Β͍·ͰͰߋ৽్͕ઈ͍͑ͯΔ ɹ • ఆݴޠɿPython
• RubyͰͰ͖ΔʁʢͰ͖ΔͩΖ͏ʣ -> ࣮ࡍʹղ͍ͯΈΔ ɹͰ͖ͳ͍͜ͱ͕ز͔ͭ͋Δ͜ͱ͕Θ͔ͬͨ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ɿͰ͖ͳ͔ͬͨ… • 99ɿt-SNE
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE 6
ૉੑநग़ • ࣗવݴޠॲཧʹ͓͍ͯૉੑʹͳΔͷɿ୯ޠʢଟ͘ͷ߹ʣ • ग़ݱ͢Δ୯ޠͷͱͯଟ͍ʢສ - ेສʣ • ͯ͢ͷ୯ޠΛૉੑͱͯ͠͏ͱֶश͕͏·͍͔͘ͳ͍ •
ޮతͳૉੑநग़͕ඞཁ • Python:scikit-learn::feature_extraction • Ruby:ܾఆ൛తͳϥΠϒϥϦଘࡏ͠ͳ͍ • ࠓճ͓खʢhttps://github.com/himkt/rblearnʣ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE 8
ϩδεςΟοΫճؼ • ϥΠϒϥϦ • Statsample-glmɿDaruͱҰॹʹ͏͜ͱ͕ఆ͞Ε͍ͯΔʁ • Liblinear-RubyɿNMatrix, NArrayʹରԠ͍ͯ͠ͳ͍ • σʔλϑϨʔϜɿΧϥϜ͕ଟ͍σʔλΛѻ͏ͷʹ͔ͳ͍ʁ*
• ࢥ͍ࠐΈ͔Εͳ͍ʢࠓճͷσʔλ10000 * 10000͘Β͍ʣ • NArrayͰ࣮ͨ͠ • ඞཁͳͷɿίετؔͱޯ • ߦྻͷੵͰදݱՄೳʢNArrayͷػೳ͚ͩͰ࣮Մʣ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
ΫϩεόϦσʔγϣϯ • σʔληοτΛׂͯ͠ෳճֶशΛߦ͏ ͜ͱͰ༧ଌϞσϧͷ൚ԽੑೳΛௐΔ • Python: sklearn::cross_validation • ྻͷΠϯσοΫεΛฦ͍ͯ͠Δ͚ͩ •
Integer array indexing (masking ?) • NArrayʹ͋Δ NMatrixʹͳ͍ ը૾ɿhttps://pydata.tokyo/ipynb/tutorial-1/ml.html ࢀߟɿhttp://watanabe-www.math.dis.titech.ac.jp/users/swatanab/cross-val.html
ΫϩεόϦσʔγϣϯ • Ruby: ݱঢ়ͰϥΠϒϥϦଆͰ࣮͞Ε͍ͯͨΓ͢Δ • e.g. Liblinear.cross_validation (liblinear-ruby) • Python:
scikit-learn::cross_validation • ϞσϧʢLogistic Regressionʣ܇࿅σʔλΛड͚औΓֶश͢Δ͚ͩ ΫϩεόϦσʔγϣϯ͢ΔϥΠϒϥϦΛ࡞ͬͨʢhttps://github.com/himkt/rblearnʣ ΫϩεόϦσʔγϣϯͱ ֶशͷϩδοΫ͕
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
ओੳ
ओੳ
ओੳ • ϥΠϒϥϦ • Ruby: statsample • σʔλ͕Ͱ͔͍ͷͰɼૄߦྻͷ··ѻ͏ඞཁ͕͋Δ • DataFrameΛͭ͘Δඞཁ͕͋Δʁ
• ݻ༗ɾݻ༗ϕΫτϧܭࢉͱͯ͠ղ͘ • NArray, NMatrixʢs.t. ૄߦྻʣ • NArray: ૄߦྻ·ͩରԠ͍ͯ͠ͳ͍ • NMatrix: ૄߦྻͷݻ༗ɾݻ༗ϕΫτϧܭࢉະ࣮ -> อཹ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
word2vec • ϥΠϒϥϦ • Python: gensim • Ruby: ແ͍ʢଟʣ •
NArrayͰ࣮ • word2vecϞσϧΛ܇࿅ͨ͠ޙʹ୯ޠϕΫτϧ͕ಘΒΕΕྑ͍ • ࣮ࡍʹඞཁͳͷϕΫτϧಉ࢜ͷίαΠϯྨࣅͷܭࢉ͚ͩ ʢNArray NMatrixͷػೳͰॆʣ • NArrayͷ΄͏͕͔ͬͨͷͰNArrayΛͬͨ
ओͳτϐοΫ • 72ɿૉੑநग़ • 73ɿϩδεςΟοΫճؼ • 78ɿΫϩεόϦσʔγϣϯ • 85ɿओੳ •
90ɿword2vec • 97ɿk-means • 98ɿWard๏ • 99ɿt-SNE
k-means t-SNE • ϥΠϒϥϦ • Python: sklearn.clustering • Ruby: AI4Rʢhttp://ai4r.org/ʣ
• NArray NMatrixະରԠ • ߋ৽ࢭ·ͬͯΔʁ • NArray͚ͩͰ࣮ͨ͠ʢNArrayͷ΄͏͕͍ʣ • ಛʹ٧·Δ͜ͱͳ࣮͘Ͱ͖Δ
·ͱΊ • ݴޠॲཧ100ຊϊοΫΛղ͍ͯΈͨ • ͍͍ͩͨNArray, NMatrix͕͋Εղ͚Δ • େنͳσʔλͷओੳͱ͔Ͱ͖ͳ͍ • scikit-learnΈ͍ͨͳϥΠϒϥϦ͕ඞཁ͔ʁ
• աڈϩάΛݟͨʢࡢʣ • ༗Εخ͍͠ʢRubyࣗવݴޠॲཧʹ͍͍ͯΔͱࢥ͏ʣ • ϥΠϒϥϦ: NArrayͳΓNMatrixͳΓDaruͷVector?ͳΓ ͳΜΒ͔ͷܾΊΒΕͨσʔλߏ͕౷Ұతʹ͑ͯ΄͍͠ • ΫϩεόϦσʔγϣϯͱ͔ૉੑநग़ͱ͔
΄͍͠ • NArray: ૄߦྻରԠ • NMatrix: linalgͷૄߦྻରԠ • NArray, NMatrix:
ΦϒδΣΫτͷγϦΞϥΠζ • NMatrix: Integer Array indexing • Feature Extractor, Feature Vectorizer