Upgrade to Pro — share decks privately, control downloads, hide ads and more …

言語処理100本ノックをRubyでやったメモ

Sponsored · Your Podcast. Everywhere. Effortlessly. Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
Avatar for himkt himkt
August 06, 2016
2.6k

 言語処理100本ノックをRubyでやったメモ

Avatar for himkt

himkt

August 06, 2016
Tweet

Transcript

  1. ࣗݾ঺հͱ΍ͬͨ͜ͱ • B4 at ஜ೾େֶ ʢࣗવݴޠॲཧ? ػցֶश? ʣ • ݚڀɿ৘ใநग़ʢ֬཰Ϟσϧʣ

    • ୲౰ɿݴޠॲཧ100ຊϊοΫΛRubyͰղ͍ͯΈΔ • ύοέʔδϢʔβ https://github.com/himkt/nlp-100knock
  2. ૉੑநग़ • ࣗવݴޠॲཧʹ͓͍ͯૉੑʹͳΔ΋ͷɿ୯ޠʢଟ͘ͷ৔߹ʣ • ग़ݱ͢Δ୯ޠͷ਺͸ͱͯ΋ଟ͍ʢ਺ສ - ਺ेສʣ • ͢΂ͯͷ୯ޠΛૉੑͱͯ͠࢖͏ͱֶश͕͏·͍͔͘ͳ͍ •

    ޮ཰తͳૉੑநग़͕ඞཁ • Python:scikit-learn::feature_extraction • Ruby:ܾఆ൛తͳϥΠϒϥϦ͸ଘࡏ͠ͳ͍ • ࠓճ͸͓ख੡ʢhttps://github.com/himkt/rblearnʣ
  3. ϩδεςΟοΫճؼ • ϥΠϒϥϦ • Statsample-glmɿDaruͱҰॹʹ࢖͏͜ͱ͕૝ఆ͞Ε͍ͯΔʁ • Liblinear-RubyɿNMatrix, NArrayʹରԠ͍ͯ͠ͳ͍ • σʔλϑϨʔϜɿΧϥϜ͕ଟ͍σʔλΛѻ͏ͷʹ޲͔ͳ͍ʁ*

    • ࢥ͍ࠐΈ͔΋஌Εͳ͍ʢࠓճͷσʔλ͸10000 * 10000͘Β͍ʣ • NArrayͰ࣮૷ͨ͠ • ඞཁͳ΋ͷɿίετؔ਺ͱޯ഑ • ߦྻͷੵͰදݱՄೳʢNArrayͷػೳ͚ͩͰ࣮૷Մʣ
  4. ΫϩεόϦσʔγϣϯ • σʔληοτΛ෼ׂͯ͠ෳ਺ճֶशΛߦ͏ ͜ͱͰ༧ଌϞσϧͷ൚ԽੑೳΛௐ΂Δ • Python: sklearn::cross_validation • ഑ྻͷΠϯσοΫεΛฦ͍ͯ͠Δ͚ͩ •

    Integer array indexing (masking ?) • NArrayʹ͸͋Δ NMatrixʹ͸ͳ͍ ը૾ɿhttps://pydata.tokyo/ipynb/tutorial-1/ml.html ࢀߟɿhttp://watanabe-www.math.dis.titech.ac.jp/users/swatanab/cross-val.html
  5. ΫϩεόϦσʔγϣϯ • Ruby: ݱঢ়Ͱ͸ϥΠϒϥϦଆͰ࣮૷͞Ε͍ͯͨΓ͢Δ • e.g. Liblinear.cross_validation (liblinear-ruby) • Python:

    scikit-learn::cross_validation • ϞσϧʢLogistic Regressionʣ͸܇࿅σʔλΛड͚औΓֶश͢Δ͚ͩ ΫϩεόϦσʔγϣϯ͢ΔϥΠϒϥϦΛ࡞ͬͨʢhttps://github.com/himkt/rblearnʣ ΫϩεόϦσʔγϣϯͱ ֶशͷϩδοΫ͕෼཭
  6. ओ੒෼෼ੳ • ϥΠϒϥϦ • Ruby: statsample • σʔλ͕Ͱ͔͍ͷͰɼૄߦྻͷ··ѻ͏ඞཁ͕͋Δ • DataFrameΛͭ͘Δඞཁ͕͋Δʁ

    • ݻ༗஋ɾݻ༗ϕΫτϧܭࢉͱͯ͠ղ͘ • NArray, NMatrixʢs.t. ૄߦྻʣ • NArray: ૄߦྻ͸·ͩରԠ͍ͯ͠ͳ͍ • NMatrix: ૄߦྻͷݻ༗஋ɾݻ༗ϕΫτϧܭࢉ͸ະ࣮૷ -> อཹ
  7. word2vec • ϥΠϒϥϦ • Python: gensim • Ruby: ແ͍ʢଟ෼ʣ •

    NArrayͰ࣮૷ • word2vec͸ϞσϧΛ܇࿅ͨ͠ޙʹ୯ޠϕΫτϧ͕ಘΒΕΕ͹ྑ͍ • ࣮ࡍʹඞཁͳͷ͸ϕΫτϧಉ࢜ͷίαΠϯྨࣅ౓ͷܭࢉ͚ͩ ʢNArray NMatrixͷػೳͰॆ෼ʣ • NArrayͷ΄͏͕଎͔ͬͨͷͰNArrayΛ࢖ͬͨ
  8. k-means t-SNE • ϥΠϒϥϦ • Python: sklearn.clustering • Ruby: AI4Rʢhttp://ai4r.org/ʣ

    • NArray NMatrixະରԠ • ߋ৽ࢭ·ͬͯΔʁ • NArray͚ͩͰ࣮૷ͨ͠ʢNArrayͷ΄͏͕଎͍ʣ • ಛʹ٧·Δ͜ͱͳ࣮͘૷Ͱ͖Δ
  9. ·ͱΊ • ݴޠॲཧ100ຊϊοΫΛղ͍ͯΈͨ • ͍͍ͩͨ͸NArray, NMatrix͕͋Ε͹ղ͚Δ • େن໛ͳσʔλͷओ੒෼෼ੳͱ͔͸Ͱ͖ͳ͍ • scikit-learnΈ͍ͨͳϥΠϒϥϦ͕ඞཁ͔ʁ

    • աڈϩάΛݟͨʢࡢ೔ʣ • ༗Ε͹خ͍͠ʢRuby͸ࣗવݴޠॲཧʹ޲͍͍ͯΔͱࢥ͏ʣ • ϥΠϒϥϦ: NArrayͳΓNMatrixͳΓDaruͷVector?ͳΓ
 ͳΜΒ͔ͷܾΊΒΕͨσʔλߏ଄͕౷Ұతʹ࢖͑ͯ΄͍͠ • ΫϩεόϦσʔγϣϯͱ͔ૉੑநग़ͱ͔
  10. ΄͍͠ • NArray: ૄߦྻରԠ • NMatrix: linalgͷૄߦྻରԠ • NArray, NMatrix:

    ΦϒδΣΫτͷγϦΞϥΠζ • NMatrix: Integer Array indexing • Feature Extractor, Feature Vectorizer