Go Climb a Dependency Tree and Correct the Grammatical Errors

These are the slides I presented at the EMNLP 2014 reading group at Tokyo Metropolitan University, covering:

Longkai Zhang and Houfeng Wang, Go Climb a Dependency Tree and Correct the Grammatical Errors, EMNLP 2014

Mamoru Komachi

December 03, 2014

Transcript

  1. Go Climb a Dependency Tree and Correct the Grammatical Errors

    Longkai Zhang and Houfeng Wang, EMNLP 2014
    Note: all figures and tables in these slides are taken from the paper.
    Mamoru Komachi (komachi@tmu.ac.jp), EMNLP 2014 reading group at Tokyo Metropolitan University, 2014/12/04
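  2. Grammatical error correction methods for English learners treat each model independently

    - CoNLL-2013 shared task
      - Targets five error types: prepositions, determiners, verb form, subject-verb agreement, and noun number
      - NUCLE corpus (an English learner corpus built by the National University of Singapore)
      - m2scorer (a scorer developed at the National University of Singapore that reports precision, recall, and F-measure)
    - Errors are corrected with a separate statistical model for each word, and global interactions are not considered
      → In practice, errors are sometimes correlated with one another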
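  3. Models that consider global interactions are computationally expensive

    - Correction using noisy context that itself contains errors
      - Higher-order sequence labeling (Gamon, 2011)
      - Noisy channel model (Park and Levy, 2011)
      - Beam search (Dahlmeier and Ng, 2012)
    - Global optimization based on integer linear programming (ILP)
      - Wu and Ng (2013): assign a confidence to each type of error and optimize globally
      - Rozovskaya and Roth (2013): correct phrase structures rather than single words
      → Exponential time complexity in the worst case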
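  4. "that boy is on the ..." looks correct, but the subject is actually "books"

    Dependency trees for (a) “The books of that boy is on the desk .” and (b) “The books of that boy are on the desk .” (Figure 1)
    → A classifier that looks only at local information cannot make this correction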
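  5. The TreeNode language model (TNLM) efficiently corrects errors at the sentence level

    Proposed method: a combination of two kinds of models (the first in particular)
    - general model
      - Verb form, noun number, and subject-verb agreement errors
      - Corrects errors with a language model based on dependency structure
        → Long-distance dependencies are modeled directly as well
      - Looks at the whole sentence and considers all errors globally, yet corrects efficiently (polynomial time)
    - special model
      - Determiner and preposition errors
      - Corrects errors by supervised learning, with a maximum entropy classifier trained on a large raw corpus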
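  6. The general model treats a sentence as a dependency tree and scores it

    - Score function of a CFG-based language model: the sentence is scored by the product of the scores of its L production rules r (a dependency grammar is considered here)
      $score(s) = \prod_{i=0}^{L} score(r_i)$
    Note: a general score function is used instead of P(r_i) so that values other than probabilities can also be handled.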
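As a minimal sketch of the product-of-rule-scores formula on this slide, the snippet below multiplies a list of per-rule scores and also shows the equivalent sum in log space, which is a common way to avoid underflow on long sentences. The function names and the toy scores are assumptions made for illustration; they do not come from the paper.

```python
import math

def sentence_score(rule_scores):
    """Multiply the per-rule scores: score(s) = prod_i score(r_i)."""
    return math.prod(rule_scores)

def sentence_log_score(rule_scores):
    """Same quantity in log space; only valid when every score is positive."""
    return sum(math.log(s) for s in rule_scores)

# Hypothetical scores for three rules of one sentence (not from the paper).
toy_rule_scores = [0.5, 0.2, 0.1]
print(sentence_score(toy_rule_scores))
print(math.exp(sentence_log_score(toy_rule_scores)))
```

The log-space variant assumes strictly positive scores, so it would not apply directly if the general score function returned zero or negative values.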
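  7. Scoring a sentence with word sub-sequences under the TreeNode LM

    - Word sub-sequences for “The car of my parents is damaged by the storm.” (Figure 2, Table 3)
    - When a node n of the dependency tree has K modifiers as its child nodes C_1, ..., C_K, the word sequence Seq(n) = [C_1, ..., n, ..., C_K] is scored with a language model (trigram).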
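The definition of Seq(n) on this slide can be sketched as follows: for every node that has children, collect the node and its direct children in surface order. The head indices used for the example sentence are an assumed parse written by hand for illustration, not the tree from the paper's Figure 2, and `tree_node_sequences` is a hypothetical helper name.

```python
def tree_node_sequences(words, heads):
    """Return {node index: Seq(n)} for every node n with at least one child.

    words: list of tokens.
    heads: heads[i] is the index of token i's head, or -1 for the root.
    Seq(n) holds n and its direct children, in surface order.
    """
    children = {i: [] for i in range(len(words))}
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)
    seqs = {}
    for n, kids in children.items():
        if kids:
            seqs[n] = [words[i] for i in sorted(kids + [n])]
    return seqs

# Assumed (hand-written) dependency heads for the slide's example sentence.
words = ["The", "car", "of", "my", "parents", "is", "damaged", "by", "the", "storm"]
heads = [1, 6, 1, 4, 2, 6, -1, 6, 9, 7]
for n, seq in tree_node_sequences(words, heads).items():
    print(words[n], "->", " ".join(seq))
```

Under this assumed parse, the sub-sequence for "damaged" places the subject head "car" right next to "is", which is the long-distance relation a plain word n-gram over the surface string would miss.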
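  8. TNLM automatically generates correction candidates and selects among them with the language model

    - TNLM decoding, where c_{i,j} is the j-th correction candidate of child node C_i:
      $seq = [c_{1,j_1}, \ldots, n_i, \ldots, c_{K,j_K}]$
      $score(seq) = TNLM(seq) \prod_{i=1}^{K} C_i.scores[j_i]$
      $n.scores[i] = \max score(seq)$
    - The search can be done efficiently with Viterbi
    - A parameter that increases the weight of the word before correction is introduced (tuned on the devset)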
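The recursion on this slide can be sketched bottom-up: each node keeps one best score per correction candidate, obtained by combining the language model score of the candidate sequence with the scores already stored on its children. The sketch below makes several assumptions that are not from the paper: it enumerates child choices by brute force instead of the Viterbi search the slide mentions, it gives leaves a score of 1.0, it omits the weight on the original word, and `toy_lm` is a made-up bigram table standing in for the trigram model.

```python
from itertools import product

def best_scores(node, candidates, lm_score):
    """Map each correction candidate of `node` to the best score of its subtree.

    node: {"word": str, "pos": int, "children": [node, ...]}
    candidates: function word -> list of candidate words (original included)
    lm_score: function list-of-words -> float
    """
    if not node["children"]:
        # Assumption: a leaf has no sub-sequence to score, so it contributes 1.0.
        return {w: 1.0 for w in candidates(node["word"])}
    tables = [best_scores(c, candidates, lm_score) for c in node["children"]]
    positions = [c["pos"] for c in node["children"]]
    result = {}
    for head in candidates(node["word"]):
        best = float("-inf")
        # Brute force over one candidate per child (exponential in the number
        # of children; used here only to keep the sketch short).
        for choice in product(*[t.items() for t in tables]):
            placed = [(p, w) for p, (w, _) in zip(positions, choice)]
            placed.append((node["pos"], head))
            seq = [w for _, w in sorted(placed)]
            score = lm_score(seq)
            for _, child_score in choice:
                score *= child_score
            best = max(best, score)
        result[head] = best
    return result

# Toy stand-ins (hypothetical): a bigram table and small candidate sets.
BIGRAMS = {("books", "are"): 0.9, ("book", "is"): 0.5}

def toy_lm(seq):
    score = 1.0
    for pair in zip(seq, seq[1:]):
        score *= BIGRAMS.get(pair, 0.1)
    return score

CANDIDATES = {"is": ["is", "are"], "books": ["books", "book"]}
tree = {"word": "is", "pos": 1,
        "children": [{"word": "books", "pos": 0, "children": []}]}
print(best_scores(tree, lambda w: CANDIDATES.get(w, [w]), toy_lm))
```

Recovering the corrected sentence would additionally require storing which child candidates produced each maximum; that bookkeeping is left out here.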
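  9. TNLM is more efficient than Yoshimoto et al. (2013)

    - Yoshimoto et al. (2013)
      - Language model: Treelet language model (Pauls and Klein, 2012)
        → Its results on the CoNLL-2013 shared task were underwhelming.
      - The Treelet language model is based on complex contexts, so it suffers seriously from data sparseness
    - TreeNode Language Model (proposed method)
      - Considers only useful context, which avoids the data sparseness problem
      - Training takes only about as long as training an ordinary language model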
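  10. The special model is applied in a pipeline after the general model

    - Handles determiner errors (though only a/the/NONE are candidates) and preposition errors (though only in/for/to/of/on are candidates) → substitution and deletion errors
    - Trained on the parsed Gigaword corpus rather than the NUCLE corpus, which is limited in size
    - Supervised learning (Naive Bayes, averaged perceptron, SVM, and maximum entropy were compared, and ME was chosen as the most accurate)
      - See Tables 5 and 6 of the paper for the feature templates used
    - Can work on input that the general model has already cleaned up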
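To make the candidate sets on this slide concrete, here is a small sketch that enumerates the alternative sentences for one position: a determiner may become a, the, or nothing (NONE), and a preposition may become any of in/for/to/of/on. The function name and the token-list representation are assumptions for illustration, and the classifier that would choose among the alternatives is not shown.

```python
DETERMINERS = ["a", "the", None]           # None stands for the NONE candidate
PREPOSITIONS = ["in", "for", "to", "of", "on"]

def candidate_sentences(tokens, index):
    """Return every sentence obtained by rewriting tokens[index] within its set.

    Words outside both sets are left alone, so only substitution and deletion
    are produced, as on the slide.
    """
    word = tokens[index].lower()
    if word in ("a", "the"):
        options = DETERMINERS
    elif word in PREPOSITIONS:
        options = PREPOSITIONS
    else:
        return [list(tokens)]
    results = []
    for option in options:
        replacement = [] if option is None else [option]
        results.append(tokens[:index] + replacement + tokens[index + 1:])
    return results

for sentence in candidate_sentences(["on", "the", "desk"], 1):
    print(" ".join(sentence))
```

The original word is always among the candidates, so choosing it corresponds to making no correction at that position.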
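  11. Experimental results: TNLM is more effective for error correction than an ordinary language model

    Table 7: TNLM corresponds to the general model. +Det and +Prep are the respective special models. +Det+Prep is the final system.
    Table 8: Comparison of TNLM with an ordinary trigram. The training and test data are the same for both.
    → That said, an ordinary language model is not a trigram...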
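  12. Experimental results: TNLM currently has the best accuracy in the world on the CoNLL-2013 data

    Table 11: Comparison with studies that used the same CoNLL-2013 data. The proposed method is state-of-the-art.
    → Yoshimoto et al. (2013) is a result with the Treelet LM alone, so in terms of the table below, does it fall between the ordinary language model and TNLM?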
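  13. Summary: a simple TreeNode language model is effective for error correction

    - Proposed the TreeNode Language Model, a language model based on dependency trees.
      - Robust, because it does not consider unnecessary context.
      - Can account for global interactions among multiple errors.
      - Training is on par with existing language models, and decoding runs in polynomial time with the Viterbi algorithm.
      - Compatible with existing language models and easy to implement.
    - Using the TreeNode Language Model, achieved state-of-the-art performance on the CoNLL-2013 shared task on grammatical error correction for English learners.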