
修論発表.pdf (Master's Thesis Presentation)

Hayato Tsukagoshi
September 29, 2024

These are the slides I used at my master's thesis presentation session.


Transcript

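  1. Sentence vectors: vector representations of natural-language sentences
     Example sentences placed in the sentence-vector space:
     • "A child is heading home."
     • "A child is heading home from school."
     • "A child is at the library."
     • "A child is walking in the afternoon."
     Sentences that are close in meaning have vectors that are close together.
     • Wide range of applications, such as similar-sentence retrieval and clustering
     • Improving the quality of the vectors and understanding their properties directly improves their usefulness
     Deepening our understanding of sentence vectors is important for future progress.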
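  2. SBERT: a method based on the natural language inference task
     • A method that trains a sentence-vector model on the natural language inference (NLI) task
     • NLI task: predict the semantic relation between the two sentences of a pair
     Fine-tuning procedure for SBERT:
     0. Prepare a pretrained language model
     1. Convert each sentence of the pair into a sentence vector
     2. Predict the semantic relation of the sentence pair from the resulting pair of sentence vectors
     3. Train the model so that it predicts the correct semantic relation
     Figure: sentence A and sentence B → BERT → Pooling (one branch per sentence) → label prediction layer → {contradiction, entailment, other}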
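The SBERT fine-tuning steps above can be sketched in a few lines. This is only an illustrative toy, not the slides' implementation: the token vectors and classifier weights are made-up numbers standing in for BERT outputs and a trained layer, mean pooling is an assumed pooling choice, and the `(u, v, |u - v|)` feature layout is the one described in the SBERT paper rather than something the slide states.

```python
import math

# Label set as shown in the slide's figure.
LABELS = ["contradiction", "entailment", "other"]

def mean_pool(token_vectors):
    """Average token vectors into one sentence vector (assumed pooling choice)."""
    dim = len(token_vectors[0])
    return [sum(tok[i] for tok in token_vectors) / len(token_vectors) for i in range(dim)]

def pair_features(u, v):
    """Concatenate u, v and |u - v| as input to the label prediction layer."""
    return u + v + [abs(a - b) for a, b in zip(u, v)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_probs(features, weights, bias):
    """Linear label prediction layer followed by softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]
    return softmax(logits)

def cross_entropy(probs, gold_index):
    """Loss minimized in step 3: negative log-probability of the correct label."""
    return -math.log(probs[gold_index])

# Hypothetical token vectors standing in for BERT outputs of sentences A and B.
tokens_a = [[0.2, 0.4], [0.6, 0.0]]
tokens_b = [[0.1, 0.3], [0.5, 0.5]]
u, v = mean_pool(tokens_a), mean_pool(tokens_b)
features = pair_features(u, v)

# Hypothetical (untrained) classifier parameters: 3 labels x 6 features.
weights = [[0.1] * 6, [-0.1] * 6, [0.0] * 6]
bias = [0.0, 0.0, 0.0]
probs = label_probs(features, weights, bias)
loss = cross_entropy(probs, LABELS.index("entailment"))
```

In actual fine-tuning the gradient of `loss` would flow back through the pooling step into the encoder, which is what shapes the sentence vectors.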
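  3. DefSent: a method based on the definition-sentence → word prediction task
     • A method that trains a sentence-vector model with the task of predicting a word from its definition sentence
     Figure: SBERT (sentence A, sentence B → BERT → Pooling → label prediction layer → {contradiction, entailment, other}) shown next to DefSent (definition sentence → BERT → Pooling → word prediction layer → scores over the vocabulary w1, w2, w3, ... of size |V|, with target word w)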
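  4. DefSent: a method based on the definition-sentence → word prediction task
     • A method that trains a sentence-vector model with the task of predicting a word from its definition sentence
     Fine-tuning procedure for DefSent:
     0. Prepare a pretrained language model
     1. Feed the definition sentence into BERT to obtain a sentence vector
     2. Predict the word that corresponds to the definition sentence from the resulting vector
     3. Train the model to maximize the probability of the word that the definition sentence describes
     Figure: SBERT (sentence A, sentence B → BERT → Pooling → label prediction layer → {contradiction, entailment, other}) shown next to DefSent (definition sentence → BERT → Pooling → word prediction layer → scores over the vocabulary w1, w2, w3, ... of size |V|, with target word w)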
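A minimal sketch of the DefSent objective described above, under loud assumptions: the three-word vocabulary, the word embeddings, and the definition vector are invented toy values, and scoring each word by a dot product with its embedding is an assumed form of the "word prediction layer", not a detail given on the slide.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def word_probs(sentence_vec, word_embeddings):
    """Score every vocabulary word against the sentence vector, then normalize."""
    words = list(word_embeddings)
    scores = [sum(a * b for a, b in zip(sentence_vec, word_embeddings[w])) for w in words]
    return dict(zip(words, softmax(scores)))

def defsent_loss(sentence_vec, word_embeddings, target_word):
    """Negative log-probability of the defined word; minimizing it maximizes that probability."""
    return -math.log(word_probs(sentence_vec, word_embeddings)[target_word])

# Hypothetical toy vocabulary (|V| = 3) and embeddings.
word_embeddings = {"dog": [1.0, 0.0], "cat": [0.0, 1.0], "car": [-1.0, 0.0]}

# Hypothetical pooled vector for a definition sentence of "dog".
definition_vec = [2.0, 0.5]

probs = word_probs(definition_vec, word_embeddings)
predicted = max(probs, key=probs.get)
loss = defsent_loss(definition_vec, word_embeddings, "dog")
```

Here `predicted` is the word with the highest probability; training would adjust the encoder so that this is the defined word for every definition sentence.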
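  5. Overview of this research
     Comparison and integration of sentence vectors, focusing on differences in the supervision signal.
     Sentence-vector models:
     • SBERT: BERT fine-tuned on entailment recognition
     • DefSent: BERT fine-tuned on definition-sentence → word prediction
     Comparison:
     • Semantic Textual Similarity (STS): ① source of the sentences, ② surface similarity of the sentence pairs
     • SentEval: ③ downstream tasks such as sentiment and tense classification, ④ classification tasks on linguistic information
     Integration:
     • SBERT→DefSent
     • DefSent→SBERT
     • Multi-task learning
     • Average
     • Concat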
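  6. Comparing the properties of sentence vectors: STS
     Semantic Textual Similarity (STS)
     • Input: sentence pairs (pairs of sentence vectors)
     • Points of comparison: ① source of the sentences, ② surface similarity of the sentence pairs
     • Evaluation metric: rank correlation coefficient between human ratings and the similarity of the sentence vectors
     STS evaluation procedure:
     ① Prepare a sentence-vector model
     ② Convert each sentence of a pair into a sentence vector
     ③ Compute the similarity of the pair of sentence vectors
     ④ Compute the correlation coefficient with the human ratings
     Figure: sentence A, sentence B → sentence-vector model → sentence similarity → evaluated by the correlation coefficient with human ratings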
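The four evaluation steps above can be sketched as follows. The vectors and human ratings are invented toy data, and cosine similarity is an assumption (the slide only says "similarity"); the rank correlation is implemented here as Spearman's coefficient, i.e. Pearson correlation computed on ranks.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Rank correlation: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))

# Hypothetical sentence-vector pairs (step 2) and human similarity ratings.
vector_pairs = [([1.0, 0.0], [1.0, 0.1]), ([1.0, 0.0], [0.5, 0.5]), ([1.0, 0.0], [0.0, 1.0])]
human_ratings = [4.8, 2.5, 0.2]

predicted = [cosine(u, v) for u, v in vector_pairs]  # step 3
score = spearman(predicted, human_ratings)           # step 4
```

Because only the ordering matters for a rank correlation, a model is rewarded for ranking pairs the way humans do, regardless of the absolute similarity values it produces.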
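  7. Comparing the properties of sentence vectors: STS
     Semantic Textual Similarity (STS)
     • Input: sentence pairs (pairs of sentence vectors)
     • Points of comparison: ① source of the sentences, ② surface similarity of the sentence pairs
     • Evaluation metric: rank correlation coefficient between human ratings and the similarity of the sentence vectors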
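  8. Comparing the properties of sentence vectors: STS
     Semantic Textual Similarity (STS)
     • Input: sentence pairs (pairs of sentence vectors)
     • Points of comparison: ① source of the sentences, ② surface similarity of the sentence pairs
     • Evaluation metric: rank correlation coefficient between human ratings and the similarity of the sentence vectors
     Approach:
     • Split the datasets along the two points of comparison
     • Observe how performance changes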
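  9. Comparing the properties of sentence vectors: STS
     Semantic Textual Similarity (STS)
     • Input: sentence pairs (pairs of sentence vectors)
     • Points of comparison: ① source of the sentences, ② surface similarity of the sentence pairs
     • Evaluation metric: rank correlation coefficient between human ratings and the similarity of the sentence vectors
     Approach:
     • Split the datasets along the two points of comparison
     • Observe how performance changes
     Findings:
     • Performance differs depending on the source of the sentences
     • Each method measures similarity better on sentences that are close to its own training dataset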
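  10. Comparing the properties of sentence vectors: STS
      Semantic Textual Similarity (STS)
      • Input: sentence pairs (pairs of sentence vectors)
      • Points of comparison: ① source of the sentences, ② surface similarity of the sentence pairs
      • Evaluation metric: rank correlation coefficient between human ratings and the similarity of the sentence vectors
      Approach:
      • Split the datasets along the two points of comparison
      • Observe how performance changes
      Findings:
      • Performance differs depending on the source of the sentences
      • Each method measures similarity better on sentences that are close to its own training dataset
      • Performance differs depending on surface similarity
      • SBERT (entailment relations) is less affected by surface similarity
      • DefSent (definition sentences) estimates relatively accurately the similarity of sentences that are not similar on the surface
      Figure: relation between surface similarity and performance (SBERT, DefSent)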
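  11. Comparing the properties of sentence vectors: SentEval
      SentEval
      • Input: sentence vectors
      • Points of comparison: ③ performance on each downstream task, ④ classification performance on linguistic information
      • Evaluation metric: classification performance of a linear classifier that takes sentence vectors as input
      SentEval evaluation procedure:
      ① Prepare a sentence-embedding model
      ② Convert each sentence into a sentence vector
      ③ Train a classifier that takes the sentence vectors as input
      ④ Evaluate the quality of the sentence vectors from the classifier's performance
      Figure: sentence → sentence-vector model → classifier; the quality of the sentence embeddings is evaluated from the classification performance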
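The probing procedure above amounts to training a small classifier on frozen sentence vectors. The sketch below is a toy stand-in, not SentEval itself: the vectors and labels are invented, the classifier is a plain logistic regression trained by batch gradient descent, and the learning rate and epoch count are arbitrary assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_linear_probe(vectors, labels, lr=0.5, epochs=100):
    """Fit a binary logistic-regression probe on fixed sentence vectors."""
    dim, n = len(vectors[0]), len(vectors)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(vectors, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i] / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def accuracy(w, b, vectors, labels):
    """Fraction of vectors whose predicted class matches the label."""
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0 for x in vectors]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical frozen sentence vectors (step 2) with binary task labels.
train_vectors = [[2.0, 0.1], [1.5, -0.2], [-1.0, 0.3], [-2.0, 0.0]]
train_labels = [1, 1, 0, 0]
test_vectors = [[1.8, 0.0], [-1.8, 0.1]]
test_labels = [1, 0]

w, b = train_linear_probe(train_vectors, train_labels)  # step 3
probe_accuracy = accuracy(w, b, test_vectors, test_labels)  # step 4
```

Only the probe's parameters are updated; the sentence vectors stay fixed, so a higher probe accuracy is read as evidence that the vectors already encode the information the task needs.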
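  12. Comparing the properties of sentence vectors: SentEval
      SentEval
      • Input: sentence vectors
      • Points of comparison: ③ performance on each downstream task, ④ classification performance on linguistic information
      • Evaluation metric: classification performance of a linear classifier that takes sentence vectors as input
      Findings:
      • SBERT embeds rich semantic information
      • DefSent is rich in surface information
      • DefSent also handles phrase composition well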
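  13. Comparing the properties of sentence vectors: SentEval
      SentEval
      • Input: sentence vectors
      • Points of comparison: ③ performance on each downstream task, ④ classification performance on linguistic information
      • Evaluation metric: classification performance of a linear classifier that takes sentence vectors as input
      Findings:
      • SBERT embeds rich semantic information
      • DefSent is rich in surface information
      • DefSent also handles phrase composition well
      • DefSent is comparatively rich in surface information such as tense and the words contained in the sentence
      • SBERT performs poorly on tasks where the surface information of the sentence matters
      • SBERT carries less information about, for example, the words contained in the sentence
      Figure: bar charts (axis range 50 to 90) for the probing tasks Length (sentence-length prediction), WordContent (prediction of words in the sentence), and Tense (tense prediction); SubjNumber also appears as a task label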
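  14. Comparing the properties of sentence vectors: summary
      Semantic Textual Similarity (STS)
      • Input: sentence pairs (pairs of sentence vectors)
      • Points of comparison: ① source of the sentences, ② surface similarity of the sentence pairs
      • Evaluation metric: rank correlation coefficient between human ratings and the similarity of the sentence vectors
      SentEval
      • Input: sentence vectors
      • Points of comparison: ③ performance on each downstream task, ④ classification performance on linguistic information
      • Evaluation metric: classification performance of a linear classifier that takes sentence vectors as input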
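  15. Comparing the properties of sentence vectors: summary
      Semantic Textual Similarity (STS)
      • Input: sentence pairs (pairs of sentence vectors)
      • Points of comparison: ① source of the sentences, ② surface similarity of the sentence pairs
      • Evaluation metric: rank correlation coefficient between human ratings and the similarity of the sentence vectors
      SentEval
      • Input: sentence vectors
      • Points of comparison: ③ performance on each downstream task, ④ classification performance on linguistic information
      • Evaluation metric: classification performance of a linear classifier that takes sentence vectors as input
      SBERT
      • Task: natural language inference
      • Rich in semantic information such as sentiment polarity
      • Less surface information
      DefSent
      • Task: definition sentence → word prediction
      • Rich in surface information such as sentence length and tense
      • Also handles phrase composition well
      Figure: relation between surface similarity and performance (SBERT, DefSent)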
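  16. Integrating sentence vectors: integration into a single model
      • S+D: apply SBERT fine-tuning and then DefSent fine-tuning, in that order, to a single model
      • D+S: apply DefSent fine-tuning and then SBERT fine-tuning, in that order, to a single model
      • Multi: apply SBERT fine-tuning and DefSent fine-tuning alternately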
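To make the difference between the three single-model strategies concrete, the sketch below lists the order in which the two objectives would be applied. The function name `training_schedule`, the idea of counting "steps", and alternating after every single step for Multi are all illustrative assumptions; the slides do not state the granularity of the alternation.

```python
def training_schedule(method, steps_per_task):
    """Return the order of training objectives for one integration method.

    The step granularity is hypothetical; only the ordering follows the slide.
    """
    if method == "S+D":    # SBERT fine-tuning first, then DefSent
        return ["SBERT"] * steps_per_task + ["DefSent"] * steps_per_task
    if method == "D+S":    # DefSent fine-tuning first, then SBERT
        return ["DefSent"] * steps_per_task + ["SBERT"] * steps_per_task
    if method == "Multi":  # alternate between the two objectives
        return ["SBERT", "DefSent"] * steps_per_task
    raise ValueError(f"unknown integration method: {method}")

schedule_sd = training_schedule("S+D", 2)
schedule_ds = training_schedule("D+S", 2)
schedule_multi = training_schedule("Multi", 2)
```

In the sequential variants the second objective is trained last, so it can overwrite what the first one learned; the alternating variant interleaves the two signals throughout training.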
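  19. Integrating sentence vectors: evaluation experiments
      Experimental setup:
      • Train and evaluate a model for each integration method
      • Train the models 10 times for STS and 3 times for SentEval, and report the average performance
      Evaluated methods (performance is compared with the single methods):
      • SBERT
      • DefSent
      • S+D (SBERT→DefSent)
      • D+S (DefSent→SBERT)
      • Multi
      • Average
      • Concat
      Evaluation tasks:
      • STS
      • SentEval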
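Average and Concat combine the outputs of two separately trained models rather than training one model. A minimal sketch, assuming both models produce vectors of the same dimensionality and that no extra normalization is applied (neither detail is stated on the slides); the example vectors are invented.

```python
def average(u, v):
    """Element-wise mean of two sentence vectors; dimensionality is unchanged."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    return [(a + b) / 2 for a, b in zip(u, v)]

def concat(u, v):
    """Concatenation of two sentence vectors; dimensionality is the sum of both."""
    return list(u) + list(v)

# Hypothetical vectors for the same sentence from an SBERT-style and a DefSent-style model.
sbert_vec = [1.0, 3.0]
defsent_vec = [3.0, 5.0]

avg_vec = average(sbert_vec, defsent_vec)
cat_vec = concat(sbert_vec, defsent_vec)
```

The doubled length of `cat_vec` is worth keeping in mind when comparing probing results, since a classifier fed a larger vector has more parameters to work with.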
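  20. Integrating sentence vectors: evaluation experiments
      Average performance (%) on STS and SentEval for each integration method (BERT-base):

      Method    STS     SentEval
      SBERT     73.19   86.49
      DefSent   75.20   86.61
      S+D       78.45   86.80
      D+S       72.89   86.09
      Multi     72.89   86.23
      Average   77.82   87.47
      Concat    76.03   87.93

      SBERT→DefSent and Average perform well.
      • On SentEval, Concat performs well, but note that it has an advantage because its sentence vectors have a larger dimensionality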
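  21. Integrating sentence vectors: evaluation experiments
      Average performance (%) on STS and SentEval for each integration method (BERT-base):

      Method    STS     SentEval
      SBERT     73.19   86.49
      DefSent   75.20   86.61
      S+D       78.45   86.80
      D+S       72.89   86.09
      Multi     72.89   86.23
      Average   77.82   87.47
      Concat    76.03   87.93

      SBERT→DefSent and Average perform well; DefSent→SBERT degrades performance.
      • In some cases an integration method falls below the single methods
      • This may be an effect of catastrophic forgetting
      • On SentEval, Concat performs well, but note that it has an advantage because its sentence vectors have a larger dimensionality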
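  22. Integrating sentence vectors: evaluation experiments
      Average performance (%) on STS and SentEval for each integration method (BERT-base):

      Method    STS     SentEval
      SBERT     73.19   86.49
      DefSent   75.20   86.61
      S+D       78.45   86.80
      D+S       72.89   86.09
      Multi     72.89   86.23
      Average   77.82   87.47
      Concat    76.03   87.93

      SBERT→DefSent and Average perform well; DefSent→SBERT degrades performance.
      • On SentEval, Concat performs well, but note that it has an advantage because its sentence vectors have a larger dimensionality
      • In some cases an integration method falls below the single methods
      • This may be an effect of catastrophic forgetting
      • A simple average of the sentence vectors performs well
      • Analyzing the properties of the vectors produced by the integration methods is left for future work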
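  23. Summary and future work
      Focusing on differences in the supervision signal, we compared, analyzed, and integrated the properties of sentence vectors.
      Methods analyzed:
      • SBERT: based on natural language inference
      • DefSent: based on definition-sentence → word prediction
      Comparison:
      • The performance gap between the methods varies markedly with the source of the sentences
      • SBERT: rich in semantic information such as sentiment polarity; less affected by surface similarity
      • DefSent: rich in surface information such as sentence length and tense; also handles phrase composition well; strong on sentence pairs with low surface similarity
      Integration:
      • Integration improves performance
      • SBERT→DefSent and Average perform well
      • Performance can also drop because of catastrophic forgetting
      Future work:
      1. Investigate a broader range of models and sentence-vector methods
      2. Analyze the properties of the vectors produced by the integration methods
      3. Develop better integration methods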
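  25. Research achievements
      Domestic journal papers (peer-reviewed)
      • Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda. 定義文を用いた文埋め込み構成法 (Constructing Sentence Embeddings Using Definition Sentences), 自然言語処理 (Journal of Natural Language Processing), Vol. 30, No. 1 (to appear).
      International conferences (peer-reviewed)
      • Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda. Comparison and Combination of Sentence Embeddings Derived from Different Supervision Signals, in Proceedings of the 11th Joint Conference on Lexical and Computational Semantics (*SEM 2022).
      • Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda. DefSent: Sentence Embeddings using Definition Sentences, in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021).
      Domestic conferences (not peer-reviewed)
      • Shohei Yoda, Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda. ガウス埋め込みに基づく文表現生成 (Sentence Representation Generation Based on Gaussian Embeddings), the 29th Annual Meeting of the Association for Natural Language Processing (NLP2023), to be presented.
      • Hayato Tsukagoshi, Tsutomu Hirao, Makoto Morishita, Katsuki Chousa, Ryohei Sasano, Koichi Takeda. 自然言語推論と再現器を用いたSplit and Rephraseにおける生成文の品質向上 (Improving the Quality of Generated Sentences in Split and Rephrase Using Natural Language Inference and a Reconstructor), the 28th Annual Meeting of the Association for Natural Language Processing (NLP2022).
      • Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda. 定義文を用いた文埋め込み構成法 (Constructing Sentence Embeddings Using Definition Sentences), the 27th Annual Meeting of the Association for Natural Language Processing (NLP2021).
      Other
      • FY2023: informally selected as a Japan Society for the Promotion of Science (JSPS) Research Fellow (DC1)
      • FY2023: certified as a Nagoya University Interdisciplinary Frontier Fellow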