修論発表.pdf (Master's thesis presentation)

Hayato Tsukagoshi
September 29, 2024

These are the slides I used for my master's thesis presentation.


Transcript

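  1. Sentence embeddings: vector representations of natural-language sentences
    [Figure: a sentence embedding space containing "A child is heading home.", "A child is heading home from school.", "A child is at the library.", and "A child is walking in the afternoon."]
    Sentences that are close in meaning → vectors that are close to each other
    • Wide range of applications, such as similar-sentence search and clustering
    • Improving embedding quality
    • Understanding embedding properties
      Both translate directly into greater usefulness.
    Deepening our understanding of sentence embeddings is important for future progress.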
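"Close vectors" can be made concrete with a similarity measure. Cosine similarity is a common choice for sentence embeddings, and similar-sentence search then reduces to ranking candidates by that score. The slide does not name a measure, so cosine similarity is an assumption here. The 3-dimensional vectors are made up and stand in for real model outputs, and `cosine_similarity` and `most_similar` are illustrative helper names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sentence vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, candidates):
    """Similar-sentence search: rank candidate sentences by cosine similarity to the query."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional embeddings, not produced by any real model.
sentences = {
    "A child is heading home from school.": [0.9, 0.1, 0.2],
    "A child is at the library.": [0.2, 0.9, 0.1],
    "A child is walking in the afternoon.": [0.5, 0.3, 0.8],
}
query = [0.8, 0.2, 0.3]  # stands in for "A child is heading home."

for text, score in most_similar(query, sentences):
    print(f"{score:.3f}  {text}")
```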
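  2. SBERT: a method based on the natural language inference (NLI) task
    • Trains a sentence embedding model on the NLI task
    • NLI task: predict the semantic relation between the two sentences of a pair
    Fine-tuning procedure with SBERT:
    0. Prepare a pretrained language model
    1. Convert each sentence of the pair into a sentence embedding
    2. Predict the semantic relation of the pair from the resulting pair of embeddings
    3. Train the model so that it predicts the correct relation
    [Figure: sentence A and sentence B each pass through BERT and pooling; a label prediction layer outputs contradiction / entailment / other.]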
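Steps 1 to 3 can be sketched in plain Python. The sketch assumes mean pooling and the (u, v, |u − v|) classifier input described in the original SBERT paper. Random numbers stand in for BERT's token outputs and for the label prediction layer, so the code shows only the data flow and the cross-entropy loss. An actual fine-tuning run would backpropagate this loss through BERT using a deep-learning framework.

```python
import math
import random

# Label set of the slide's label prediction layer.
LABELS = ["contradiction", "entailment", "other"]


def mean_pool(token_vectors):
    """Step 1: average the token vectors of one sentence into a sentence vector."""
    dim = len(token_vectors[0])
    return [sum(tok[i] for tok in token_vectors) / len(token_vectors) for i in range(dim)]


def pair_features(u, v):
    """Build the classifier input (u, v, |u - v|) from two sentence vectors."""
    return u + v + [abs(a - b) for a, b in zip(u, v)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nli_loss(tokens_a, tokens_b, gold_label, weights, bias):
    """Steps 2-3: predict the relation and return the cross-entropy that training minimizes."""
    u, v = mean_pool(tokens_a), mean_pool(tokens_b)
    x = pair_features(u, v)
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return -math.log(probs[LABELS.index(gold_label)]), probs


# Hypothetical stand-ins: random "token vectors" instead of BERT outputs,
# and a randomly initialized label prediction layer.
random.seed(0)
dim = 4
tokens_a = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
tokens_b = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(7)]
weights = [[random.gauss(0, 0.1) for _ in range(3 * dim)] for _ in LABELS]
bias = [0.0] * len(LABELS)

loss, probs = nli_loss(tokens_a, tokens_b, "entailment", weights, bias)
print(f"loss = {loss:.4f}")
print({label: round(p, 3) for label, p in zip(LABELS, probs)})
```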
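  4. DefSent: a method based on the definition → word prediction task
    • Trains a sentence embedding model on the task of predicting the word that a definition sentence defines
    Fine-tuning procedure with DefSent:
    0. Prepare a pretrained language model
    1. Feed a definition sentence into BERT to obtain a sentence embedding
    2. From the resulting vector, predict the word that corresponds to the definition
    3. Train so as to maximize the probability of the word the definition describes
    [Figure: next to the SBERT setup, a definition sentence passes through BERT and pooling into a word prediction layer that scores all |V| vocabulary words w1, w2, w3, … to predict the defined word w.]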
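The DefSent objective can be sketched the same way. A linear word prediction layer scores the pooled embedding of a definition sentence against every vocabulary word, and training minimizes the negative log-probability of the defined word. The three-word vocabulary, the weights, and the pooled vector below are hypothetical toy values. They are not BERT's actual prediction layer.

```python
import math


def word_prediction_loss(sentence_vec, vocab, weights, bias, target_word):
    """Negative log-probability of the defined word under a softmax over the vocabulary."""
    logits = [sum(w * s for w, s in zip(row, sentence_vec)) + b for row, b in zip(weights, bias)]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
    log_probs = {word: z - log_z for word, z in zip(vocab, logits)}
    return -log_probs[target_word], log_probs


# Hypothetical toy vocabulary (|V| = 3) and word prediction layer weights.
vocab = ["dog", "library", "river"]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]

# Hypothetical pooled embedding of the definition "a building where books are kept".
definition_vec = [0.2, 1.5]

loss, log_probs = word_prediction_loss(definition_vec, vocab, weights, bias, "library")
print(f"loss = {loss:.4f}")
print("predicted word:", max(log_probs, key=log_probs.get))
```

Step 3 of the slide corresponds to lowering `loss`, which raises the probability assigned to "library".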
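  5. Overview of this study
    Comparison and integration of sentence embeddings, focusing on differences in supervision signals
    Sentence embedding models: starting from BERT, SBERT is fine-tuned on recognizing textual entailment and DefSent is fine-tuned on definition → word prediction.
    Comparison
    • Semantic Textual Similarity (STS): ① source of the sentences, ② surface similarity of the sentence pairs
    • SentEval: ③ downstream tasks such as sentiment and tense classification, ④ classification tasks on linguistic information
    Integration
    • SBERT→DefSent
    • DefSent→SBERT
    • Multi-task learning
    • Average
    • Concat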
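  6. Comparing embedding properties: STS
    Semantic Textual Similarity (STS)
    Comparison axes: ① source of the sentences, ② surface similarity of the sentence pairs
    Input: sentence pairs (pairs of sentence embeddings)
    Metric: rank correlation coefficient between human ratings and the similarity between the sentence embeddings
    STS evaluation procedure:
    ① Prepare a sentence embedding model
    ② Convert each sentence of a pair into a sentence embedding
    ③ Compute the similarity of the embedding pair
    ④ Compute the correlation with human ratings
    [Figure: sentences A and B → sentence embedding model → sentence similarity → evaluated by correlation with human ratings]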
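Steps ③ and ④ can be sketched in plain Python. The slides specify only a rank correlation. This sketch assumes Spearman's ρ, the usual choice for STS, computed as the Pearson correlation of rank vectors in which tied values share their average rank. All scores are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


human_scores = [4.8, 3.5, 1.0, 2.2]      # hypothetical gold ratings on the 0-5 STS scale
model_scores = [0.91, 0.75, 0.32, 0.40]  # hypothetical cosine similarities from step ③
print(f"Spearman's rho = {spearman(human_scores, model_scores):.3f}")
```

Only the ranking matters. The model scores lie on a different scale from the human ratings, yet they order the pairs identically, so ρ is 1. In practice a library routine such as `scipy.stats.spearmanr` performs the same computation.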
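  10. Comparing embedding properties: STS
    Semantic Textual Similarity (STS)
    Comparison axes: ① source of the sentences, ② surface similarity of the sentence pairs
    Input: sentence pairs (pairs of sentence embeddings)
    Metric: rank correlation coefficient between human ratings and the similarity between the sentence embeddings
    [Figure: relationship between surface similarity and performance for SBERT and DefSent]
    • Split the datasets along the two axes and observe how performance changes
    • Performance differs depending on the source of the sentences
      • Each method measures similarity better on sentences close to its own training data
    • Performance differs depending on surface similarity
      • SBERT (entailment) is less affected by surface similarity
      • DefSent (definitions) estimates similarity comparatively accurately even for sentences that are not similar on the surface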
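The split along axis ② can be illustrated as follows. The slides do not say how surface similarity was measured or where the boundaries were drawn. This sketch therefore assumes word-level Jaccard overlap and a single hypothetical threshold of 0.5. The result is a low-overlap subset and a high-overlap subset, which would then be scored separately, for example with the rank correlation from the STS procedure. The sentence pairs and gold scores are made up.

```python
def jaccard(sentence_a, sentence_b):
    """Word-overlap (Jaccard) similarity, used here as a stand-in for surface similarity."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def split_by_surface_similarity(pairs, threshold=0.5):
    """Split (sentence_a, sentence_b, gold_score) triples into low- and high-overlap subsets."""
    low, high = [], []
    for a, b, gold in pairs:
        (high if jaccard(a, b) >= threshold else low).append((a, b, gold))
    return low, high


# Hypothetical STS-style pairs with made-up gold scores.
pairs = [
    ("A child is heading home", "A child is heading home from school", 4.2),
    ("A child is at the library", "Books are kept in this building", 2.0),
]
low, high = split_by_surface_similarity(pairs)
print("low overlap:", [(a, b) for a, b, _ in low])
print("high overlap:", [(a, b) for a, b, _ in high])
```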
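  11. Comparing embedding properties: SentEval
    SentEval
    Comparison axes: ③ performance on each downstream task, ④ classification performance on linguistic information
    Input: sentence embeddings
    Metric: classification performance of a linear classifier that takes sentence embeddings as input
    The quality of the sentence embeddings is assessed from classification performance.
    SentEval evaluation procedure:
    ① Prepare a sentence embedding model
    ② Convert each sentence into a sentence embedding
    ③ Train a classifier that takes the sentence embeddings as input
    ④ Assess embedding quality from the classifier's performance
    [Figure: sentence → sentence embedding model → classifier]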
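Steps ③ and ④ amount to training a linear probe on frozen embeddings. The sketch below fits binary logistic regression by plain batch gradient descent on hypothetical 2-dimensional "sentence vectors". It is a simplified stand-in, not SentEval's own classifier, and real SentEval tasks may have more than two classes. The embedding model is never updated; only `w` and `b` are learned.

```python
import math


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit a binary linear classifier on frozen sentence vectors by batch gradient descent."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, target in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - target                # gradient of the log loss with respect to z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def accuracy(w, b, X, y):
    """Fraction of vectors whose predicted label (z > 0 means class 1) matches the gold label."""
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical frozen sentence vectors (step ②) with binary labels, e.g. positive/negative sentiment.
X_train = [[2.0, 1.5], [1.5, 2.5], [3.0, 1.0], [-1.5, -2.0], [-2.5, -1.0], [-1.0, -3.0]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[2.2, 2.0], [-2.0, -2.2]]
y_test = [1, 0]

w, b = train_logistic_regression(X_train, y_train)       # step ③
print("test accuracy:", accuracy(w, b, X_test, y_test))  # step ④
```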
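  13. Comparing embedding properties: SentEval
    SentEval
    Comparison axes: ③ performance on each downstream task, ④ classification performance on linguistic information
    Input: sentence embeddings
    Metric: classification performance of a linear classifier that takes sentence embeddings as input
    • SBERT embeds rich semantic information
    • DefSent is rich in surface information
      • It is also good at phrase composition
    [Figure: probing-task accuracy (axis 50 to 90) on Length (sentence-length prediction), WordContent (word-in-sentence prediction), Tense (tense prediction), and SubjNumber]
    • DefSent holds comparatively rich surface information, such as tense and which words appear in the sentence
    • SBERT performs poorly on tasks where the sentence's surface information matters
      • It carries relatively little information about, for example, which words appear in the sentence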
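  15. Comparing embedding properties: summary
    STS: comparison axes ① source of the sentences and ② surface similarity of the sentence pairs; input: sentence pairs (pairs of sentence embeddings); metric: rank correlation between human ratings and embedding similarity
    SentEval: comparison axes ③ performance on each downstream task and ④ classification performance on linguistic information; input: sentence embeddings; metric: classification performance of a linear classifier
    [Figure: relationship between surface similarity and performance for SBERT and DefSent]
    SBERT
    • Task: natural language inference
    • Rich in semantic information such as sentiment polarity
    • Relatively little surface information
    DefSent
    • Task: definition → word prediction
    • Rich in surface information such as sentence length and tense
    • Also good at phrase composition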
  16. Integrating sentence embeddings: integration into a single model
    • S+D: fine-tune a single model with SBERT first, then with DefSent
    • D+S: fine-tune a single model with DefSent first, then with SBERT
    • Multi: alternate between SBERT and DefSent fine-tuning
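The three strategies differ only in the order in which the two objectives update the shared model. The sketch below writes that order out as a schedule of task names. The slides do not specify how finely Multi alternates. Alternating batch by batch, starting with SBERT, and giving each task the same number of steps are all assumptions. `training_schedule` is a hypothetical helper.

```python
def training_schedule(strategy, steps_per_task):
    """Ordered list of which objective updates the shared model at each step.

    "S+D":   all SBERT (NLI) steps, then all DefSent steps.
    "D+S":   all DefSent steps, then all SBERT steps.
    "Multi": alternate between the two objectives, one step each (assumed granularity).
    """
    if strategy == "S+D":
        return ["SBERT"] * steps_per_task + ["DefSent"] * steps_per_task
    if strategy == "D+S":
        return ["DefSent"] * steps_per_task + ["SBERT"] * steps_per_task
    if strategy == "Multi":
        schedule = []
        for _ in range(steps_per_task):
            schedule.extend(["SBERT", "DefSent"])
        return schedule
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("S+D", "D+S", "Multi"):
    print(name, training_schedule(name, 2))
```

In S+D and D+S, all of the second objective's updates come last. This is one way to see how the knowledge from the first objective could be partially overwritten.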
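  19. Integrating sentence embeddings: evaluation experiments
    Experimental setup
    • Train and evaluate a model for each integration method
    • Train the models 10 times for STS and 3 times for SentEval, and report the average performance
    Models evaluated (performance is compared with the single methods)
    • SBERT
    • DefSent
    • S+D (SBERT→DefSent)
    • D+S (DefSent→SBERT)
    • Multi
    • Average
    • Concat
    Evaluation tasks
    • STS
    • SentEval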
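Average and Concat combine two separately trained models at the embedding level, with no further training. The sketch assumes both models output vectors of the same dimension, as two BERT-base models do (768 each). Averaging keeps that dimension, while concatenation doubles it (1536 for BERT-base). That is why a higher SentEval score for Concat should be read with its larger dimension in mind. The vectors below are hypothetical.

```python
def average_embeddings(u, v):
    """Average: element-wise mean of two sentence vectors; the dimension is unchanged."""
    if len(u) != len(v):
        raise ValueError("Average needs two vectors of the same dimension")
    return [(a + b) / 2 for a, b in zip(u, v)]


def concat_embeddings(u, v):
    """Concat: the result has len(u) + len(v) dimensions."""
    return u + v


sbert_vec = [0.4, -0.2, 0.6]   # hypothetical SBERT embedding of one sentence
defsent_vec = [0.2, 0.0, 0.8]  # hypothetical DefSent embedding of the same sentence

print("Average:", average_embeddings(sbert_vec, defsent_vec))
print("Concat: ", concat_embeddings(sbert_vec, defsent_vec))
```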
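  22. Integrating sentence embeddings: evaluation experiments
    Average STS and SentEval performance of each integration method (%), BERT-base
      Method     STS     SentEval
      SBERT      73.19   86.49
      DefSent    75.20   86.61
      S+D        78.45   86.80
      D+S        72.89   86.09
      Multi      72.89   86.23
      Average    77.82   87.47
      Concat     76.03   87.93
    • SBERT→DefSent and Average perform strongly
    • DefSent→SBERT degrades performance
      • Integration methods sometimes underperform the single methods
      • Possibly an effect of catastrophic forgetting
    • Simply averaging the sentence embeddings gives good performance
      • Analyzing the properties of the vectors produced by integration methods is left for future work
    • On SentEval, Concat performs best, but note that its larger embedding dimension gives it an advantage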
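  23. Summary and future work
    Comparing, analyzing, and integrating the properties of sentence embeddings, focusing on differences in supervision signals
    Models analyzed
      SBERT: based on natural language inference
      DefSent: based on definition → word prediction
    Comparison
    • Performance differences between the methods depending on the sentence source are pronounced
    • SBERT
      • Rich in semantic information such as sentiment polarity
      • Less affected by surface similarity
    • DefSent
      • Rich in surface information such as sentence length and tense
      • Also good at phrase composition
      • Strong on sentence pairs with low surface similarity
    Integration
    • Integration improves performance
    • SBERT→DefSent and Average perform strongly
    • In some cases performance drops due to catastrophic forgetting
    Future work
    1. Investigate a broader range of models and sentence embedding methods
    2. Analyze the properties of vectors built by integration methods
    3. Develop better integration methods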
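  25. Publications and achievements
    Domestic journal (peer-reviewed)
    • Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda. Sentence embedding construction using definition sentences (in Japanese), Journal of Natural Language Processing, Vol. 30 No. 1 (to appear).
    International conferences (peer-reviewed)
    • Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda. Comparison and Combination of Sentence Embeddings Derived from Different Supervision Signals, in Proceedings of the 11th Joint Conference on Lexical and Computational Semantics (*SEM 2022).
    • Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda. DefSent: Sentence Embeddings using Definition Sentences, in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021).
    Domestic conferences (not peer-reviewed)
    • Shohei Yoda, Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda. Sentence representation generation based on Gaussian embeddings (in Japanese), The 29th Annual Meeting of the Association for Natural Language Processing (NLP2023), to be presented.
    • Hayato Tsukagoshi, Tsutomu Hirao, Makoto Morishita, Katsuki Chousa, Ryohei Sasano, Koichi Takeda. Improving the quality of generated sentences in Split and Rephrase using natural language inference and a reconstructor (in Japanese), The 28th Annual Meeting of the Association for Natural Language Processing (NLP2022).
    • Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda. Sentence embedding construction using definition sentences (in Japanese), The 27th Annual Meeting of the Association for Natural Language Processing (NLP2021).
    Other
    • FY2023: offered appointment as a JSPS Research Fellow for Young Scientists (DC1)
    • FY2023: certified as a Nagoya University Interdisciplinary Frontier Fellow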