
[Reading-group slides] Language-agnostic BERT Sentence Embedding


These slides explain the paper on Language-agnostic BERT Sentence Embedding (LaBSE), a multilingual sentence-embedding method.

Hayato Tsukagoshi

May 24, 2022


Transcript

  1. Language-agnostic BERT Sentence Embedding
     Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang (ACL 2022)
     URL: https://arxiv.org/abs/2007.01852
     Presenter: Hayato Tsukagoshi, M2, Graduate School of Informatics, Nagoya University, Japan
  2. LaBSE: Language-agnostic BERT Sentence Embedding
     • Proposes LaBSE, a multilingual sentence-embedding model applicable to 109+ languages
     • Multilingual pre-training, then contrastive learning of sentence embeddings on a translation corpus
       • MLM + Translation Language Modeling → Additive Margin Softmax
     • A variety of evaluation experiments
       • Substantially improves cross-lingual retrieval performance
       • Especially strong on low-resource languages
       • Monolingual STS/SentEval scores are not high
     Why this paper was chosen
     • Rich in sentence-embedding topics and evaluations
     • An instructive paper on (multilingual) sentence embeddings
     https://arxiv.org/abs/2007.01852
  3. Introduction: Sentence embeddings
     • Dense vector representations of natural-language sentences
     • Distances between the vectors represent how close the sentences are in meaning
     [Figure: example sentences such as "The child is heading home.", "The child is heading home from school.", "The child is in the library.", and "The child is walking in the afternoon." mapped to vectors like [0.1, 0.2, ...] in a sentence-embedding space]
  4. Introduction: Sentence embeddings (same slide, build step)
     • Dense vector representations of natural-language sentences
     • Distances between the vectors represent how close the sentences are in meaning
  5. Introduction: Sentence embeddings
     • Dense vector representations of natural-language sentences
     • Distances between the vectors represent how close the sentences are in meaning
     [Figure: semantically similar sentences are placed close together; the distance between vectors expresses the semantic relation]
  6. Introduction: Sentence embeddings
     Where the name "embedding" comes from
     • A word sequence is a sequence of extremely high-dimensional (vocabulary-sized) vectors
     • A sentence is instead represented by a much lower-dimensional vector
     • The term apparently comes from manifold theory
     Usefulness and applications
     • Similar-sentence/document retrieval and clustering
     • Lightweight sentence/document classification and feature extraction (reused in other tasks)
     • Question answering via dense vector retrieval (Dense Passage Retrieval)
     • Example-based translation with translation memories, example-based machine learning
     • The range of applications is very broad
  7. Introduction: Representative sentence-embedding methods
     Before BERT
     • Obtain sentence embeddings as a (weighted) average of word embeddings (a minimal sketch follows below)
       • p-mean [01], SWEM [02], DynaMax [03], SIF [04], uSIF [05], etc.
     • Build dedicated sentence-embedding models
       • Skip-Thought [06], SCDV [07], InferSent [08], USE [09], etc.
     [01] Rückle+: Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations, arXiv '18
     [02] Shen+: Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, ACL '18
     [03] Zhelezniak+: Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors, ICLR '19
     [04] Arora+: A Simple but Tough-to-Beat Baseline for Sentence Embeddings, ICLR '17
     [05] Ethayarajh: Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline, Rep4NLP '18
     [06] Kiros+: Skip-Thought Vectors, NIPS '15
     [07] Mekala+: SCDV: Sparse Composite Document Vectors using soft clustering over distributional representations, ACL '17
     [08] Conneau+: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP '17
     [09] Cer+: Universal Sentence Encoder, arXiv, Mar 2018
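     Not part of the original deck: a minimal sketch of the simplest pre-BERT approach mentioned above, averaging word vectors into a sentence vector. The toy vectors and names here are purely illustrative; real systems would load word2vec/GloVe/fastText vectors.

```python
import numpy as np

# Toy "pre-trained" word vectors; purely illustrative stand-ins.
word_vectors = {
    "a":      np.array([0.1, 0.0, 0.2]),
    "man":    np.array([0.5, 0.3, 0.1]),
    "plays":  np.array([0.2, 0.7, 0.4]),
    "guitar": np.array([0.9, 0.1, 0.3]),
}

def sentence_embedding(tokens, vectors):
    """Unweighted average of word vectors: the simplest pre-BERT baseline."""
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0)

emb = sentence_embedding(["a", "man", "plays", "a", "guitar"], word_vectors)
print(emb.shape)  # (3,)
```

     Methods like SIF/uSIF refine this idea with frequency-based weighting and a common-component removal step rather than a plain mean.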
  8. Introduction: Representative sentence-embedding methods — Sentence-BERT [10]
     • Fine-tunes BERT on Natural Language Inference (NLI)
       • A pioneering study of sentence-embedding models built on pre-trained language models (PLMs)
       • Essentially "InferSent with BERT"
     • Substantially improved the state of the art (SOTA) at the time
       • SimCSE (described later) is now roughly a strict upgrade, so Sentence-BERT may see less use going forward?
     [10] Reimers+: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, EMNLP '19
     (Figure reproduced from the cited paper)
  9. Introduction: Representative sentence-embedding methods — SimCSE [11]
     • Fine-tunes BERT with contrastive learning
       • Unsupervised SimCSE: embed the same sentence twice and contrast the two views
       • Supervised SimCSE: contrastive learning with entailed sentence pairs as positives
     • SOTA by a wide margin; follow-up work keeps appearing
     [11] Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP '21
     (Figure reproduced from the cited paper; a separate reading-group deck on SimCSE is available)
  10. Introduction: Evaluating sentence embeddings
      • How should the "goodness" of sentence embeddings be evaluated?
        • The discussion has arguably not been settled
        • What kind of sentence embeddings *should* be produced is still an open question
      • Even so, some criterion is needed for evaluation
      Evaluation measures
      • Semantic Textual Similarity (STS): correlation between human-rated and model-computed sentence similarity
      • SentEval: performance on downstream tasks such as text classification [12, 13]
      • SentGLUE: GLUE [14] restricted to approaches that use sentence embeddings [15]
      • Clustering, text retrieval
      [12] Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC '18
      [13] Conneau+: What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties, ACL '18
      [14] Wang+: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, EMNLP Workshop BlackboxNLP '18
      [15] Ni+: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models, CoRR '21
  11. Introduction: Evaluating sentence embeddings (same content and references as slide 10)
  12. Introduction: Evaluating sentence embeddings (same content and references as slide 10)
      • STS and SentEval are the most widely used
  13. Introduction: Unsupervised STS
      Evaluation procedure for unsupervised STS (a sketch of this procedure follows this slide)
      ① Prepare a dataset of sentence pairs with human similarity ratings
      ② Prepare a sentence-embedding model
      ③ Embed each sentence of every pair
      ④ Compute the similarity of each pair of sentence *vectors*
        • Cosine similarity is the usual choice
      ⑤ Compute the (rank) correlation with the human ratings
      [Figure: sentence A / sentence B → sentence-embedding model → model similarity vs. human-rated similarity → correlation coefficient]
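      Not part of the original deck: a minimal, self-contained sketch of steps ①–⑤, assuming encode() is a placeholder for whatever sentence-embedding model is being evaluated (here it just returns random vectors so the snippet runs).

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    # Step 4: cosine similarity between two sentence vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholder encoder (step 2); replace with a real sentence-embedding model.
rng = np.random.default_rng(0)
def encode(sentence):
    return rng.normal(size=768)

# Step 1: sentence pairs with human similarity ratings.
pairs = [
    ("A man is playing a guitar.", "The man is playing the guitar.", 4.909),
    ("A man is playing a guitar.", "A woman is cutting vegetable.", 0.000),
]

# Steps 3-4: embed each pair and compute cosine similarities.
model_scores = [cosine(encode(a), encode(b)) for a, b, _ in pairs]
human_scores = [gold for _, _, gold in pairs]

# Step 5: Spearman rank correlation against the human ratings.
rho, _ = spearmanr(human_scores, model_scores)
print(rho)
```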
  14. Introduction: Computing Spearman's rank correlation for STS
      | Sentence A | Sentence B | Human | Model 1 | Model 2 |
      | A man is playing a guitar. | The man is playing the guitar. | 4.909 | 0.985 | 0.978 |
      | A man is playing a guitar. | A guy is playing an instrument. | 3.800 | 0.646 | 0.895 |
      | A man is playing a guitar. | A man is playing a guitar and singing. | 3.200 | 0.874 | 0.977 |
      | A man is playing a guitar. | The girl is playing the guitar. | 2.250 | 0.747 | 0.831 |
      | A man is playing a guitar. | A woman is cutting vegetable. | 0.000 | 0.290 | 0.595 |
  15. Introduction: Computing Spearman's rank correlation for STS (same table as slide 14)
  16. Introduction: Computing Spearman's rank correlation for STS
      (Same table, with the rank of each score overlaid per column: Human 1–5; Model 1: 1, 4, 2, 3, 5; Model 2: 1, 3, 2, 4, 5)
  17. Introduction: Computing Spearman's rank correlation for STS
      (Same table; the overlaid numbers are the similarity rank of each sentence pair)
  18. Introduction: Computing Spearman's rank correlation for STS
      For Model 1:
      r_1 = 1 - \frac{6}{5(5^2 - 1)}\left\{(1-1)^2 + (2-4)^2 + (3-2)^2 + (4-5)^2 + (5-5)^2\right\} = 1 - \frac{6}{120}(0 + 4 + 1 + 1 + 0)
  19. Introduction: Computing Spearman's rank correlation for STS
      • Compute the correlation between the gold ranks and the predicted ranks (i.e. plug them into the formula above)
  20. Introduction: Computing Spearman's rank correlation for STS
      • r_1 = 0.7, r_2 = 0.9
  21. Introduction: Computing Spearman's rank correlation for STS
      • Model 2 is the better model (the values can be checked with the sketch below)
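      Not part of the original deck: the hand-computed values above can be reproduced with scipy, which ranks the raw scores internally before computing the correlation.

```python
from scipy.stats import spearmanr

# Scores from the table above (Human, Model 1, Model 2).
human  = [4.909, 3.800, 3.200, 2.250, 0.000]
model1 = [0.985, 0.646, 0.874, 0.747, 0.290]
model2 = [0.978, 0.895, 0.977, 0.831, 0.595]

r1, _ = spearmanr(human, model1)
r2, _ = spearmanr(human, model2)
print(r1, r2)  # 0.7 0.9 -> Model 2 correlates better with the human ranking
```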
  22. Introduction: Evaluation datasets for STS
      English datasets
      • STS12, 13, 14, 15, 16 [16, 17, 18, 19, 20]
      • STS Benchmark (test set) [21]
      • SICK-R [22]
      Japanese datasets
      • JSICK [23]
      • JSTS [24]
      [16] Agirre+: SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity, *SEM '12
      [17] Agirre+: *SEM 2013 shared task: Semantic Textual Similarity, *SEM '13
      [18] Agirre+: SemEval-2014 Task 10: Multilingual Semantic Textual Similarity, SemEval '14
      [19] Agirre+: SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability, SemEval '15
      [20] Agirre+: SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation, SemEval '16
      [21] Cer+: SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation, SemEval '17
      [22] Marelli+: A SICK cure for the evaluation of compositional distributional semantic models, LREC '14
      [23] Yanaka+: JSICK: A Japanese dataset for compositional inference and sentence similarity, JSAI Annual Conference 2021 (in Japanese)
      [24] Kurihara+: JGLUE: A Japanese language-understanding benchmark, NLP2022, Annual Meeting of the Association for Natural Language Processing (in Japanese)
  23. Introduction: Evaluation datasets for STS (same datasets and references as slide 22)
      • STS12–16 are each a collection of small "sub"-datasets; the sub-datasets are usually pooled when computing the correlation coefficient
      • The final evaluation is often the average of the scores on STS12–16, STS Benchmark, and SICK-R
  24. Introduction: SentEval [25]
      • A toolkit collecting downstream tasks such as text classification
      • A classifier is trained on top of the sentence embeddings; classification performance is used to judge the quality of the embeddings
      [25] Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC '18
      Task list:
      | Task | Type | #train | #test | #class |
      | MR | movie review | 11,000 | 11,000 | 2 |
      | CR | product review | 4,000 | 4,000 | 2 |
      | SUBJ | subjectivity status | 10,000 | 10,000 | 2 |
      | MPQA | opinion polarity | 11,000 | 11,000 | 2 |
      | SST-2 | binary sentiment analysis | 67,000 | 1,800 | 2 |
      | TREC | question-type classification | 6,000 | 500 | 6 |
      | MRPC | paraphrase detection | 4,100 | 1,700 | 2 |
  25. Introduction: SentEval [25] (same content as slide 24)
  26. Introduction: SentEval
      Evaluation procedure for SentEval (a minimal sketch follows this slide)
      ① Prepare a sentence-embedding model with its parameters frozen
      ② Train a classifier that takes the sentence embeddings as input
      ③ Judge the quality of the embeddings from the classifier's performance
      • Assumption: higher classification performance means "better sentence embeddings"
      • The classifier is usually logistic regression
        • i.e. classification by a weighted sum over the embedding dimensions
      • This evaluates the performance of an *already trained* sentence-embedding model
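      Not part of the original deck and not SentEval's actual code: a minimal sketch of the procedure, where frozen_encode() stands in for a frozen sentence-embedding model (here it returns random vectors so the snippet runs) and the toy train/test sentences are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: a *frozen* sentence encoder; placeholder returning random vectors.
rng = np.random.default_rng(0)
def frozen_encode(sentences):
    return rng.normal(size=(len(sentences), 768))

train_sents, train_labels = ["good movie", "bad movie"] * 50, [1, 0] * 50
test_sents,  test_labels  = ["great film", "awful film"] * 10, [1, 0] * 10

# Step 2: train a logistic-regression classifier (a weighted sum over
# embedding dimensions) on top of the frozen embeddings.
clf = LogisticRegression(max_iter=1000).fit(frozen_encode(train_sents), train_labels)

# Step 3: the classifier's accuracy is used as a proxy for embedding quality.
pred = clf.predict(frozen_encode(test_sents))
print(accuracy_score(test_labels, pred))
```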
  27. LaBSE: Language-agnostic BERT Sentence Embedding
      • Proposes LaBSE, a multilingual sentence-embedding model applicable to 109+ languages
      • Multilingual pre-training, then contrastive learning of sentence embeddings on a translation corpus
        • MLM + Translation Language Modeling → Additive Margin Softmax
      • A variety of evaluation experiments
        • Substantially improves cross-lingual retrieval performance
        • Especially strong on low-resource languages
        • Monolingual STS/SentEval scores are not high
      Why this paper was chosen
      • Rich in sentence-embedding topics and evaluations
      • An instructive paper on (multilingual) sentence embeddings
      https://arxiv.org/abs/2007.01852
  28. Outline
      • Comparison with prior work
      • Components of LaBSE
        • Dual-encoder architecture
        • Translation ranking task
        • MLM and TLM pre-training
      • Related work on multilingual sentence embeddings
      • Experimental setup, training tricks, evaluation methods
      • Experimental results
      • Analysis
      • Additional experiments
  29. Components of LaBSE
      Dual-encoder architecture
      • One of the possible training strategies; the standard setup for sentence-embedding methods
      • Sentence-BERT, SimCSE, etc. also use it
      Translation ranking task
      • Train the similarity of a translation pair to be higher than that of any other sentence pair
      • Training is refined with additive margin softmax
      MLM and TLM pre-training
      • Masked Language Modeling (MLM)
      • Translation Language Modeling (TLM)
  30. Components of LaBSE: Dual-encoder architecture
      • Two encoders produce the sentence-embedding representations
        • In most cases the encoders share weights (i.e. they are the same model)
        • Also called a Siamese network (a minimal sketch follows this slide)
      [Figure: Encoder-Decoder vs. Dual-Encoder. Typical Encoder-Decoder tasks: generating surrounding sentences, generating translations, denoising autoencoding. Typical Dual-Encoder tasks: recognizing entailment, contrastive learning; the two encoders share weights and the loss is computed over both outputs.]
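      Not part of the original deck: a minimal sketch of the shared-weight (Siamese) setup, with a toy bag-of-embeddings encoder standing in for the BERT encoder LaBSE actually uses.

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    """Minimal dual encoder: the *same* module (shared weights) embeds both sides."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # toy encoder; LaBSE uses BERT

    def forward(self, src_ids, tgt_ids):
        # Both inputs pass through the identical encoder -> Siamese network.
        src_emb = nn.functional.normalize(self.embed(src_ids), dim=-1)
        tgt_emb = nn.functional.normalize(self.embed(tgt_ids), dim=-1)
        return src_emb, tgt_emb

enc = DualEncoder()
src = torch.randint(0, 1000, (4, 10))  # a batch of 4 token-id "sentences" per side
tgt = torch.randint(0, 1000, (4, 10))
src_emb, tgt_emb = enc(src, tgt)
print(src_emb.shape, tgt_emb.shape)  # torch.Size([4, 64]) each
```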
  31. Components of LaBSE: Translation ranking task
      • Proposed by Guo et al. [26]
      • Train the similarity of a translation pair to be higher than that of any other sentence pair
        • Contrastive learning with translation pairs as positives
      • Despite the name "ranking", what is actually done is maximizing the similarity of the positives (the correct translation pairs)
      [26] Guo+: Effective Parallel Corpus Mining using Bilingual Sentence Embeddings, WMT '18
      [Figure: 吾輩は猫である。(Ja) paired with "I am a cat." (En) as the positive; "Nice to meet you." (En) as a negative]
  32. (Same slide; the figure marks the positive pair as "pull closer" and the negative as "push apart")
  33. (Same slide; the figure labels these as maximizing the similarity of the positive pair and minimizing the similarity of the negatives)
  34. Components of LaBSE: Translation ranking task
      • First, embed the translation pairs
      • Compute similarities for every combination of positives and negatives
        • This yields a similarity matrix
      [Figure: Japanese sentences (私はペンです。/ 吾輩は猫である。/ はじめまして。/ 私は完璧な人間です。) and their English counterparts (I am a pen. / I'm a cat. / Nice to meet you. / I'm a perfect human.) embedded and compared; translation pairs are the positives]
  35. (Same slide, continued)
      • Maximize the similarity of the positives
        • i.e. the diagonal of the similarity matrix is the "correct answer"
      [Figure: e.g. similarity 0.98 for a translation pair, 0.24 for a non-pair]
  36. (Same slide, continued)
      • Normalize with a softmax along each row (→)
        • Intuitively, a 1-vs-N problem repeated N times
  37. (Same slide, continued)
      • The loss function is shown on the slide (see the reconstruction below)
        • φ is the dot product of the embeddings
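      The loss shown in the slide figure is not in the transcript; for reference, the translation ranking loss as stated in the LaBSE paper, with φ the (dot-product) similarity between the embeddings of source sentence x_i and target sentence y_j, is:

      \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{\phi(x_i, y_i)}}{\sum_{j=1}^{N} e^{\phi(x_i, y_j)}}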
  38. Components of LaBSE: Translation ranking task
      • Other pairs in the same batch serve as negatives
        • Known as in-batch negatives
        • The similarity matrix becomes a (batch_size x batch_size) square matrix
        • (More negatives can also be added)
      [Figure: the same Japanese/English example sentences arranged as a similarity matrix with the translation pairs on the diagonal]
  39. (Same slide; the figure marks the batch_size dimensions of the matrix)
  40. (Same slide, continued)
      • With the softmax, the loss becomes asymmetric
  41. (Same slide, same content)
  42. (Same slide, continued)
      • To fix this, the losses in the two directions (rows → and columns ↓) are summed (a sketch follows below)
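      Not part of the original deck and not the exact LaBSE implementation: a PyTorch sketch of the bidirectional in-batch-negatives loss described above, assuming the two embedding matrices come from a shared encoder.

```python
import torch
import torch.nn.functional as F

def translation_ranking_loss(src_emb, tgt_emb):
    """Bidirectional in-batch-negatives loss: the diagonal of the
    (batch_size x batch_size) similarity matrix holds the translation pairs."""
    sim = src_emb @ tgt_emb.T                        # similarity matrix
    labels = torch.arange(sim.size(0), device=sim.device)
    loss_rows = F.cross_entropy(sim, labels)         # source -> target direction (→)
    loss_cols = F.cross_entropy(sim.T, labels)       # target -> source direction (↓)
    return loss_rows + loss_cols                     # sum of the two directions

src = F.normalize(torch.randn(8, 64), dim=-1)
tgt = F.normalize(torch.randn(8, 64), dim=-1)
print(translation_ranking_loss(src, tgt))
```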
  43. Components of LaBSE: Additive Margin Softmax (AMS) [27]
      • Replace the similarity function φ with a margin-augmented φ′
        • The margin is applied only to the positives
      • Positives are pulled closer together, negatives are pushed further apart
      [27] Yang+: Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax, IJCAI '19
  44. (Same slide, continued)
      • The modified loss function is shown on the slide (see the reconstruction below)
  45. (Same slide, same content)
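      The modified loss shown in the slide figure is not in the transcript; for reference, the additive-margin form used in [27] and the LaBSE paper subtracts a margin m from the similarity of the positive pair only:

      \phi'(x_i, y_j) = \phi(x_i, y_j) - m \cdot \mathbb{1}[i = j]

      \mathcal{L}_{\mathrm{AMS}} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{\phi(x_i, y_i) - m}}{e^{\phi(x_i, y_i) - m} + \sum_{j \ne i} e^{\phi(x_i, y_j)}}

      The bidirectional version sums this loss with its counterpart \mathcal{L}'_{\mathrm{AMS}}, in which the roles of source and target are swapped.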
  46. Components of LaBSE: MLM and TLM pre-training
      Translation Language Modeling (TLM) [28]
      • Concatenate a translation pair and run MLM over it
        • Expected to make the model learn the alignment between the two languages
      [28] Conneau+: Cross-lingual Language Model Pretraining, NeurIPS '19
      [Figure: the pair 吾輩は猫である。/ I am a cat. is concatenated with [/s] separators, tokens are masked on both sides, and the Transformer predicts the masked tokens (猫, I, cat)]
  47. Components of LaBSE: MLM and TLM pre-training
      • TLM is an extension of MLM
        • With minor changes it can be trained the same way as MLM (a sketch of the input construction follows this slide)
      • LaBSE combines MLM and TLM during pre-training
      • Note: to improve multilinguality, this work does not use language embeddings
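      Not part of the original deck: a rough sketch of how a TLM training example can be built, with tokenization, special tokens, and the masking policy simplified relative to the paper's actual preprocessing.

```python
import random

def build_tlm_example(src_tokens, tgt_tokens, mask_token="[MASK]", mask_prob=0.15):
    """Concatenate a translation pair and mask tokens on *both* sides, so the
    model can use the other language as context when predicting masked tokens."""
    tokens = src_tokens + ["[/s]"] + tgt_tokens + ["[/s]"]
    labels = [None] * len(tokens)                 # None = position not predicted
    for i, tok in enumerate(tokens):
        if tok != "[/s]" and random.random() < mask_prob:
            labels[i] = tok                       # original token is the MLM target
            tokens[i] = mask_token
    return tokens, labels

tokens, labels = build_tlm_example(["吾輩", "は", "猫", "で", "ある", "。"],
                                   ["I", "am", "a", "cat", "."])
print(tokens)
```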
  48. Training data
      Monolingual data
      • Collected from CommonCrawl and Wikipedia: 17B (17 billion) sentences
      • Preprocessed; used only for pre-training (MLM)
      Bilingual translation pairs
      • Collected by mining translations from web pages (bitext mining): 6B (6 billion) pairs
      • To counter data imbalance, each language is capped at 100M sentences
      • Low-quality data is filtered using human evaluation of a subset
      • Used for pre-training (MLM & TLM) and for training the dual encoder
  49. Evaluation tasks: bitext retrieval
      United Nations (UN)
      • From an English document, retrieve the corresponding document in another language (Precision@1 = accuracy)
      • 5 language pairs (en-fr, en-es, en-ru, en-ar, en-zh), 86,000 sentences
      Tatoeba
      • From a non-English sentence, retrieve the corresponding English translation (average accuracy)
      • A corpus of example sentences and their translations collected from https://tatoeba.org
      • 112 languages; 1,000 sentences per language with corresponding English translations
      • Following prior work, a subset restricted to 36 languages is also evaluated
      BUCC
      • Find translation pairs within monolingual corpora (Precision, Recall, F1)
      • 4 language pairs: fr-en, de-en, ru-en, zh-en
  50. Experimental conditions
      • Vocabulary size
        • mBERT Vocab: same as multilingual BERT (mBERT) (119,547)
        • Customized Vocab: built from scratch with per-language data-imbalance mitigation (501,153)
      • Pre-training (PT)
        • Whether or not MLM+TLM pre-training is performed
        • Without it, only the translation ranking task is run
      • Additive Margin Softmax (AMS)
        • Whether or not the margin is used in the translation ranking task
  51. Experimental setup
      • The sentence embedding is the L2-normalized [CLS] representation (a sketch follows this slide)
      • optimizer = AdamW, learning rate = 1e-3, sequence length = 128
      Pre-training
      • batch size: 8192
      Translation ranking task
      • batch size: 4096
      • w/ pre-training: 50k steps, w/o pre-training: 500k steps
      • margin value: 0.3
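      Not part of the original deck: a sketch of the embedding-extraction step with Hugging Face Transformers, using mBERT as a stand-in checkpoint (LaBSE itself uses its own encoder and vocabulary).

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

inputs = tokenizer(["I am a cat.", "吾輩は猫である。"],
                   padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state      # (batch, seq_len, dim)
cls = hidden[:, 0]                                   # [CLS] token representation
emb = torch.nn.functional.normalize(cls, dim=-1)     # L2-normalize, as in LaBSE
print(emb.shape)
```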
  52. Experimental results: United Nations (UN) & Tatoeba
      • LaBSE (Customized Vocab + AMS + PT) sets a new SOTA
        • Yang et al. is a bilingual model: a separate model is needed per language pair
        • LaBSE covers 109+ languages with a single model
      • (Oddly, there is no result for Base w/ Customized Vocab)
  53. Analysis: Usefulness of pre-training
      • Pre-training (PT) improves performance across the board
      • Models with PT already converge after 50K steps of training
        • 50K steps = 200M examples
        • i.e. less parallel data is needed
      • PT helps both performance and convergence speed (= fewer training examples)
  54. Summary: Language-agnostic BERT Sentence Embedding
      • Proposes LaBSE, a multilingual sentence-embedding model applicable to 109+ languages
      • Multilingual pre-training, then contrastive learning of sentence embeddings on a translation corpus
        • MLM + Translation Language Modeling → Additive Margin Softmax
      • A variety of evaluation experiments
        • Substantially improves cross-lingual retrieval performance
        • Especially strong on low-resource languages
        • Monolingual STS/SentEval scores are not high
        • Pre-training reduces the amount of parallel data needed
        • AMS has a large impact on performance
      • The pre-trained model is publicly available (a usage sketch follows this slide)
      https://arxiv.org/abs/2007.01852
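      Not part of the original deck: one way to try the released model, assuming the community-distributed checkpoint name "sentence-transformers/LaBSE" on the Hugging Face Hub (the authors also released LaBSE on TF Hub).

```python
from sentence_transformers import SentenceTransformer

# Assumed checkpoint name; adjust if using the authors' TF Hub release instead.
model = SentenceTransformer("sentence-transformers/LaBSE")
emb = model.encode(["I am a cat.", "吾輩は猫である。", "Nice to meet you."])
print(emb.shape)  # (3, 768); translation pairs should land close together
```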
  55. Introduction: Side notes on STS
      • Recently, evaluation usually uses Spearman rather than Pearson correlation
        • There is an argument that Pearson is not a great evaluation measure here [31]
      • Note that as long as Spearman is used, STS is effectively a *ranking* task
      • Tuning hyperparameters on the STS Benchmark dev set has recently become popular
        • Evaluate on dev every 250 steps and test with the best checkpoint (SimCSE)
        • This seems likely to overfit to STS, so it does not look like a great policy…
        • Is "you may not train on it, but you may use it as dev!" really a sound setting?
      • The STS evaluation protocol sometimes differs between papers, so care is needed
        • Metrics and procedures used to vary (they have mostly been unified recently)
        • Appendix B of the SimCSE paper [11] discusses this and is worth reading
      [31] Reimers+: Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity, COLING '16
  56. Introduction: What sentence embeddings embed (the presenter's take)
      • In fact, things other than sentence meaning may also be embedded
        • Word choice, style (formality, honorifics), the closeness between a question and its answer, etc.
      • A sentence-embedding space is characterized by *what* it pulls together
        • How the distance is defined determines the nature of the embeddings
      • Correspondence between what is pulled together during training and the "distance" that results (the presenter's conjecture):
        • Entailed sentence pairs: focus on meaning rather than surface similarity
        • Question-answer pairs: focus on what the question and answer are about rather than the sentences' own meanings
        • Translation pairs: focus on sentence meaning while ignoring which language it is
  57. Introduction: What sentence embeddings embed (the presenter's take)
      Methods that embed sentence meaning
      • InferSent: trains an LSTM with a dual-encoder (Siamese) structure on NLI classification [08]
      • Sentence-BERT: fine-tunes BERT with a dual-encoder structure on NLI classification [10]
      • Supervised SimCSE: contrastive learning with entailed NLI sentence pairs as positives [11]
      Methods that embed enough information to reconstruct the sentence
      • SDAE: trains an LSTM to denoise and reconstruct the input sentence [32]
      • TSDAE: trains a Transformer to denoise and reconstruct the input sentence [33]
      • Optimus: a large VAE [34]
      [08] Conneau+: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP '17
      [10] Reimers+: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, EMNLP '19
      [11] Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP '21
      [32] Hill+: Learning Distributed Representations of Sentences from Unlabelled Data, NAACL '16
      [33] Wang+: TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning, Findings of EMNLP '21
      [34] Li+: OPTIMUS: Organizing Sentences via Pre-trained Modeling of a Latent Space, EMNLP '20
  58. Introduction: What sentence embeddings embed (the presenter's take)
      Methods that embed information about the surrounding sentences
      • Skip-Thought: generatively trained to reconstruct the previous and next sentences [06]
      • USE: Skip-Thought-style unsupervised training + supervised training on classification tasks [09]
      Methods that embed sentence meaning such that a word's meaning can be composed from its definition sentence
      • DefSent: trains embeddings to predict the defined word from its dictionary definition sentence [35]
      Methods that are harder to characterize
      • Unsupervised SimCSE: contrastive learning with two differently-dropped-out views of the same sentence as positives [11]
      • DistilCSE: distillation via contrastive learning with teacher and student embeddings as positives [36, 37]
      [06] Kiros+: Skip-Thought Vectors, NIPS '15
      [09] Cer+: Universal Sentence Encoder, arXiv, Mar 2018
      [35] Tsukagoshi+: DefSent: Sentence Embeddings using Definition Sentences, ACL '21
      [36] Wu+: DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings, ARR '22
      [37] Wu+: DisCo: Effective Knowledge Distillation For Contrastive Learning of Sentence Embeddings, arXiv '21 (same content as [36])
  59. Related paper: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation [38]
      • Trains a multilingual sentence-embedding model by knowledge distillation from a monolingual one
        • Uses a translation corpus to directly pull together the embeddings of different languages
      • Good performance on cross-lingual STS
      • Also performs well on monolingual STS
        • Expected, since it uses NLI?
      • Smaller language bias than LaBSE
      • Its strength is producing a multilingual sentence-embedding model whose space has a structure similar to the teacher model's
        • Doing this sequentially with a single model would break the embedding space
      [38] Reimers+: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation, EMNLP '20
 Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation [38] 85