[04], uSIF [05], etc.
• Build models dedicated to sentence embeddings
  • Skip-Thought [06], SCDV [07], InferSent [08], USE [09], etc.

[01] Rücklé+: Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations, arXiv ’18
[02] Shen+: Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, ACL ’18
[03] Zhelezniak+: Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors, ICLR ’19
[04] Arora+: A Simple but Tough-to-Beat Baseline for Sentence Embeddings, ICLR ’17
[05] Ethayarajh: Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline, Rep4NLP ’18
[06] Kiros+: Skip-Thought Vectors, NIPS ’15
[07] Mekala+: SCDV: Sparse Composite Document Vectors using soft clustering over distributional representations, EMNLP ’17
[08] Conneau+: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP ’17
[09] Cer+: Universal Sentence Encoder, arXiv, Mar 2018

Introduction: Representative methods for sentence embeddings
(STS): correlation between sentence similarity as judged by humans and as computed by the model
• SentEval: performance on downstream tasks such as text classification [12, 13]
• SentGLUE: GLUE [14] restricted to solutions that go through sentence embeddings [15]
• Clustering, text retrieval
STS and SentEval are the most widely used.

[12] Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC ’18
[13] Conneau+: What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties, ACL ’18
[14] Wang+: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, EMNLP Workshop BlackboxNLP ’18
[15] Ni+: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models, CoRR ’21

Introduction: Evaluating sentence embeddings
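To make the STS protocol concrete, here is a minimal Python sketch of the model side, assuming (a common choice, but not stated on the slide) that the model's similarity for a sentence pair is the cosine similarity of the two sentence vectors. The vectors are toy values, not outputs of any real model; the resulting scores are what would then be correlated with the human scores.

```python
import math
from typing import List, Sequence, Tuple

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two sentence-embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

def model_similarities(pairs: List[Tuple[Sequence[float], Sequence[float]]]) -> List[float]:
    """Score each (embedding_a, embedding_b) pair; these model scores are
    later correlated with the gold (human) STS scores."""
    return [cosine_similarity(a, b) for a, b in pairs]

# Hypothetical 3-dimensional "embeddings", for illustration only.
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # same direction -> 1.0
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal -> 0.0
    ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]),  # 45 degrees apart -> about 0.707
]
scores = model_similarities(pairs)
```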
Similarity ranks of the sentence pairs (gold scores range from 0 to 5):

| Sentence 1 | Sentence 2 | Gold | Model 1 | Model 2 | Gold rank | Model 1 rank | Model 2 rank |
|---|---|---|---|---|---|---|---|
| A man is playing a guitar. | The man is playing the guitar. | 4.909 | 0.985 | 0.978 | 1 | 1 | 1 |
| A man is playing a guitar. | A guy is playing an instrument. | 3.800 | 0.646 | 0.895 | 2 | 4 | 3 |
| A man is playing a guitar. | A man is playing a guitar and singing. | 3.200 | 0.874 | 0.977 | 3 | 2 | 2 |
| A man is playing a guitar. | The girl is playing the guitar. | 2.250 | 0.747 | 0.831 | 4 | 3 | 4 |
| A man is playing a guitar. | A woman is cutting vegetable. | 0.000 | 0.290 | 0.595 | 5 | 5 | 5 |

Compute the correlation between the gold ranks and each model's predicted ranks by plugging them into the formula for Spearman's rank correlation (n = 5):

$$r_1 = 1 - \frac{6}{5(5^2 - 1)}\left\{(1-1)^2 + (2-4)^2 + (3-2)^2 + (4-3)^2 + (5-5)^2\right\} = 1 - \frac{6}{120}(0 + 4 + 1 + 1 + 0) = 0.7$$

$$r_2 = 1 - \frac{6}{5(5^2 - 1)}\left\{(1-1)^2 + (2-3)^2 + (3-2)^2 + (4-4)^2 + (5-5)^2\right\} = 1 - \frac{6}{120}(0 + 1 + 1 + 0 + 0) = 0.9$$

r1 = 0.7, r2 = 0.9: Model 2 is better.
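The same computation as a small Python sketch, using the scores from the table. It implements the tie-free Spearman formula used above; that is an assumption that holds for this example, whereas real evaluations with tied scores need an implementation that averages tied ranks.

```python
from typing import List, Sequence

def ranks_descending(scores: Sequence[float]) -> List[int]:
    """Rank 1 = highest score. Assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def spearman_no_ties(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation r = 1 - 6 * sum(d^2) / (n (n^2 - 1)),
    where d is the rank difference per item. Valid only without ties."""
    n = len(x)
    rx, ry = ranks_descending(x), ranks_descending(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

gold = [4.909, 3.800, 3.200, 2.250, 0.000]
model1 = [0.985, 0.646, 0.874, 0.747, 0.290]
model2 = [0.978, 0.895, 0.977, 0.831, 0.595]

r1 = spearman_no_ties(gold, model1)
r2 = spearman_no_ties(gold, model2)
print(round(r1, 3), round(r2, 3))  # 0.7 0.9
```

Note that only the ordering of the scores matters: Model 1 gives higher raw similarities to some pairs, but Model 2 ranks the pairs closer to the human ordering.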
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings, WMT ’18

Components of LaBSE: Translation ranking task
Figure: the Japanese sentence 吾輩は猫である。 (Ja) is paired with its translation "I am a cat." (En) as a positive example and with "Nice to meet you." (En) as a negative example. The positive pair is pulled closer (similarity maximized) and the negative pair is pushed apart (similarity minimized).
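A minimal sketch of the ranking view for a single source sentence, assuming the similarity ϕ is the dot product of the embeddings and using made-up 2-D vectors (not real model outputs): the true translation should score highest among the candidates, and the per-example loss is the negative log-softmax probability assigned to it, which shrinks as the positive's similarity grows relative to the negatives'.

```python
import math
from typing import List, Sequence

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Similarity phi(x, y), assumed here to be the dot product."""
    return sum(a * b for a, b in zip(u, v))

def ranking_loss(src: Sequence[float], candidates: List[Sequence[float]], positive: int) -> float:
    """Negative log-softmax probability of the true translation among the candidates."""
    scores = [dot(src, c) for c in candidates]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive]

# Hypothetical 2-D embeddings, for illustration only.
ja_cat = [1.0, 0.2]     # 吾輩は猫である。
en_cat = [0.9, 0.1]     # I am a cat.        (positive)
en_hello = [-0.3, 1.0]  # Nice to meet you.  (negative)

candidates = [en_cat, en_hello]
scores = [dot(ja_cat, c) for c in candidates]
best = max(range(len(candidates)), key=lambda i: scores[i])  # retrieved translation
loss = ranking_loss(ja_cat, candidates, positive=0)
```

Here the positive already wins (best == 0), so its softmax probability exceeds 0.5 and the loss is below log 2; training pushes it further toward 0.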
• Think of it as repeating a 1-vs-N comparison N times
• Loss function →
  • ϕ: inner product of the embeddings
• (The number of negatives can be increased further)
• With softmax, the loss becomes asymmetric
  • To resolve this, add together the losses in both directions (→ and ↓)
Components of LaBSE: Translation ranking task
Figure: a batch_size × batch_size grid of similarities between the sentence embeddings of the Japanese sentences 吾輩は猫である。, はじめまして。, 私は完璧な人間です。, 私はペンです。 and those of their English translations "I'm a cat.", "Nice to meet you.", "I'm a perfect human.", "I am a pen." Cells pairing a sentence with its translation are positives (e.g. similarity 0.98…); all other cells are negatives (e.g. 0.24…).
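A common way to write this in-batch softmax loss for a batch of N translation pairs $(x_i, y_i)$, consistent with the bullets (ϕ as the inner product, → over rows, ↓ over columns), is

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\phi(x_i, y_i)}}{\sum_{n=1}^{N} e^{\phi(x_i, y_n)}}, \qquad \bar{\mathcal{L}} = \mathcal{L} + \mathcal{L}',$$

where $\mathcal{L}'$ is the same loss with the roles of x and y swapped. The Python sketch assumes this form (LaBSE additionally subtracts an additive margin from the positive scores, which is omitted here) and uses toy embeddings chosen so that the two directions give different values, illustrating the asymmetry.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def similarity_matrix(xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]]) -> Matrix:
    """S[i][j] = phi(x_i, y_j), here the inner product of the two embeddings."""
    return [[sum(a * b for a, b in zip(x, y)) for y in ys] for x in xs]

def log_softmax_at(row: Sequence[float], k: int) -> float:
    """log(exp(row[k]) / sum_j exp(row[j])), computed stably."""
    m = max(row)
    return row[k] - m - math.log(sum(math.exp(v - m) for v in row))

def ranking_loss_one_way(S: Matrix) -> float:
    """Row-wise (→) loss: each x_i must pick y_i among all N targets in the batch."""
    n = len(S)
    return -sum(log_softmax_at(S[i], i) for i in range(n)) / n

def transpose(S: Matrix) -> Matrix:
    return [list(col) for col in zip(*S)]

def bidirectional_loss(S: Matrix) -> float:
    """Sum of the → (rows) and ↓ (columns) losses, making the objective symmetric."""
    return ranking_loss_one_way(S) + ranking_loss_one_way(transpose(S))

# Hypothetical batch of N = 3 translation pairs (toy 3-D embeddings).
ja = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
en = [[1.0, 0.0, 0.0], [0.9, 1.0, 0.0], [0.9, 0.0, 1.0]]

S = similarity_matrix(ja, en)
loss_forward = ranking_loss_one_way(S)              # ja -> en (→)
loss_backward = ranking_loss_one_way(transpose(S))  # en -> ja (↓)
loss_total = bidirectional_loss(S)
```

In this toy batch the first Japanese sentence is similar to every English sentence, which hurts the → direction once but the ↓ direction twice, so the two one-way losses differ; summing them treats both languages the same way.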
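[28] Conneau+: Cross-lingual Language Model Pretraining, NeurIPS ’19
Components of LaBSE: MLM and TLM Pre-training
Figure: TLM example. The Japanese sentence (吾輩 [MASK] である 。) and the English sentence ([MASK] am a …) are joined with [/s] separators and fed into a Transformer, which predicts the masked tokens (猫, I, cat).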
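A simplified Python sketch of how one TLM input could be built, loosely following the figure: a translation pair is concatenated with [/s] separators and tokens on both sides are masked, so the model can use the other language to recover them. Assumptions: the tokenization is hypothetical, the separator layout (including the leading [/s]) follows the XLM-style figure, and every selected token is simply replaced by [MASK] (BERT-style training also sometimes keeps or randomizes selected tokens, which is omitted here). The name `make_tlm_example` is illustrative, not a library API.

```python
import random
from typing import List, Optional, Tuple

MASK, SEP = "[MASK]", "[/s]"

def make_tlm_example(
    src_tokens: List[str],
    tgt_tokens: List[str],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Build one (simplified) TLM training example.

    The parallel sentences are concatenated with [/s] separators, and each
    non-separator token is replaced by [MASK] with probability mask_prob.
    Returns (inputs, labels): labels[i] holds the original token at masked
    positions and None elsewhere (no prediction loss there).
    """
    rng = random.Random(seed)
    tokens = [SEP] + src_tokens + [SEP, SEP] + tgt_tokens + [SEP]
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if tok != SEP and rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels

# Hypothetical tokenization of the pair in the figure.
ja = ["吾輩", "は", "猫", "である", "。"]
en = ["I", "am", "a", "cat", "."]
inputs, labels = make_tlm_example(ja, en, mask_prob=0.3, seed=0)
```

Plain MLM is the special case with a single sentence; TLM differs only in that the masked token in one language can be predicted from the unmasked translation in the other.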