Yano
September 22, 2025
[Reading group material] Length-Induced Embedding Collapse in PLM-based Models
Slides used at the ACL reading group at Nagoya University, 2025 (ACL読み会@名大 2025).
Transcript
Length-Induced Embedding Collapse in PLM-based Models
Presenter: Yano (D1)
Paper authors: Yuqi Zhou, Sunhao Dai, Zhanshuo Cao, Xiao Zhang, Jun Xu
ACL 2025
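Overview
• In text embedding models, downstream task performance drops as the text gets longer. The paper names the cause Length Collapse: embeddings of long texts cluster together.
• It identifies the cause of Length Collapse theoretically and proposes TempScale, which introduces a temperature parameter into Attention, as a remedy.
• It confirms that introducing TempScale mitigates Length Collapse and improves downstream task performance.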
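Notes
• Figures in these slides are either quoted from the original paper or made by the presenter.
• Derivations and proofs are not explained in full.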
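Text embedding models
✓ Models that encode a sentence into a single representation (for example, a vector)
• LLM-based models have become popular in recent years, but this paper targets so-called encoder-based models with bidirectional attention, such as BERT.
[Figure: text is fed to an embedding model. Example sentences: "I want to eat onigiri", "Let's eat sushi", "Ume plums are sour". Labels: high similarity / low similarity, close in meaning / far in meaning.]
Section: Introduction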
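Text embedding models and sequence length
• Building text embedding models that can embed long sequences into a single representation is an active line of work.
• Embedding a long sequence into a single representation is useful: it enables retrieval over long documents, classification over long dialogue histories, and so on.
• Recent models (BAAI/bge-m3, jinaai/jina-embeddings-v3, etc.) nominally support a maximum sequence length of 8192 or similar.
• 🤔 However, both training data and benchmarks still seem insufficient…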
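Text embedding models are weak on long sequences (1)
✓ Classification results organized by sequence length and model
• Classification performance drops as the sequence gets longer.
• 🤔 Or is the task itself simply easier for short texts…?
[Figure: movie review classification performance by model and sequence length (tokens)]
Section: Introduction: existing problems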
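Text embedding models are weak on long sequences (2)
✓ Texts averaging 670 characters are summarized by an LLM to an average of 120 characters and of 36.5 characters, then encoded with BGE and visualized
• Dataset: NFCorpus (medical papers)
• The embeddings of the long sequences (•) are clustered together.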
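Text embedding models are weak on long sequences (3)
✓ Measure cosine similarity for each sequence length
※ Look only at the τ=1 plot for now; what τ is will be explained later…
• Dataset: NFCorpus
• The longer the sequences, the higher the similarity between their embeddings: they are clustered together!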
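The measurement on this slide can be approximated with a small helper. This is a minimal sketch and not the paper's evaluation code: the function name `mean_pairwise_cosine` and the synthetic "spread" and "collapsed" embedding sets are assumptions made for illustration, standing in for real model embeddings of short and long texts.

```python
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows."""
    # Normalize each embedding to unit length so dot products are cosines.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit.T
    # Strictly upper triangle: every pair once, no self-similarity.
    iu = np.triu_indices(len(embeddings), k=1)
    return float(sims[iu].mean())

# Synthetic stand-ins (assumption): "spread" mimics well-separated embeddings,
# "collapsed" mimics embeddings that all sit near one common direction.
rng = np.random.default_rng(0)
spread = rng.normal(size=(50, 16))
collapsed = 1.0 + 0.05 * rng.normal(size=(50, 16))

spread_sim = mean_pairwise_cosine(spread)
collapsed_sim = mean_pairwise_cosine(collapsed)
print(f"spread: {spread_sim:.3f}, collapsed: {collapsed_sim:.3f}")
```

Grouping real embeddings by input length and calling the helper per group would give a curve of the kind the slide describes.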
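Length Collapse
• Embeddings of long sequences are clustered together, and this apparently degrades downstream task performance = this is called Length Collapse.
• Why does this happen…?
Section: Digging into the problem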
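Theoretical analysis of Length Collapse: preliminaries
• The Transformer encoders considered here contain a self-attention mechanism (Self Attention, SA), which the slide defines with a formula.
• X: input sentence; Wq, Wk, Wv: query, key, and value weights
1. Because softmax is applied last, every element of the attention matrix is positive.
2. Because softmax is applied last, every row of the matrix sums to 1 (the probabilities sum to 1).
➡ From 2, the largest eigenvalue of this matrix is 1, and the corresponding eigenvector is [1,1,1,…1].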
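The formula on this slide did not survive in the transcript. As a reference point, the standard scaled dot-product form of self-attention, written with the slide's symbols, is given below. The scaling dimension d is an assumption here, since the transcript does not name it.

```latex
\mathrm{SA}(X) = \underbrace{\mathrm{softmax}\!\left(\frac{X W_q \,(X W_k)^{\top}}{\sqrt{d}}\right)}_{A} \, X W_v
```

Properties 1 and 2 refer to the matrix A: softmax is applied row-wise, so every entry of A is positive and every row sums to 1, which gives A [1,…,1]^T = [1,…,1]^T.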
Supplement: eigenvalues and eigenvectors
A: matrix, x: eigenvector, λ: eigenvalue
Eigenvector: a vector x whose direction is unchanged by the transformation A and which is only scaled by λ
Ax = λx
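The eigenvalue claim can be checked numerically. The sketch below builds an attention matrix from random inputs and weights (an assumption made for illustration, not trained model parameters) and verifies that the all-ones vector is an eigenvector with eigenvalue 1 and that no eigenvalue is larger in magnitude.

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 6, 8                                   # tokens, hidden size (arbitrary)
X = rng.normal(size=(n, d))                   # random stand-in for token states
W_q = rng.normal(size=(d, d)) / np.sqrt(d)    # random stand-in weights
W_k = rng.normal(size=(d, d)) / np.sqrt(d)

# Attention matrix A = softmax(X W_q (X W_k)^T / sqrt(d)), shape (n, n).
A = softmax_rows((X @ W_q) @ (X @ W_k).T / np.sqrt(d))

ones = np.ones(n)
row_sums = A.sum(axis=1)
# Largest eigenvalue magnitude of A.
spectral_radius = np.abs(np.linalg.eigvals(A)).max()

print("all entries positive:", bool(np.all(A > 0)))
print("A @ ones equals ones:", bool(np.allclose(A @ ones, ones)))
print("spectral radius:", spectral_radius)
```

Because every row sums to 1, A applied to the all-ones vector returns the all-ones vector, which is exactly the statement Ax = λx with λ = 1.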
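Lemma 1: self-attention attenuates high-frequency components
• The largest eigenvalue of the attention output is 1, and the corresponding eigenvector is [1,1,1,…1].
• Multiplying by the same matrix again and again (taking powers) pulls the result closer and closer to the direction of the eigenvector for the largest eigenvalue, [1,1,1,…1], so the high-frequency components are lost.
• This has already been proven in prior work: the loss of high-frequency components as layers get deeper is known as Over-Smoothing.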
Note on "high-frequency components": what remains after subtracting the mean from each element. Roughly, think of them as the fine detail of the representation…
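A toy illustration of the over-smoothing argument, under strong simplifying assumptions: one fixed random row-stochastic matrix is applied repeatedly, with no residual connections, feed-forward layers, or per-layer weights. The helper `high_freq` follows the slide's description of the high-frequency component (each token minus the mean over tokens); its name is made up for this sketch.

```python
import numpy as np

def high_freq(X: np.ndarray) -> np.ndarray:
    """HC[X]: subtract the mean over tokens (rows) from every token."""
    return X - X.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n, d = 8, 4                                   # tokens, hidden size (arbitrary)

# A fixed, strictly positive row-stochastic matrix standing in for attention.
logits = 0.5 * rng.normal(size=(n, n))
A = np.exp(logits)
A /= A.sum(axis=1, keepdims=True)

X = rng.normal(size=(n, d))                   # random token representations

# Apply the same matrix repeatedly and track the size of the
# high-frequency part after each application.
H = X.copy()
hc_norms = [np.linalg.norm(high_freq(H))]
for _ in range(20):
    H = A @ H
    hc_norms.append(np.linalg.norm(high_freq(H)))

ratio = hc_norms[-1] / hc_norms[0]
print(f"HC norm after 20 applications / initial HC norm = {ratio:.2e}")
```

As the high-frequency part shrinks, all rows of H approach the same vector, which is the "every token looks alike" picture behind Over-Smoothing.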
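Definition 2: attenuation of high-frequency components by Self Attention
• Definition 2: the rate at which Self Attention filters the high-frequency component HC[X] can be bounded by the largest singular value σ_α of HC[X].
Largest singular value: the maximum change in magnitude that the transformation by that matrix can produce.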
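Definition 3: attenuation of high-frequency components grows as the sequence gets longer
• Definition 3: σ_α can be bounded in terms of the sequence length n.
• Intuition: the longer the sequence, the flatter softmax makes the distribution of attention scores, so the high-frequency component approaches the zero matrix = the singular value gets smaller.
In other words, the filter rate can be bounded in terms of the sequence length.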
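The intuition can be probed with a toy experiment. The sketch below draws attention logits i.i.d. from a standard normal distribution, which is a strong simplification of real attention logits and an assumption of this example, and measures the largest singular value of the high-frequency part of the attention matrix for several sequence lengths. The helper names are made up for illustration.

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def hc_top_singular_value(n: int, seed: int = 0) -> float:
    """Largest singular value of HC[A] for a random n x n attention matrix."""
    rng = np.random.default_rng(seed)
    A = softmax_rows(rng.normal(size=(n, n)))   # i.i.d. logits (assumption)
    # HC[A] = (I - 11^T / n) A: remove the mean over rows from each column.
    hc_A = A - A.mean(axis=0, keepdims=True)
    # Singular values are returned in descending order.
    return float(np.linalg.svd(hc_A, compute_uv=False)[0])

sigmas = {n: hc_top_singular_value(n) for n in (16, 128, 1024)}
for n, s in sigmas.items():
    print(f"n = {n:5d}  largest singular value of HC[A] = {s:.3f}")
```

With logits of fixed scale, each softmax row spreads its mass over more entries as n grows, so the matrix moves toward the uniform matrix whose high-frequency part is zero.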
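The filter rate can be bounded in terms of the sequence length = the longer the sequence, the more the high-frequency components are attenuated, and the more the embeddings all look alike (Length Collapse)
• This explains the relationship between cosine similarity and sequence length checked at the start.
• Length Collapse does seem to be happening…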
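Proposal: introducing TempScale
• Length Collapse was caused by the attenuation of high-frequency components (over-smoothing). ➡ So why not make the original output distribution sharper?
• Introduce into Attention a temperature 0 < τ < 1 that divides each row of the pre-softmax logits.
• The output distribution becomes flatter as τ gets larger and sharper as τ gets smaller.
• In other words, the smaller τ is, the more high-frequency components are preserved!
Section: Proposed solution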
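A minimal sketch of the idea as the slide describes it: divide the pre-softmax logits by τ before the row-wise softmax. It is not the authors' implementation. The function names and the random queries and keys are assumptions, and the row entropy and the largest singular value of HC[A] are used here only as rough proxies for "flatness" and "preserved high-frequency content".

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def temp_scaled_attention(Q: np.ndarray, K: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Attention weights with the logits divided by a temperature tau."""
    d = Q.shape[1]
    logits = Q @ K.T / np.sqrt(d)
    return softmax_rows(logits / tau)          # tau < 1 sharpens each row

def mean_row_entropy(A: np.ndarray) -> float:
    """Average Shannon entropy of the rows; lower means sharper."""
    return float(-(A * np.log(A + 1e-12)).sum(axis=1).mean())

def hc_top_singular_value(A: np.ndarray) -> float:
    """Largest singular value of HC[A] = A minus its per-column mean."""
    hc_A = A - A.mean(axis=0, keepdims=True)
    return float(np.linalg.svd(hc_A, compute_uv=False)[0])

rng = np.random.default_rng(0)
n, d = 64, 32                                  # tokens, head size (arbitrary)
Q = rng.normal(size=(n, d))                    # random stand-in queries
K = rng.normal(size=(n, d))                    # random stand-in keys

A_base = temp_scaled_attention(Q, K, tau=1.0)  # ordinary attention
A_sharp = temp_scaled_attention(Q, K, tau=0.5) # TempScale-style, tau < 1

for name, A in (("tau=1.0", A_base), ("tau=0.5", A_sharp)):
    print(name, "entropy:", round(mean_row_entropy(A), 3),
          "sigma(HC[A]):", round(hc_top_singular_value(A), 3))
```

Only one extra scalar is involved, which is why the slides can apply it to existing encoders at inference time and sweep τ per model or task.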
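Applying TempScale
✓ Measure cosine similarity for each sequence length
• Dataset: NFCorpus
• As the temperature goes down, the effect of sequence length on similarity seems to shrink!
Section: Applying the solution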
Effect of TempScale on downstream tasks
• The differences are positive, but the gains are fairly small…
• It seems to pair well with models and tasks that involve long sequences…?
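Effect of TempScale on a downstream task (STS)
• STS: a task that tests how well the semantic closeness of sentence pairs is captured
• The lower the temperature, the lower the similarity of unrelated (Random / Unrelated) pairs.
• 🤔 Is that really so…?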
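The optimal temperature depends on the sequence length of the target text
• Intuitively, the longer the sequences being handled, the less attenuation we want = the smaller the temperature should be.
• Task: SummScreenFD (retrieving the summary of a TV script from the script)
• Results are mostly as expected, but ANCE alone shows the opposite of the expected trend.
[Figure: relationship between temperature and performance, by model and text length]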
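Summary
• In text embedding models, downstream task performance drops as the text gets longer. The paper names the cause Length Collapse: embeddings of long texts cluster together.
• It identifies the cause of Length Collapse theoretically and proposes TempScale, which introduces a temperature parameter into Attention, as a remedy.
• It confirms that introducing TempScale mitigates Length Collapse and improves downstream task performance.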
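Impressions
• My impression is that a lot of effort went into this paper, both theoretically and experimentally.
• The Appendix contains a large number of experiments and descriptions of each evaluation dataset, and it also discusses LLM-based embedding models.
• It is commendable that the theoretical side was strengthened compared with the version rejected at ICLR.
• That said, it is unclear whether TempScale really works…
• Results vary considerably across models and tasks.
• Would performance improve meaningfully if TempScale were set as a function of sequence length and the model were trained with it…?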