[Reading-group material] Length-Induced Embedding Collapse in PLM-based Models
Slides used at the ACL reading group @ Nagoya University 2025
Yano
September 22, 2025
Transcript
Length-Induced Embedding Collapse in PLM-based Models
Presenter: Yano (D1)
Paper: Yuqi Zhou, Sunhao Dai, Zhanshuo Cao, Xiao Zhang, Jun Xu. ACL 2025
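Overview (slide 2)
• In text embedding models, downstream task performance drops as the text gets longer. The paper attributes this to Length Collapse: the embeddings of long texts cluster tightly together.
• The paper identifies the cause of Length Collapse theoretically and proposes TempScale, which introduces a temperature parameter into Attention, as a remedy.
• It confirms that introducing TempScale mitigates Length Collapse and improves downstream task performance.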
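Notes (slide 3)
• Figures in these slides are either quoted from the original paper or made by the presenter.
• The derivations and proofs are not explained in full detail.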
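Text embedding models (slide 4, section: Introduction)
✓ Models that encode a sentence into a single representation (a vector, for example)
• LLM-based models have become popular in recent years, but this paper targets so-called encoder-based models with bidirectional attention, such as BERT.
[Figure: texts are fed to a text embedding model. "I want to eat onigiri" and "Let's eat sushi" are close in meaning and get a high similarity; "Ume is sour" is far in meaning and gets a low similarity.]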
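Text embedding models and sequence length (slide 5)
• Building text embedding models that can embed a long sequence into a single representation is an active area of work.
• Being able to embed a long sequence into a single representation is useful: it enables retrieval over long documents, classification over long dialogue histories, and so on.
• Recent models (BAAI/bge-m3, jinaai/jina-embeddings-v3, etc.) are stated to support a theoretical maximum sequence length of 8192 or similar.
• 🤔 However, my impression is that both training data and benchmarks are still insufficient…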
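Text embedding models are weak on long sequences (1) (slide 6, section: Introduction, existing problems)
✓ Classification results organized by sequence length and by model
• Classification performance drops as the sequence gets longer.
• 🤔 Shouldn't the task itself be easier with longer text…?
[Table: model vs. sequence length (tokens), movie review classification]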
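Text embedding models are weak on long sequences (2) (slide 7)
✓ Texts averaging 670 characters are summarized by an LLM down to averages of 120 and 36.5 characters, then encoded with BGE and visualized.
• Dataset: NFCorpus (medical papers)
• The embeddings of the long sequences (•) are tightly clustered.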
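Text embedding models are weak on long sequences (3) (slide 8)
✓ Cosine similarity measured per sequence length
Note: look only at the τ=1 curve for now; what τ is will be explained later…
• Dataset: NFCorpus
• The longer the sequences, the higher the similarity between their embeddings: they are clustered!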
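The measurement on this slide can be sketched as a mean pairwise cosine similarity over a group of embeddings. This is a minimal illustration, not the paper's evaluation code: the function name `mean_pairwise_cosine` and the toy data are assumptions made here, and real inputs would be model embeddings grouped by text length.

```python
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows."""
    # L2-normalise each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit.T
    # Keep the strictly upper triangle: each pair once, no self-pairs.
    upper = np.triu_indices(len(unit), k=1)
    return float(sims[upper].mean())

# Toy data (hypothetical, not from the paper): one spread-out group and
# one group squeezed around a single shared direction.
rng = np.random.default_rng(0)
spread = rng.normal(size=(50, 16))
shared = rng.normal(size=(1, 16))
collapsed = shared + 0.1 * rng.normal(size=(50, 16))

spread_sim = mean_pairwise_cosine(spread)
collapsed_sim = mean_pairwise_cosine(collapsed)
```

A group whose members point in nearly the same direction scores close to 1, which is the signature the slide reads off for long sequences.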
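Length Collapse (slide 9, section: Digging into the problem)
• The embeddings of long sequences are tightly clustered, and this seems to be why downstream task performance drops. This is called Length Collapse.
• Why does this happen…?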
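Theoretical analysis of Length Collapse: preliminaries (slide 10)
• The Transformer encoders considered here contain self-attention (Self Attention, SA), which is expressed by the following formula. [Formula not included in the transcript]
• X: input sentence; Wq, Wk, Wv: query, key, and value weights
1. Because softmax is applied last, every element of the matrix is positive.
2. Because softmax is applied last, each row of the matrix (a set of probabilities) always sums to 1.
➡ From 2, the largest eigenvalue of the output is 1, and the corresponding eigenvector is [1,1,1,…1].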
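The formula itself was an image on the slide and did not survive in the transcript. Assuming it is the standard scaled dot-product form (an assumption: the scaling by the square root of the dimension d and the symbol A for the softmax output do not appear in the transcript), it would read:

```latex
\mathrm{SA}(X) = A \, X W_v,
\qquad
A = \mathrm{softmax}\!\left( \frac{(X W_q)(X W_k)^{\top}}{\sqrt{d}} \right)
```

Under this reading, properties 1 and 2 say that every entry of A is positive and that A multiplied by the all-ones vector returns the all-ones vector, which is exactly the statement that [1,1,1,…1] is an eigenvector with eigenvalue 1.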
Supplement: eigenvalues and eigenvectors (slide 11)
• A: matrix, x: eigenvector, λ: eigenvalue
• An eigenvector is a vector x whose direction is unchanged by the transformation A and which is only scaled by a factor of λ: Ax = λx
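The eigenvalue claim can be checked numerically. This is a minimal sketch in which a random matrix stands in for the attention logits; `softmax_rows` is a helper defined here for illustration, not a function from the paper.

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
A = softmax_rows(rng.normal(size=(6, 6)))  # stand-in attention matrix
ones = np.ones(6)

row_sums = A.sum(axis=1)      # property 2: every row sums to 1
A_times_ones = A @ ones       # hence A @ 1 = 1, i.e. eigenvalue 1
# Largest eigenvalue magnitude; eigvals may return complex values.
spectral_radius = float(np.abs(np.linalg.eigvals(A)).max())
```

Because each row of `A` is a probability distribution, multiplying by the all-ones vector just sums each row, so the vector is returned unchanged.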
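Lemma 1: self-attention attenuates high-frequency components (slide 12)
• The largest eigenvalue of the Attention output is 1, and the corresponding eigenvector is [1,1,1,…1].
• Multiplying by the same matrix over and over (taking powers) drives the result closer and closer to the direction of the eigenvector for the largest eigenvalue, [1,1,1,…1], so the high-frequency components are lost.
• This has already been proven in prior work: the loss of high-frequency components as layers get deeper is known as Over-Smoothing.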
Note on Lemma 1 (slide 13)
• The high-frequency component is what remains after subtracting the mean from each element. Think of it as the fine detail of the representation…
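The over-smoothing effect can be reproduced on a toy example. This is a sketch under assumptions: one fixed random row-stochastic matrix is applied repeatedly (real layers have different attention matrices plus value projections and residual connections), and `high_freq` implements the "subtract the mean over tokens" reading of the high-frequency component.

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def high_freq(X: np.ndarray) -> np.ndarray:
    """HC[X]: subtract the mean over tokens from every token vector."""
    return X - X.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n, d = 8, 4                                # tokens, hidden size (toy values)
A = softmax_rows(rng.normal(size=(n, n)))  # fixed row-stochastic matrix
X = rng.normal(size=(n, d))

# Frobenius norm of the high-frequency part before and after each product.
hc_norms = [float(np.linalg.norm(high_freq(X)))]
for _ in range(12):
    X = A @ X
    hc_norms.append(float(np.linalg.norm(high_freq(X))))
```

The token-wise mean survives each multiplication because the rows of `A` sum to 1, while the deviations from it shrink, so the token vectors end up nearly identical.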
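Definition 2: attenuation of high-frequency components by Self Attention (slide 14)
• Definition 2: the rate at which Self Attention filters the high-frequency component HC[X] can be bounded by σ_α, the largest singular value of HC[X].
• Note: the largest singular value of a matrix is the maximum factor by which a transformation by that matrix changes a vector's magnitude.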
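Definition 3: the attenuation of high-frequency components grows as the sequence gets longer (slide 15)
• Definition 3: σ_α can be bounded in terms of the sequence length n.
• Intuition: the longer the sequence, the flatter softmax makes the distribution of attention scores, so the high-frequency component approaches the zero matrix, i.e. its singular values shrink.
• In other words, the filter rate can be bounded in terms of the sequence length.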
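The intuition can be probed numerically. This is a rough sketch under stated assumptions: logits are i.i.d. standard normal rather than real attention scores, and the singular value is taken from the attention matrix after removing its column means (that is, HC applied to the attention matrix, which is an assumption about what σ_α refers to). The helper names are hypothetical.

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def max_singular_value_of_hc(A: np.ndarray) -> float:
    """Largest singular value of (I - 11^T / n) A."""
    n = A.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # With compute_uv=False, svd returns singular values in descending order.
    return float(np.linalg.svd(centering @ A, compute_uv=False)[0])

rng = np.random.default_rng(0)
sigma_by_length = {}
for n in (8, 32, 128, 512):
    A = softmax_rows(rng.normal(size=(n, n)))  # toy attention matrix
    sigma_by_length[n] = max_singular_value_of_hc(A)
```

With more tokens each softmax row spreads its mass over more entries, so the centered matrix moves toward zero and its largest singular value drops.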
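The filter rate can be bounded in terms of the sequence length (slide 16)
• That is, the longer the sequence, the more the high-frequency components are attenuated, and the embeddings all end up looking alike (Length Collapse).
• This explains the relationship between cosine similarity and sequence length observed at the start.
• Length Collapse does appear to be happening…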
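Proposal: introducing TempScale (slide 17, section: Proposed solution)
• Length Collapse was caused by the attenuation of high-frequency components (over-smoothing).
➡ So why not make the output distribution sharper?
• Introduce into Attention a temperature τ, with 0 < τ < 1, that divides each row of the pre-softmax logits.
• The larger τ is, the flatter the output distribution; the smaller τ is, the sharper it becomes.
• In other words, the smaller τ is, the more high-frequency components are retained!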
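The temperature mechanism can be sketched as follows. This is an illustrative toy, not the paper's implementation: `attention_weights` and `mean_row_entropy` are names introduced here, the weights are random, and the usual division by the square root of the dimension is assumed in addition to the division by τ.

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def attention_weights(X, W_q, W_k, tau=1.0):
    """Attention matrix with pre-softmax logits divided by temperature tau."""
    d = W_q.shape[1]
    logits = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)
    return softmax_rows(logits / tau)

def mean_row_entropy(A: np.ndarray) -> float:
    """Average Shannon entropy of the rows; lower means sharper."""
    # The small constant only guards against log(0).
    return float(-(A * np.log(A + 1e-12)).sum(axis=1).mean())

rng = np.random.default_rng(0)
n, d = 16, 8                                 # toy sizes
X = rng.normal(size=(n, d))
W_q = rng.normal(size=(d, d)) / np.sqrt(d)   # random stand-in weights
W_k = rng.normal(size=(d, d)) / np.sqrt(d)

entropy_by_tau = {
    tau: mean_row_entropy(attention_weights(X, W_q, W_k, tau))
    for tau in (1.0, 0.5, 0.25)
}
```

Lower τ amplifies the gaps between logits, so each row concentrates on fewer tokens and its entropy falls. That sharper distribution is what the slide credits with retaining more high-frequency components.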
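Applying TempScale (slide 18, section: Applying the solution)
✓ Cosine similarity measured per sequence length
• Dataset: NFCorpus
• As the temperature decreases, the effect of sequence length on similarity seems to get smaller!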
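Effect of TempScale on downstream tasks (slides 19 to 21)
[Results tables not included in the transcript]
• The differences are positive, but the gains are fairly small…
• Do models with a larger sequence length pair better with the task…?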
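Effect of TempScale on a downstream task (STS) (slide 22)
• STS: a task that measures how well the semantic closeness of sentence pairs is captured
• The lower the temperature, the lower the similarity of unrelated (Random / Unrelated) pairs.
• 🤔 Is that really so…?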
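The optimal temperature depends on the sequence length of the target text (slide 23)
• Intuitively, the longer the sequences being handled, the less attenuation we want, i.e. the lower the temperature should be.
• Task: SummScreenFD (retrieving the summary of a TV script from the script)
• The results mostly match expectations, but ANCE alone shows the opposite trend.
[Figure: relationship between temperature and performance, by model and text length]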
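Summary (slide 24)
• In text embedding models, downstream task performance drops as the text gets longer. The paper attributes this to Length Collapse: the embeddings of long texts cluster tightly together.
• The paper identifies the cause of Length Collapse theoretically and proposes TempScale, which introduces a temperature parameter into Attention, as a remedy.
• It confirms that introducing TempScale mitigates Length Collapse and improves downstream task performance.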
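Impressions (slide 25)
• My impression is that this is a costly paper, thorough both theoretically and experimentally.
• The Appendix contains a large number of experiments and descriptions of each evaluation dataset, and it also discusses LLM-based embedding models.
• It is commendable that the theory was strengthened relative to the version rejected at ICLR.
• That said, it is unclear whether TempScale really works…
• The trends vary considerably across models and tasks.
• If the TempScale temperature were set as a function of sequence length and the model were trained with it, would performance improve reasonably…?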