Yano
September 22, 2025
[Reading group material] Length-Induced Embedding Collapse in PLM-based Models
Slides used at the ACL reading group @ Nagoya University 2025
Transcript
Length-Induced Embedding Collapse in PLM-based Models
Paper: Yuqi Zhou, Sunhao Dai, Zhanshuo Cao, Xiao Zhang, Jun Xu (ACL 2025)
Presenter: Yano (D1)
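
Overview
• In text embedding models, downstream task performance drops as the input text gets longer. The paper attributes this to Length Collapse, in which the embeddings of long texts cluster together.
• It identifies the cause of Length Collapse theoretically and proposes TempScale, which introduces a temperature parameter into attention.
• It confirms that introducing TempScale reduces Length Collapse and improves downstream task performance.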
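
Notes
• Figures in this deck are either quoted from the original paper or made by the presenter.
• The derivations and proofs are not explained in full.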
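
Section: Introduction
Text embedding models
✓ Models that encode a sentence into a single representation (for example, a vector)
• LLM-based models have become popular recently, but this paper targets so-called encoder-based models with bidirectional attention, such as BERT.
[Figure: the example sentences "I want to eat onigiri", "Let's eat sushi", and "Ume is sour" pass through a text embedding model; high similarity means close in meaning, low similarity means distant in meaning.]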
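
Text embedding models and sequence length
• Building text embedding models that can embed a long sequence into a single representation is an active area of work.
• Embedding a long sequence into a single representation is useful: it enables, for example, retrieval over long documents and classification of long dialogue histories.
• Recent models (BAAI/bge-m3, jinaai/jina-embeddings-v3, etc.) nominally support a maximum sequence length of, e.g., 8192.
• 🤔 However, my impression is that training data and benchmarks are still insufficient…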
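
Section: Introduction: existing problems
Text embedding models are weak on long sequences ①
✓ Classification results organized by sequence length and model
• Classification performance drops as sequences get longer.
• 🤔 Is the task itself simply easier for short texts…?
[Figure: movie review classification results by model and sequence length (tokens)]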
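
Text embedding models are weak on long sequences ②
✓ Texts averaging 670 characters were summarized by an LLM to averages of 120 and 36.5 characters, encoded with BGE, and visualized.
• Dataset: NFCorpus (medical papers)
• The embeddings of the long sequences (•) are clustered together.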
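
Text embedding models are weak on long sequences ③
✓ Cosine similarity measured per sequence length
※ Look only at the τ=1 graph for now; τ is explained later.
• Dataset: NFCorpus
• The longer the sequences, the higher the similarity between their embeddings: they are clustered together!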
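
The measurement on this slide can be sketched as an average of pairwise cosine similarities within a group of embeddings. This is a minimal illustration, not the paper's evaluation code: `mean_pairwise_cosine` is a hypothetical helper, and the two toy groups are made-up 2-D vectors standing in for embeddings of short and long texts.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)

# Toy data (hypothetical): a spread-out group and a tightly clustered group.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
clustered = [[1.0, 0.1], [1.0, 0.0], [1.0, -0.1]]

print("spread:   ", mean_pairwise_cosine(spread))
print("clustered:", mean_pairwise_cosine(clustered))
```

A group whose mean pairwise similarity is close to 1 is "clustered" in the sense used on the slide.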
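
Section: Digging into the problem
Length Collapse
• Embeddings of long sequences are clustered together, and this apparently degrades downstream task performance. This is called Length Collapse.
• Why does this happen…?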
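
Theoretical analysis of Length Collapse: preliminaries
• The Transformer encoders considered here contain self-attention (SA). In its formula, X is the input sentence and Wq, Wk, Wv are the query, key, and value weights.
1. Because softmax is applied last, every element of the resulting matrix is positive.
2. Because softmax is applied last, each row of the matrix sums to 1 (a sum of probabilities).
➡ From 2 (together with 1), the largest eigenvalue of the softmax output is 1, and the corresponding eigenvector is [1,1,1,…1].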
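
For reference, the standard scaled dot-product form of self-attention can be written with the slide's symbols as below. The scaling by \sqrt{d} (d being the key dimension) and the name A for the softmax output are assumptions here, not taken from the slide.

```latex
\mathrm{SA}(X) = A \, X W_v,
\qquad
A = \mathrm{softmax}\!\left( \frac{X W_q \, (X W_k)^{\top}}{\sqrt{d}} \right)

% Row-wise softmax gives A_{ij} > 0 and \sum_j A_{ij} = 1, hence
A \mathbf{1} = \mathbf{1}
```

The last line is the eigenvalue statement in matrix form: the all-ones vector is an eigenvector of A with eigenvalue 1.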

Supplement: eigenvalues and eigenvectors
• A: matrix, x: eigenvector, λ: eigenvalue
• An eigenvector is a vector x whose direction is unchanged by the transformation A and which is only scaled by λ: Ax = λx
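
A small numerical check of the eigenvector claim, as a sketch: the logits below are arbitrary illustrative numbers, and `softmax` and `matvec` are helper names introduced only for this example. Applying row-wise softmax yields a matrix A with positive entries whose rows sum to 1, so multiplying A by the all-ones vector returns the all-ones vector (Ax = λx with λ = 1).

```python
import math

def softmax(row):
    """Numerically stable softmax of one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Arbitrary pre-softmax scores (illustrative values only).
logits = [[0.2, -1.0, 0.5],
          [1.5, 0.3, -0.7],
          [0.0, 0.0, 2.0]]
A = [softmax(row) for row in logits]

ones = [1.0, 1.0, 1.0]
print(matvec(A, ones))  # each entry equals 1 up to floating-point error
```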
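
Lemma 1: self-attention attenuates high-frequency components
• The largest eigenvalue of the attention output is 1, and the corresponding eigenvector is [1,1,1,…1].
• Multiplying by the same matrix repeatedly (taking powers) pulls the result toward the direction of the eigenvector for the largest eigenvalue, [1,1,1,…1], so the high-frequency components are lost.
• This has already been proven in prior work: high-frequency components are known to be lost as layers get deeper (Over-Smoothing).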
• High-frequency component: what remains after subtracting the mean from each element. Think of it as the fine detail of the representation.
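
The over-smoothing effect described here can be sketched on a toy example. The matrix A below is a hand-picked row-stochastic matrix with positive entries (an assumption standing in for an attention matrix), and `high_freq` implements the "subtract the mean" description of the high-frequency component. Repeatedly applying A shrinks that component toward zero, so the vector approaches a multiple of [1,1,1].

```python
import math

def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def high_freq(vector):
    """High-frequency part: each element minus the mean of all elements."""
    mean = sum(vector) / len(vector)
    return [x - mean for x in vector]

def norm(vector):
    return math.sqrt(sum(x * x for x in vector))

# Hypothetical attention-like matrix: positive entries, each row sums to 1.
A = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

x = [3.0, -1.0, 0.5]
norms = []
for step in range(6):
    norms.append(norm(high_freq(x)))  # size of the "detail" before this step
    x = matvec(A, x)                  # one more application of A

for step, value in enumerate(norms):
    print(step, round(value, 5))
```

In a real Transformer each layer has a different attention matrix plus residual connections and feed-forward blocks, so this only illustrates the direction of the effect.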
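
Definition 2: attenuation of high-frequency components by self-attention
• Definition 2: the rate at which self-attention filters the high-frequency component HC[X] can be bounded by σ_α, the largest singular value of HC[X].
• Largest singular value: the maximum change in magnitude when the transformation by that matrix is applied.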
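
In symbols, the two informal descriptions used on these slides can be sketched as follows. The centering form of HC (subtracting the mean over the n token positions) is an assumption based on the "subtract the mean" description, and M stands for an arbitrary matrix.

```latex
\mathrm{HC}[X] = \left( I - \tfrac{1}{n} \mathbf{1}\mathbf{1}^{\top} \right) X,
\qquad
\sigma_{\max}(M) = \max_{x \neq 0} \frac{\lVert M x \rVert_2}{\lVert x \rVert_2}
```

The second expression is the usual definition of the largest singular value: the greatest factor by which M can stretch a vector.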
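
Definition 3: the attenuation of high-frequency components grows with sequence length
• Definition 3: σ_α can be bounded in terms of the sequence length n.
• Intuition: the longer the sequence, the flatter the attention score distribution becomes under softmax, so the high-frequency component approaches the zero matrix, i.e. its singular values shrink.
• In other words, the filter rate can be bounded by the sequence length.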
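
A minimal sketch of this intuition, not the paper's experiment: with random logits (an assumption; real attention scores are not i.i.d. Gaussian), the largest softmax weight in a row shrinks as the row gets longer, which is what "the distribution flattens" means here. `max_attention_weight` is a hypothetical helper.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax of one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def max_attention_weight(n, rng):
    """Largest softmax weight for one row of n standard-normal logits."""
    logits = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return max(softmax(logits))

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (8, 64, 512):
    print(n, round(max_attention_weight(n, rng), 4))
```

When every weight in a row is small, the row is close to the uniform vector, and the mean-subtracted part of the attention matrix is close to zero.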
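
The filter rate can be bounded by the sequence length = the longer the sequence, the more the high-frequency components are attenuated, leaving only similar embeddings (Length Collapse)
• This also explains the relationship between cosine similarity and sequence length observed at the start.
• Length Collapse does seem to be happening…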
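
Section: Proposed solution
Proposal: introducing TempScale
• Length Collapse was caused by attenuation of high-frequency components (over-smoothing). ➡ So why not make the distribution of the original output sharper?
• Introduce into attention a temperature 0 < τ < 1 that divides each row of the pre-softmax logits.
• The output distribution is flatter for larger τ and sharper for smaller τ.
• That is, the smaller τ is, the more high-frequency components are preserved!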
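
The effect of the temperature can be sketched with a plain softmax. This is an illustration of dividing logits by τ as described on the slide, not the authors' implementation; `softmax_with_temperature` and the example logits are assumptions made for this sketch.

```python
import math

def softmax_with_temperature(logits, tau=1.0):
    """Softmax over logits divided by a temperature tau > 0."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [x / tau for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# One row of illustrative pre-softmax attention scores.
logits = [2.0, 1.0, 0.5, 0.0]
for tau in (1.0, 0.5, 0.25):
    probs = softmax_with_temperature(logits, tau)
    print(tau, [round(p, 3) for p in probs])
```

With τ = 1 this is the ordinary softmax; lowering τ concentrates more weight on the largest logit, which is the sharpening the slide asks for.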
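
Section: Applying the solution
Applying TempScale
✓ Cosine similarity measured per sequence length
• Dataset: NFCorpus
• As the temperature goes down, the effect of sequence length on similarity seems to shrink!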

Effect of TempScale on downstream tasks
• The differences are positive, but the gains are quite small…
• Models with a large sequence length may be a good match for the task…?
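
Effect of TempScale on a downstream task (STS)
• STS: a task that measures how well the semantic closeness of sentence pairs is captured.
• The lower the temperature, the lower the similarity of unrelated (Random/Unrelated) pairs.
• 🤔 Is that really so…?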
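
The optimal temperature depends on the sequence length of the target text
• Intuitively, the longer the sequences being handled, the weaker we want the attenuation to be = the lower we want the temperature to be.
• Task: SummScreenFD (retrieving the summary of a TV script from the script)
• The results mostly match this expectation, but ANCE alone shows the opposite trend.
▲ Relationship between temperature and performance for each model and text length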
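
Summary
• In text embedding models, downstream task performance drops as the input text gets longer. The paper attributes this to Length Collapse, in which the embeddings of long texts cluster together.
• It identifies the cause of Length Collapse theoretically and proposes TempScale, which introduces a temperature parameter into attention.
• It confirms that introducing TempScale reduces Length Collapse and improves downstream task performance.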
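
Impressions
• My impression is that a lot of effort went into this paper, both theoretically and experimentally.
• The appendix also contains a large number of experiments and descriptions of each evaluation dataset, and it discusses LLM-based embedding models as well.
• Commendably, the theory has been strengthened since the version that was rejected at ICLR.
• That said, it is unclear whether TempScale is actually working well…
• The results vary considerably across models and tasks.
• Perhaps choosing the TempScale temperature as a function of sequence length and then training would give a decent improvement…?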