[Reading-group slides] Length-Induced Embedding Collapse in PLM-based Models
Slides used at the ACL reading group @ Nagoya University, 2025.
Yano
September 22, 2025
Transcript
Length-Induced Embedding Collapse in PLM-based Models
Presenter: Yano (D1)
Paper: Yuqi Zhou, Sunhao Dai, Zhanshuo Cao, Xiao Zhang, Jun Xu (ACL 2025)
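Overview
• In text embedding models, downstream task performance tends to drop as texts get longer. The paper attributes this to Length Collapse, a phenomenon in which the embeddings of long texts become densely clustered.
• The paper identifies the cause of Length Collapse theoretically and proposes TempScale, which introduces a temperature parameter into attention, as a remedy.
• It confirms that introducing TempScale mitigates Length Collapse and improves downstream task performance.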
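Notes
• Figures in these slides are either taken from the original paper or made by me.
• The derivations and proofs of the equations are not fully explained.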
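Text embedding models
✓ Models that encode a sentence into a single representation (e.g., a vector)
• LLM-based models are popular recently, but this paper targets so-called encoder-based models with bidirectional attention, such as BERT.
[Figure: a text embedding model applied to the sentences "I want to eat onigiri", "Let's eat sushi", and "Umeboshi are sour"; labels: high similarity / low similarity, close in meaning / far in meaning]
(Section: Introduction)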
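Text embedding models and long sequences
• Building text embedding models that can embed long sequences into a single representation is an active line of work.
• Embedding long sequences into a single representation is useful: it enables, for example, retrieval over long documents and classification of long dialogue histories.
• Recent models (BAAI/bge-m3, jinaai/jina-embeddings-v3, etc.) advertise a theoretical maximum sequence length of 8192 tokens.
• 🤔 However, my impression is that training data and benchmarks for long inputs are still insufficient…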
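Text embedding models are weak on long sequences (1)
✓ Classification results organized by sequence length and by model
• Classification performance drops as sequence length increases.
• 🤔 Or are the shorter texts simply easier tasks…?
[Figure: movie-review classification; axes: model, sequence length (tokens)]
(Section: Introduction / existing problems)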
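Text embedding models are weak on long sequences (2)
✓ Texts averaging 670 characters were summarized by an LLM to averages of 120 and 36.5 characters, then encoded with BGE and visualized.
• Data: NFCorpus
• The embeddings of long sequences (•) are densely clustered.
[Figure: medical papers]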
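Text embedding models are weak on long sequences (3)
✓ Cosine similarity measured for each sequence length
※ Look only at the τ = 1 graph for now; τ is explained later.
• Data: NFCorpus
• The longer the sequences, the higher the similarity between their embeddings: they are densely clustered!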
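The measurement on this slide can be sketched as the average pairwise cosine similarity within each sequence-length bucket: the closer it is to 1, the more tightly the embeddings are packed. This is a minimal sketch under stated assumptions: the toy 2-D vectors stand in for real BGE embeddings, and the bucket names and helper functions are hypothetical; the paper's exact measurement protocol may differ.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all unordered pairs in one bucket.

    Values near 1 mean the embeddings are tightly clustered;
    lower values mean they are spread out.
    """
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D vectors standing in for real model embeddings (hypothetical data).
buckets = {
    "short": [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    "long": [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1]],
}
for name, embs in buckets.items():
    print(name, round(mean_pairwise_cosine(embs), 3))
```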
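Length Collapse
• The embeddings of long sequences are densely clustered, and this apparently degrades downstream task performance. This phenomenon is called Length Collapse.
• Why does this happen…?
(Section: Digging into the problem)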
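Theoretical analysis of Length Collapse: preliminaries
• The Transformer encoder considered here contains a self-attention (SA) mechanism, expressed as SA(X) = softmax(X Wq (X Wk)^T / √d) X Wv.
• X: input sentence; Wq, Wk, Wv: query, key, and value weights
1. Softmax is applied last, so every element of the attention matrix is positive.
2. Softmax is applied last, so each row of the attention matrix sums to 1 (it is a probability distribution).
➡ From 2, the attention matrix maps [1, 1, 1, …, 1] to itself, so 1 is an eigenvalue with eigenvector [1, 1, 1, …, 1]; together with 1, no eigenvalue exceeds 1 in absolute value, so this is the largest eigenvalue.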
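The two properties can be checked numerically. A minimal sketch, assuming the standard scaled dot-product form A = softmax(X Wq (X Wk)^T / √d) with random toy weights instead of a trained encoder; the helper names are hypothetical. It builds A for five tokens, confirms that every entry is positive and every row sums to 1, and shows that A maps [1, 1, …, 1] to itself.

```python
import math
import random


def matmul(A, B):
    """Plain-Python product of A (m x k) and B (k x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def softmax_rows(S):
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in S:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_matrix(X, Wq, Wk):
    """A = softmax(X Wq (X Wk)^T / sqrt(d)), assuming the standard scaled form."""
    Q, K = matmul(X, Wq), matmul(X, Wk)
    d = len(Wq[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, transpose(K))]
    return softmax_rows(scores)


def self_attention(X, Wq, Wk, Wv):
    """SA(X) = A X Wv."""
    return matmul(attention_matrix(X, Wq, Wk), matmul(X, Wv))


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d = 5, 4  # 5 tokens, 4 features (toy sizes)
X = random_matrix(n, d, rng)
Wq, Wk, Wv = (random_matrix(d, d, rng) for _ in range(3))

A = attention_matrix(X, Wq, Wk)
ones = [[1.0] for _ in range(n)]
print(all(a > 0 for row in A for a in row))       # property 1: all entries positive
print([round(sum(row), 12) for row in A])          # property 2: each row sums to 1
print([round(r[0], 12) for r in matmul(A, ones)])  # hence A [1,...,1]^T = [1,...,1]^T
```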
Supplement: eigenvalues and eigenvectors
Ax = λx (A: matrix, x: eigenvector, λ: eigenvalue)
• Eigenvector: a vector x whose direction does not change when it is transformed by A; it is only scaled by λ.
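For the attention matrix A (entries positive, rows summing to 1), the definition Ax = λx yields the claim on the preliminaries slide in two steps: the all-ones vector is an eigenvector with eigenvalue 1, and no eigenvalue can exceed 1 in absolute value (take i as the index of the largest |x_j| in an eigenvector x). This is the standard argument, added for completeness; the slides do not spell it out.

```latex
\begin{aligned}
(A\mathbf{1})_i &= \sum_{j=1}^{n} A_{ij} = 1 \ \text{for every } i
  \quad\Longrightarrow\quad A\mathbf{1} = 1\cdot\mathbf{1},\\
|\lambda|\,|x_i| = \Bigl|\sum_{j=1}^{n} A_{ij}\,x_j\Bigr|
  &\le \sum_{j=1}^{n} A_{ij}\,|x_j| \le \Bigl(\sum_{j=1}^{n} A_{ij}\Bigr)|x_i| = |x_i|
  \quad\Longrightarrow\quad |\lambda| \le 1.
\end{aligned}
```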
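Lemma 1: Self-attention attenuates high-frequency components
• The largest eigenvalue of the attention matrix is 1, and the corresponding eigenvector is [1, 1, 1, …, 1].
• Multiplying by the same matrix over and over (i.e., stacking layers) pulls the representation ever closer to the direction of that eigenvector, [1, 1, 1, …, 1], so the high-frequency components are lost.
• This has itself been proven in prior work: the loss of high-frequency components as the network gets deeper is known as over-smoothing.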
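A minimal sketch of the "multiply by the same matrix again and again" argument, assuming (as the slide does) that every layer applies the same attention matrix, and ignoring residual connections and feed-forward blocks. The logits are hypothetical. Because every entry of A x is a weighted average of the entries of x, the spread max(x) - min(x) can never grow; with strictly positive weights it shrinks until x is numerically a multiple of [1, 1, …, 1], i.e. only the low-frequency part survives.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def spread(x):
    """max(x) - min(x); zero exactly when x is a multiple of [1, 1, ..., 1]."""
    return max(x) - min(x)


# Hypothetical fixed logits for a 4-token sequence (not from the paper).
logits = [[2.0, 0.5, 0.0, -1.0],
          [0.0, 1.5, 0.3, 0.2],
          [-0.5, 0.0, 1.0, 0.7],
          [0.1, -0.2, 0.4, 1.2]]
A = [softmax(r) for r in logits]

x0 = [3.0, -1.0, 0.5, 2.0]  # one feature channel across the 4 tokens
x = x0
for layer in range(1, 31):
    # Each entry of A x is a weighted average of the entries of x,
    # so the spread never grows; with all weights positive it shrinks.
    x = apply(A, x)
    if layer in (1, 5, 10, 30):
        print(layer, [round(v, 4) for v in x], f"spread={spread(x):.2e}")
```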
• High-frequency component: each element minus the mean; think of it as the fine detail of the representation…
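The decomposition behind this annotation: with n tokens, the low-frequency (DC) part of X is the column-wise mean copied to every row, and the high-frequency part is HC[X] = X - DC[X]. A minimal sketch with a toy 4×2 matrix and hypothetical, nearly uniform attention logits; it checks that DC + HC reproduces X and that each column of HC sums to zero, then compares ||HC[X]||_F with ||HC[A X]||_F after one attention step. Since A leaves the DC part unchanged, any change in HC comes from A acting on HC[X]; for a nearly uniform A that part shrinks strongly.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def split_dc_hc(X):
    """Split X (n tokens x d features) into DC + HC.

    DC: every row replaced by the column means (the [1, 1, ..., 1] direction).
    HC: each element minus its column mean, i.e. the "details".
    """
    n = len(X)
    mean = [sum(col) / n for col in zip(*X)]
    dc = [mean[:] for _ in X]
    hc = [[x - m for x, m in zip(row, mean)] for row in X]
    return dc, hc


def frobenius(M):
    return math.sqrt(sum(v * v for row in M for v in row))


X = [[1.0, 4.0], [3.0, -2.0], [0.0, 1.0], [2.0, 5.0]]
dc, hc = split_dc_hc(X)

# Hypothetical small logits, so the attention matrix is close to uniform.
logits = [[0.1, -0.1, 0.05, 0.0],
          [0.0, 0.1, -0.05, 0.02],
          [-0.1, 0.0, 0.1, 0.03],
          [0.05, 0.05, -0.1, 0.1]]
A = [softmax(r) for r in logits]

_, hc_after = split_dc_hc(matmul(A, X))
print("||HC[X]||_F   =", round(frobenius(hc), 4))
print("||HC[A X]||_F =", round(frobenius(hc_after), 4))
```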
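Definition 2: Attenuation of high-frequency components by self-attention
• Definition 2: how strongly self-attention filters the high-frequency component HC[X] can be bounded by a largest singular value σ_a.
• Largest singular value: the maximum factor by which a vector's length can change when it is transformed by that matrix.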
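As the annotation says, the largest singular value of a matrix M is the largest factor by which M can stretch a vector: σ_max = max ||Mx|| / ||x||. A minimal, generic linear-algebra sketch (not code from the paper) that estimates it by power iteration on MᵀM and checks ||Mx|| ≤ σ_max ||x|| for one vector; the shear matrix is just a test case whose answer is known to be the golden ratio.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def largest_singular_value(M, iters=200):
    """sigma_max(M) = max ||M x|| / ||x||, estimated by power iteration on M^T M."""
    Mt = transpose(M)
    x = [1.0] * len(M[0])
    for _ in range(iters):
        y = matvec(Mt, matvec(M, x))
        length = norm(y)
        if length == 0.0:
            # The start vector lies in the null space of M; this sketch does not recover.
            raise ValueError("start vector is in the null space of M; try another start")
        x = [v / length for v in y]
    return norm(matvec(M, x))


M = [[1.0, 1.0], [0.0, 1.0]]
sigma = largest_singular_value(M)
x = [0.6, -0.8]
print(round(sigma, 6))                        # golden ratio (1 + sqrt(5)) / 2 for this shear
print(norm(matvec(M, x)) <= sigma * norm(x))  # no vector is stretched by more than sigma
```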
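Definition 3: The attenuation of high-frequency components grows as the sequence gets longer
• Definition 3: σ_a can be bounded in terms of the sequence length n.
• Intuition: the longer the sequence, the flatter softmax makes the distribution of attention scores, so the high-frequency part approaches the zero matrix, i.e., the singular value gets smaller.
• In other words, the filter can be bounded by the sequence length.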
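The intuition can be made concrete under one assumption that is ours rather than the paper's: the attention logits in a row stay within a fixed range [-c, c] as n grows. Every softmax weight then satisfies e^(-2c)/n ≤ a_j ≤ e^(2c)/n, so |a_j - 1/n| ≤ (e^(2c) - 1)/n, and the rows approach the uniform distribution, whose high-frequency part is zero. The sketch samples such rows for increasing n; c = 2.0 is an arbitrary illustrative value.

```python
import math
import random


def softmax(row):
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def max_deviation_from_uniform(n, c, rng):
    """Largest |a_j - 1/n| in one attention row whose n logits lie in [-c, c]."""
    logits = [rng.uniform(-c, c) for _ in range(n)]
    weights = softmax(logits)
    return max(abs(w - 1.0 / n) for w in weights)


c = 2.0  # assumed logit bound (illustrative, not a value from the paper)
rng = random.Random(0)
for n in (16, 128, 1024, 8192):
    dev = max_deviation_from_uniform(n, c, rng)
    bound = (math.exp(2 * c) - 1) / n  # shrinks like 1/n
    print(f"n={n:5d}  max|a_j - 1/n| = {dev:.2e}  bound = {bound:.2e}")
```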
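The filter can be bounded by the sequence length
= The longer the sequence, the more the high-frequency components are attenuated, and the embeddings all end up similar (Length Collapse)
• This also explains the relationship between cosine similarity and sequence length observed at the start.
[Figure annotation: Length Collapse does seem to be happening…]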
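Chaining the two definitions gives the conclusion on this slide. The following is only a schematic reading aid, not the paper's theorem: constants are dropped, g(n) stands for whatever bound Definition 3 provides, and it assumes the same bound holds at every layer while ignoring residual connections and feed-forward blocks. Once g(n) < 1, the high-frequency part shrinks geometrically with depth, and faster for longer inputs.

```latex
\begin{aligned}
\bigl\|\mathrm{HC}[\mathrm{SA}(X)]\bigr\|_F &\le \sigma_a \,\bigl\|\mathrm{HC}[X]\bigr\|_F
  && \text{(Definition 2, schematic)}\\
\sigma_a &\le g(n), \quad g \text{ decreasing in } n
  && \text{(Definition 3, schematic)}\\
\bigl\|\mathrm{HC}[X^{(L)}]\bigr\|_F &\le g(n)^{L}\,\bigl\|\mathrm{HC}[X^{(0)}]\bigr\|_F
  && \text{(after } L \text{ self-attention layers)}
\end{aligned}
```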
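Proposal: introducing TempScale
• Length Collapse was caused by the attenuation of high-frequency components (over-smoothing).
➡ So why not make the original output distribution sharper?
• Introduce into attention a temperature 0 < τ < 1 that divides each row of the pre-softmax logits.
• The output distribution becomes flatter as τ grows and sharper as τ shrinks.
• In other words, the smaller τ is, the more high-frequency content is preserved!
(Section: Proposed solution)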
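TempScale, as described on the slide, divides the pre-softmax logits by τ. A minimal sketch for a single attention row with hypothetical logits; in a full layer this would presumably be softmax(QKᵀ / (√d·τ)), but where τ sits relative to the √d scaling (and whether it is set per layer or per head) is not stated on the slide, so that placement is an assumption. The sketch shows the maximum weight rising and the entropy falling as τ decreases.

```python
import math


def softmax_with_temperature(logits, tau):
    """softmax(logits / tau): tau < 1 sharpens the distribution, tau > 1 flattens it."""
    scaled = [s / tau for s in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy in nats; lower means a sharper (more peaked) distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


logits = [1.0, 0.5, 0.2, -0.3]  # hypothetical pre-softmax scores for one query row
for tau in (1.0, 0.8, 0.5):
    p = softmax_with_temperature(logits, tau)
    print(f"tau={tau}: max weight={max(p):.3f}, entropy={entropy(p):.3f}")
```

Dividing by τ < 1 widens the gaps between scores before softmax, which moves the weights away from the uniform distribution; that departure from uniform is exactly what the high-frequency part of the attention matrix measures.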
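Applying TempScale
✓ Cosine similarity measured for each sequence length
• Data: NFCorpus
• Lowering the temperature seems to reduce the effect of sequence length on similarity!
(Section: Applying the solution)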
Effect of TempScale on downstream tasks
Effect of TempScale on downstream tasks
[Figure annotation: the differences are positive, but the gains are fairly small…]
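Effect of TempScale on downstream tasks
[Figure annotations: models with large sequence lengths; maybe it also depends on how well the model suits the task…?]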
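Effect of TempScale on downstream tasks (STS)
• STS: a task that measures how well the semantic closeness of sentence pairs is captured
• The lower the temperature, the lower the similarity of unrelated (RandomUnrelated) pairs.
• 🤔 Is that really so…?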
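The optimal temperature also depends on the sequence length of the target texts
• Intuitively, the longer the sequences being handled, the weaker we want the attenuation to be = the smaller we want the temperature to be.
• Task: SummScreenFD (a retrieval task matching TV episode scripts with their summaries)
• Mostly as expected, but ANCE alone shows the opposite trend.
[Figure ▲: relationship between temperature and performance, per model and per text length]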
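Summary
• In text embedding models, downstream task performance tends to drop as texts get longer. The paper attributes this to Length Collapse, a phenomenon in which the embeddings of long texts become densely clustered.
• The paper identifies the cause of Length Collapse theoretically and proposes TempScale, which introduces a temperature parameter into attention, as a remedy.
• It confirms that introducing TempScale mitigates Length Collapse and improves downstream task performance.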
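Impressions
• My impression is that this is a costly paper, thorough both theoretically and experimentally.
• The appendix contains a large number of experiments and descriptions of each evaluation dataset, and even discusses LLM-based embedding models.
• Commendably, the authors strengthened the theoretical side compared with the version that was rejected at ICLR.
• That said, it is unclear to me whether TempScale really works well…
• The results vary considerably across models and tasks.
• Might performance improve reasonably if the TempScale temperature were set as a function of sequence length and the model were trained with it…?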