Unified Language Model Pre-training for Natural Language Understanding and Generation
Scatter Lab Inc.
April 10, 2020
Transcript
Unified Language Model Pre-training for Natural Language Understanding and Generation
Li Dong et al., NeurIPS 2019 (Microsoft)
Presenter: ML Research Scientist, Pingpong
Contents
1. Pre-training Language Model Overview
2. Unified Language Model
   1. Method
   2. Pre-training step
   3. Fine-tuning step
3. Experiments
   1. NLG Task
   2. NLU Task
Pre-training Language Model Overview
• BERT, GPT, and ELMo each achieve strong results with their own approach, but each also has drawbacks.
• For example, BERT's bidirectionality gives it high performance, but it cannot readily be used for NLG tasks.
• Each LM objective serves a different purpose:
• Bidirectional => NLU
• Unidirectional => NLG
• Seq-to-Seq => summarization, generative question answering
Unified Language Model
• Unified pre-training: because the parameters are shared across several types of LM, only a single Transformer is needed and the LMs do not have to be trained separately.
• Parameter sharing lets the model learn more general text representations. (The objectives are optimized jointly, so the model overfits less to any single LM task.)
• The same model can be used for both NLU and NLG.
• UniLM unifies the existing LMs.
• Each LM has its own task, and these tasks are trained jointly through multi-task learning.
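• To train the different LMs, the parameters are shared but a different masking scheme is used for each.
• A specific mask pattern is used so that seq-to-seq can be implemented inside a single Transformer.
• In actual training, tokens are replaced with [MASK] and each LM is trained to recover them.
• The bidirectional LM additionally uses next sentence prediction (NSP).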
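The masking schemes described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name `build_attention_mask`, the objective labels, and the convention that row i is the querying position are all assumptions made for this example.

```python
import numpy as np

def build_attention_mask(objective, src_len, tgt_len=0):
    """Return an (n, n) matrix with 0 where attention is allowed and -inf where it is blocked.

    Row i is the querying position, column j is the position being attended to.
    (Hypothetical helper for illustration only.)
    """
    n = src_len + tgt_len
    ones = np.ones((n, n), dtype=bool)
    if objective == "bidirectional":
        allowed = ones                      # every token sees every token
    elif objective == "left_to_right":
        allowed = np.tril(ones)             # token i sees positions <= i
    elif objective == "right_to_left":
        allowed = np.triu(ones)             # token i sees positions >= i
    elif objective == "seq_to_seq":
        allowed = np.zeros((n, n), dtype=bool)
        allowed[:, :src_len] = True         # all tokens see the whole source segment
        # target tokens additionally see themselves and earlier target tokens
        allowed[src_len:, src_len:] = np.tril(np.ones((tgt_len, tgt_len), dtype=bool))
    else:
        raise ValueError(f"unknown objective: {objective}")
    return np.where(allowed, 0.0, -np.inf)

mask = build_attention_mask("seq_to_seq", src_len=2, tgt_len=2)
```

In the seq-to-seq case the source rows never see target columns, while each target row sees the full source plus its own left context, which is what lets one Transformer act as both encoder and decoder.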
• [SOS] is the special start-of-sequence token.
• [EOS] is the special end-of-sequence token; in NLU tasks it marks sentence boundaries.
• Embeddings follow BERT, and text is tokenized with WordPiece.
• A different segment embedding is used for each LM task.
• Expressed as a formula, the only thing that changes from one objective to another is the value of the self-attention mask matrix M.
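For reference, the role of M can be written out in the usual scaled dot-product form of masked self-attention. The notation here (layer index l, hidden states H, projection matrices W, key dimension d_k) is an assumption chosen for this sketch; only M is named in the slide.

```latex
\begin{align}
Q &= H^{l-1} W_l^{Q}, \quad K = H^{l-1} W_l^{K}, \quad V = H^{l-1} W_l^{V} \\
M_{ij} &= \begin{cases} 0, & \text{token } i \text{ may attend to token } j \\ -\infty, & \text{otherwise} \end{cases} \\
A_l &= \operatorname{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} + M \right) V
\end{align}
```

Adding -∞ before the softmax drives the corresponding attention weight to zero, so changing M alone switches the same network between the bidirectional, unidirectional, and seq-to-seq objectives.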
Pre-training Setup
• The training objective is the sum of the individual LM objectives.
• Within one batch, objectives are sampled at these rates: bidirectional LM 1/3, sequence-to-sequence LM 1/3, and left-to-right and right-to-left LM 1/6 each.
• Parameters are initialized from BERT_large.
• English Wikipedia and BookCorpus are used for pre-training.
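The sampling rates above can be sketched as a weighted draw per training example. This is an illustrative sketch only: `OBJECTIVE_WEIGHTS` and `assign_objectives` are hypothetical names, and drawing independently for each example is an assumption about how the rates are realized.

```python
import random

# Sampling rates as stated in the slide; the labels are hypothetical.
OBJECTIVE_WEIGHTS = {
    "bidirectional": 1 / 3,
    "seq_to_seq": 1 / 3,
    "left_to_right": 1 / 6,
    "right_to_left": 1 / 6,
}

def assign_objectives(batch_size, rng):
    """Draw one LM objective per example according to OBJECTIVE_WEIGHTS."""
    names = list(OBJECTIVE_WEIGHTS)
    weights = [OBJECTIVE_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=batch_size)

# Batch size 330 matches the pre-training setup described in the deck.
objectives = assign_objectives(330, random.Random(0))
```

Each example's objective then decides which attention mask is applied to it, and the per-example losses are summed.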
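• Vocabulary size is 28,996, maximum input sequence length is 512, and batch size is 330.
• 15% of tokens are selected, and each is handled according to one of three cases:
• 80% of the time: the token is replaced with [MASK]
• 10% of the time: the token is replaced with a random word
• 10% of the time: the token is left as the original word
• The masking procedure is almost identical to BERT's, with one addition: 80% of the time a single token is masked, and 20% of the time a bigram or trigram is masked.
• Training ran for 770,000 steps; 10,000 steps take about 7 hours on 8 V100 GPUs.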
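The masking rules above can be sketched in plain Python. This is a simplified illustration under stated assumptions: `choose_positions` and `corrupt` are hypothetical helpers, special tokens are not excluded, and the 80/10/10 replacement is applied independently to each selected position, which the slide does not specify for multi-token spans.

```python
import random

MASK = "[MASK]"

def choose_positions(n_tokens, mask_prob, rng):
    """Select about mask_prob of the positions: 80% single tokens, 20% bigram or trigram spans."""
    if n_tokens == 0:
        return []
    budget = max(1, round(n_tokens * mask_prob))
    chosen = set()
    attempts = 0
    while len(chosen) < budget and attempts < 10 * n_tokens:
        attempts += 1
        span = 1 if rng.random() < 0.8 else rng.choice([2, 3])
        span = min(span, budget - len(chosen))   # never exceed the budget
        start = rng.randrange(0, n_tokens - span + 1)
        positions = range(start, start + span)
        if any(p in chosen for p in positions):
            continue                             # overlaps an earlier span, retry
        chosen.update(positions)
    return sorted(chosen)

def corrupt(tokens, vocab, rng, mask_prob=0.15):
    """Return (corrupted tokens, {position: original token}) for the selected positions."""
    corrupted = list(tokens)
    targets = {}
    for pos in choose_positions(len(tokens), mask_prob, rng):
        targets[pos] = tokens[pos]
        r = rng.random()
        if r < 0.8:
            corrupted[pos] = MASK                # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[pos] = rng.choice(vocab)   # 10%: replace with a random word
        # remaining 10%: keep the original token
    return corrupted, targets
```

The model is trained to predict the entries of `targets` from `corrupted`; positions outside `targets` contribute no loss.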
Fine-tuning on Downstream NLU and NLG Tasks
• For NLU fine-tuning, the [SOS] token is used as the representation of the input (the same role as [CLS] in BERT).
• For NLG fine-tuning, tokens in the target sequence are masked and the model is trained to recover them.
• Because [EOS] can also be masked in this process, the model is said to learn when it should predict [EOS] as well.
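The NLG fine-tuning step above can be sketched as building one packed input and masking only its target side. This is a minimal illustration: `build_seq2seq_example` is a hypothetical helper, the segment-id layout is an assumption, and the default `mask_prob=0.7` is an assumed value chosen for the example, not one taken from the slides.

```python
import random

SOS, EOS, MASK = "[SOS]", "[EOS]", "[MASK]"

def build_seq2seq_example(source, target, rng, mask_prob=0.7):
    """Pack [SOS] source [EOS] target [EOS] and mask tokens in the target segment only."""
    tokens = [SOS] + list(source) + [EOS] + list(target) + [EOS]
    # Segment 0 covers [SOS] + source + [EOS]; segment 1 covers target + final [EOS].
    segment_ids = [0] * (len(source) + 2) + [1] * (len(target) + 1)
    target_start = len(source) + 2
    labels = {}
    for pos in range(target_start, len(tokens)):
        if rng.random() < mask_prob:
            labels[pos] = tokens[pos]   # the final [EOS] can be masked too
            tokens[pos] = MASK
    return tokens, segment_ids, labels

tokens, segment_ids, labels = build_seq2seq_example(
    ["a", "b"], ["c"], random.Random(0), mask_prob=1.0
)
```

Because the closing [EOS] is a possible label, the model gets training signal for when to stop generating.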
Experiments
Experiments: Abstractive Summarization
• CNN/DailyMail => the task of reading a news article and summarizing it
• RG-N (ROUGE-N) is the N-gram F1 score
• Fine-tuned as seq-to-seq (tokens are masked, then recovered)
• Decoding uses beam search (duplicated trigrams are removed during beam search)
• With 10K training samples, the difference from MASS is somewhat larger.
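The duplicated-trigram removal mentioned above can be sketched as a filter applied to each hypothesis during beam search. This is an assumed, simplified form: `repeats_trigram` and `filter_candidates` are hypothetical names, and a real decoder would apply the check inside its beam expansion step.

```python
def repeats_trigram(prefix, candidate):
    """True if appending candidate to prefix would repeat a trigram already in prefix."""
    if len(prefix) < 2:
        return False
    new_trigram = (prefix[-2], prefix[-1], candidate)
    seen = {tuple(prefix[i:i + 3]) for i in range(len(prefix) - 2)}
    return new_trigram in seen

def filter_candidates(prefix, scored_candidates):
    """Drop (token, score) candidates that would create a duplicated trigram."""
    return [(tok, score) for tok, score in scored_candidates
            if not repeats_trigram(prefix, tok)]

prefix = ["the", "cat", "sat", "the", "cat"]
kept = filter_candidates(prefix, [("sat", -0.1), ("ran", -0.5)])
```

Here "sat" is dropped because "the cat sat" already occurs in the prefix, so only "ran" survives as an extension of this hypothesis.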
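Experiments: QA
• The first two are span prediction and are handled the same way as in the original BERT.
• The third is free-form, so the answer is generated with seq-to-seq.
• The input is built by concatenating the conversation history, the question, and the passage into the first sequence; the answer is predicted in the second segment.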
Experiments: Question/Response Generation
• Question generation: given an answer and a passage from the SQuAD dataset, generate the question.
• The second is performance on the DSTC7 dataset.
Experiments: GLUE
• Outperforms BERT_large on GLUE.
Thank you ✌ If you have further questions or anything you are curious about, feel free to reach out at the contact below at any time!
(ML Research Scientist, Pingpong)
[email protected]