Unified Language Model Pre-training for Natural Language Understanding and Generation

Scatter Lab Inc.

April 10, 2020
Transcript

  1. Unified Language Model Pre-training for Natural Language Understanding and Generation
     Li Dong et al., NeurIPS 2019 (Microsoft). Presented by an ML Research Scientist at Pingpong.
  2. Contents
     1. Overview of pre-training language models
     2. Unified Language Model
        1. Method
        2. Pre-training step
        3. Fine-tuning step
     3. Experiments
        1. NLG tasks
        2. NLU tasks
  3. Overview of pre-training language models
     • BERT, GPT, and ELMo each achieve strong results with their own pre-training scheme, but each has limitations.
     • e.g., BERT's bidirectionality gives it high NLU performance, but it cannot be used directly for NLG tasks.
  4. Overview of pre-training language models
     • Each LM objective serves a different purpose.
     • Bidirectional LM => NLU
     • Unidirectional LM => NLG
     • Seq-to-seq LM => summarization, generative question answering
  5. Unified Language Model
     • Unified pre-training shares one set of parameters across several LM types, so only a single Transformer is needed and the individual LMs do not have to be trained separately.
     • Parameter sharing makes the learned text representations more general, because jointly optimizing the objectives mitigates overfitting to any single LM.
     • The same model can therefore be used for both NLU and NLG.
  6. Unified Language Model
     • UNILM unifies the existing LMs.
     • Since each LM objective suits a different kind of task, all of them are trained simultaneously via multi-task learning.
  7. Unified Language Model
     • To train the different LMs, the parameters are shared, and masking is used to distinguish the objectives.
     • A special form of masking implements seq-to-seq inside a single Transformer.
     • In actual training, random tokens are replaced with [MASK], and the task of recovering them is run for each LM.
     • When training the bidirectional LM, next sentence prediction (NSP) is also used.
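     As a rough illustration of how one Transformer can realize the different objectives with nothing but self-attention masks, the sketch below builds the mask matrices informally; the function name and tensor layout are my own, not from the slides or the released UniLM code.

     ```python
     import torch

     def unilm_attention_mask(seq_len: int, objective: str, src_len: int = 0) -> torch.Tensor:
         """Build a (seq_len, seq_len) mask; 1 means position j is visible to position i."""
         if objective == "bidirectional":
             # Every token attends to every other token (BERT-style).
             mask = torch.ones(seq_len, seq_len)
         elif objective == "left-to-right":
             # Each token sees only itself and the tokens to its left (GPT-style).
             mask = torch.tril(torch.ones(seq_len, seq_len))
         elif objective == "seq2seq":
             # Source tokens see the whole source; target tokens see the source
             # plus the already-generated part of the target.
             mask = torch.tril(torch.ones(seq_len, seq_len))
             mask[:, :src_len] = 1          # every position can see the source segment
             mask[:src_len, src_len:] = 0   # the source cannot peek at the target
         else:
             raise ValueError(objective)
         return mask

     # e.g. a 6-token input whose first 4 tokens form the source segment
     print(unilm_attention_mask(6, "seq2seq", src_len=4))
     ```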
  8. Unified Language Model
     • [SOS] is a special start-of-sequence token.
     • [EOS] marks the sentence boundary for NLU tasks and is the special end-of-sequence token.
     • Embeddings follow BERT, and text is tokenized with WordPiece.
     • A different segment embedding is used for each LM task.
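     A minimal sketch of how an input pair might be packed into the [SOS] S1 [EOS] S2 [EOS] layout with per-segment ids; the helper name and the 0/1 segment ids are placeholders for illustration, not the authors' preprocessing code.

     ```python
     # Hypothetical helper: pack two segments in the UniLM input layout,
     # [SOS] S1 [EOS] S2 [EOS], with placeholder segment ids.
     def pack_segments(src_tokens, tgt_tokens, segment_ids=(0, 1)):
         tokens = ["[SOS]"] + src_tokens + ["[EOS]"] + tgt_tokens + ["[EOS]"]
         # Segment embedding ids differ per LM objective; 0/1 here is only a placeholder.
         segments = ([segment_ids[0]] * (len(src_tokens) + 2)
                     + [segment_ids[1]] * (len(tgt_tokens) + 1))
         return tokens, segments

     tokens, segments = pack_segments(["the", "cat", "sat"], ["le", "chat"])
     print(tokens)    # ['[SOS]', 'the', 'cat', 'sat', '[EOS]', 'le', 'chat', '[EOS]']
     print(segments)  # [0, 0, 0, 0, 0, 1, 1, 1]
     ```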
  9. Pre-training Setup
     • The overall training objective is the sum of the individual LM objectives.
     • Within a batch, the bidirectional LM objective is sampled 1/3 of the time, the sequence-to-sequence LM objective 1/3 of the time, and the left-to-right and right-to-left LM objectives 1/6 of the time each.
     • All parameters are initialized from BERT_LARGE.
     • English Wikipedia and BookCorpus are used for pre-training.
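     A quick sketch of drawing a batch's objective with those 1/3, 1/3, 1/6, 1/6 proportions; this only illustrates the sampling ratios and is not the authors' training loop.

     ```python
     import random

     # Objective mixing ratios from the slide: 1/3 bidirectional, 1/3 seq2seq,
     # 1/6 left-to-right, 1/6 right-to-left.
     OBJECTIVES = ["bidirectional", "seq2seq", "left-to-right", "right-to-left"]
     WEIGHTS    = [1/3, 1/3, 1/6, 1/6]

     def sample_objective(rng: random.Random) -> str:
         """Pick which LM objective the current batch is trained on."""
         return rng.choices(OBJECTIVES, weights=WEIGHTS, k=1)[0]

     rng = random.Random(0)
     print([sample_objective(rng) for _ in range(6)])
     ```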
  10. Pre-training Setup
      • Vocabulary size is 28,996; maximum input sequence length is 512; batch size is 330.
      • 15% of tokens are selected and handled in one of three ways:
        • 80% of the time: replace the token with [MASK]
        • 10% of the time: replace the token with a random word
        • 10% of the time: keep the original token
      • The masking scheme is nearly identical to BERT's; the one addition is that 80% of the time a single token is masked, and 20% of the time a bigram or trigram is masked.
      • Training runs for 770,000 steps; 10,000 steps take about 7 hours on 8 V100 GPUs.
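      A simplified sketch of that corruption scheme over a generic token list; the span choice and 80/10/10 replacement follow the slide's description, but the helper itself is illustrative rather than the paper's preprocessing code.

      ```python
      import random

      def corrupt(tokens, vocab, rng, mask_rate=0.15):
          """Return a corrupted copy of `tokens` and the indices the model must predict."""
          tokens = list(tokens)
          targets = []
          i = 0
          while i < len(tokens):
              if rng.random() < mask_rate:
                  # 80% of masked positions cover a single token, 20% a bigram or trigram.
                  span = 1 if rng.random() < 0.8 else rng.choice([2, 3])
                  for j in range(i, min(i + span, len(tokens))):
                      targets.append(j)
                      r = rng.random()
                      if r < 0.8:
                          tokens[j] = "[MASK]"           # 80%: replace with [MASK]
                      elif r < 0.9:
                          tokens[j] = rng.choice(vocab)  # 10%: replace with a random word
                      # else 10%: keep the original token
                  i += span
              else:
                  i += 1
          return tokens, targets

      rng = random.Random(0)
      print(corrupt("the quick brown fox jumps over the lazy dog".split(),
                    vocab=["apple", "river", "blue"], rng=rng))
      ```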
  11. Fine-tuning on Downstream NLU and NLG Tasks
      • For NLU fine-tuning, the [SOS] token's representation is used, analogous to BERT's [CLS].
      • For NLG fine-tuning, tokens in the target sequence are masked and the model is trained to recover them.
      • Because [EOS] can also be masked in this process, the model also learns when it should predict [EOS], i.e., when to stop generating.
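      Continuing the earlier sketches, NLG fine-tuning would draw mask positions only from the target segment (including its trailing [EOS]); the masking rate below is an arbitrary placeholder for the sketch, not a value taken from the slides.

      ```python
      import random

      def target_mask_positions(src_len, tgt_len, rng, mask_rate=0.5):
          """During NLG fine-tuning, only target-side positions (including the final
          [EOS]) are candidates for masking; source tokens are left untouched."""
          candidates = range(src_len, src_len + tgt_len)
          return [i for i in candidates if rng.random() < mask_rate]

      print(target_mask_positions(src_len=5, tgt_len=4, rng=random.Random(0)))
      ```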
  12. Experiments: Abstractive Summarization
      • CNN/DailyMail => summarizing a news article.
      • RG-N is the N-gram ROUGE F1 score.
      • Fine-tuned as a sequence-to-sequence model (mask target tokens, then train to recover them).
      • Decoding uses beam search, with duplicated trigrams removed during the search.
      • With only 10K training samples, the gap over MASS becomes even larger.
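      A minimal sketch of the trigram-blocking idea used during beam search (pruning candidates that would repeat a trigram already present in the hypothesis); the function is illustrative and not tied to any specific decoding library.

      ```python
      def repeats_trigram(hypothesis, next_token):
          """True if appending `next_token` would create a trigram that already
          appears in the hypothesis, i.e. the candidate should be pruned."""
          if len(hypothesis) < 2:
              return False
          new_trigram = tuple(hypothesis[-2:]) + (next_token,)
          seen = {tuple(hypothesis[i:i + 3]) for i in range(len(hypothesis) - 2)}
          return new_trigram in seen

      hyp = ["the", "share", "price", "fell", "and", "the", "share"]
      print(repeats_trigram(hyp, "price"))   # True: "the share price" already occurred
      print(repeats_trigram(hyp, "rose"))    # False
      ```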
  13. Experiments: QA
      • The first two settings are span extraction, handled the same way as with the original BERT.
      • The third is a free-form setting, where the answer is generated via seq-to-seq.
      • The input is built by concatenating the dialogue history, the question, and the passage into the first segment, and the answer is predicted through the second segment.
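      A tiny illustration of that input layout for the generative QA setting; the example fields and strings are placeholders, not data from the paper.

      ```python
      # Illustrative only: how the generative QA input might be laid out.
      history  = "Q: who wrote it ? A: Austen"
      question = "when was it published ?"
      passage  = "Pride and Prejudice was published in 1813 ."

      first_segment  = f"{history} {question} {passage}"   # concatenated context
      second_segment = "in 1813"                           # answer to be generated

      model_input = f"[SOS] {first_segment} [EOS] {second_segment} [EOS]"
      print(model_input)
      ```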
  14. Experiments: Question/Response Generation
      • Question generation: given the answer and passage from the SQuAD dataset, generate the question.
      • The second experiment reports performance on the DSTC7 dataset.