

[Reading group slides] Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space

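Slides explaining Optimus, a large-scale Variational Auto-Encoder (VAE) built by integrating pre-trained language models, and the paper that introduced it.
They walk carefully through the derivation of the VAE objective function that underpins Optimus.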

Hayato Tsukagoshi

October 18, 2022

Transcript

  1. Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space

    Graduate School of Informatics, Nagoya University, Japan. Presenter: Hayato Tsukagoshi
    Paper: Chunyuan Li, Xiang Gao, Yuan Li, Baolin Peng, Xiujun Li, Yizhe Zhang, and Jianfeng Gao. EMNLP 2020. URL: https://aclanthology.org/2020.emnlp-main.378/
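  2. Variational Auto-Encoder (VAE)
    • Can be regarded as an Auto-Encoder with a constraint on the distribution of its latent representation
      • Its motivation and theoretical background differ from those of an AE, but it can be interpreted as a similar model
      • The constraint on the latent representation makes it easy to generate data
      • Proposed in Kingma et al., 2013. Auto-Encoding Variational Bayes
    • Any prior distribution can be chosen for the latent representation
      • In most cases, the standard normal distribution
    • The loss function is the sum of two losses
      • Reconstruction error
      • A loss on the distribution of the latent representation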
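  3. VAE model architecture
    [Figure: x → Encoder → W_μ / W_σ → μ, σ → latent representation z → Decoder → x′]
    • The Encoder converts the input x into a vector representation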
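  4. VAE model architecture
    [Figure: x → Encoder → W_μ / W_σ → μ, σ → latent representation z → Decoder → x′]
    • From the vector representation, the model outputs the mean and the variance-covariance matrix of a Gaussian distribution
    • A full covariance matrix is cumbersome, so it is usually treated as a diagonal matrix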
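  5. VAE model architecture
    [Figure: x → Encoder → W_μ / W_σ → μ, σ → latent representation z → Decoder → x′]
    • The latent representation z is obtained by sampling from the Gaussian defined by this mean and (diagonal) covariance matrix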
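The Encoder side of the figure (W_μ and W_σ heads, a diagonal covariance, then sampling) can be sketched in plain Python as below. This is a minimal illustration, not the Optimus code: the weights are made-up numbers, `linear` and `encode_and_sample` are hypothetical helper names, and the W_σ head is assumed to predict log σ² (a common way to keep σ positive), whereas the slide only says the model outputs the variance. In actual training the sampling step is written with the reparameterization trick so that gradients can flow back into the Encoder.

```python
import math
import random


def linear(W, b, h):
    """Return W @ h + b, with W given as a list of rows."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
            for row, b_i in zip(W, b)]


def encode_and_sample(h, W_mu, b_mu, W_sigma, b_sigma, rng):
    """Map an encoder vector h to (mu, sigma) and draw one latent sample z.

    Assumption: the W_sigma head predicts log(sigma^2). Because the covariance
    is diagonal, each latent dimension is sampled independently.
    """
    mu = linear(W_mu, b_mu, h)
    log_var = linear(W_sigma, b_sigma, h)
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    z = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    return mu, sigma, z


# Hypothetical example: a 3-dim encoder vector h mapped to a 2-dim latent space.
rng = random.Random(0)
h = [1.0, -2.0, 0.5]
W_mu, b_mu = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]], [0.0, 0.0]
W_sigma, b_sigma = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0]  # log sigma^2 = 0
mu, sigma, z = encode_and_sample(h, W_mu, b_mu, W_sigma, b_sigma, rng)
print(mu, sigma, z)  # mu ≈ [-0.15, 0.4], sigma = [1.0, 1.0]
```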
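  6. Comparison of the AE and VAE model architectures
    [Figure: AE: x → Encoder → latent representation z → Decoder → x′ | VAE: x → Encoder → W_μ / W_σ → μ, σ → latent representation z → Decoder → x′]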
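  7. Comparison of the AE and VAE model architectures
    [Figure: AE: x → Encoder → latent representation z → Decoder → x′ | VAE: x → Encoder → W_μ / W_σ → μ, σ → latent representation z → Decoder → x′]
    • Compared with the AE, the VAE only adds the step that samples the latent representation and a loss on the distribution of the latent representation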
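  8. Comparison of VAE with other generative models
    • GAN
      • Trained so that the Discriminator cannot correctly classify the Generator's outputs
    • VAE
      • Trained so that the distribution of the latent representation approaches the prior + the input is reconstructed
    • Normalizing flow
      • Learns an invertible mapping to construct a complex distribution of the latent representation
      • Can be combined with a VAE (reference: "Variational Inference and Normalizing Flow")
    • Diffusion Models
      • Noise is added in the forward direction, and the model is trained to remove it in the reverse direction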
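  9. VAE objective function
    • The VAE loss is the sum of the following two terms
      • Reconstruction error
      • Regularization term (loss on the distribution of the latent representation)
    • ϕ denotes the Encoder's parameters and θ the Decoder's parameters
    $-\mathcal{L}(\theta, \phi; X) = \underbrace{D_{KL}\left(q_\phi(z|X) \parallel p_\theta(z)\right)}_{\text{regularization term}} \underbrace{- \mathbb{E}_{q_\phi(z|X)}\left[\log p_\theta(X|z)\right]}_{\text{reconstruction error}}$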
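To make the two terms concrete, here is a minimal sketch of this loss for a single input. It assumes a standard normal prior $p_\theta(z) = \mathcal{N}(0, I)$, a diagonal Gaussian $q_\phi(z|X)$ parameterized by μ and log σ², and, as a simplifying assumption, a Bernoulli decoder over binary inputs whose output probabilities were computed from one sampled z. For a text decoder such as GPT-2, the reconstruction term would instead be the sum of token-level negative log-likelihoods. The function names are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Regularization term: KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def bernoulli_nll(x, x_prob, eps=1e-12):
    """Reconstruction error: -log p_theta(x | z) for binary x (assumed Bernoulli decoder)."""
    return -sum(xi * math.log(p + eps) + (1 - xi) * math.log(1 - p + eps)
                for xi, p in zip(x, x_prob))


def vae_loss(x, x_prob, mu, log_var):
    """Single-sample estimate of -ELBO = regularization term + reconstruction error."""
    return kl_to_standard_normal(mu, log_var) + bernoulli_nll(x, x_prob)


# When q equals the prior (mu = 0, log_var = 0) the KL term vanishes.
print(vae_loss([1, 0], [0.5, 0.5], [0.0, 0.0], [0.0, 0.0]))  # ≈ 2 * log 2
```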
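  10. Deriving the VAE objective function
    • The original idea behind the VAE (or variational Bayes)
      • We want the posterior distribution $p_\theta(Z|X)$ that represents the properties $Z$ hidden in the data $X$
    • In practice, $p_\theta(Z|X)$ and $p_\theta(X)$ are almost always unknown (the marginal $\int p_\theta(X, z)\, dz$ is usually intractable)
      • So we settle for $q_\phi(Z|X)$, an approximation of $p_\theta(Z|X)$
      • How do we obtain $q_\phi(Z|X)$?
        • We do not know what this distribution looks like either
      • Let's start from $p_\theta(X)$ and manipulate the equations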
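  12. Deriving the VAE objective function
    • Transform the equation as follows, regarding $p_\theta(X)$ as $p_\theta(X, z)$ marginalized over $z$:
    $\log p_\theta(X) = \log \int p_\theta(X, z)\, dz = \log \int p_\theta(X, z) \frac{q_\phi(z|X)}{q_\phi(z|X)}\, dz = \log \int q_\phi(z|X) \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$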
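  13. Deriving the VAE objective function
    $\log p_\theta(X) = \log \int p_\theta(X, z)\, dz = \log \int p_\theta(X, z) \frac{q_\phi(z|X)}{q_\phi(z|X)}\, dz = \log \int q_\phi(z|X) \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$
    • The second step multiplies the integrand by $q_\phi(z|X)/q_\phi(z|X) = 1$, which does not change its value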
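The rewritten form says that $p_\theta(X)$ is the expectation of $p_\theta(X, z)/q_\phi(z|X)$ under $z \sim q_\phi(z|X)$, so it can be estimated by sampling from $q_\phi$ (importance sampling). A minimal sketch on a hypothetical one-dimensional toy model, $p_\theta(z) = \mathcal{N}(0, 1)$ and $p_\theta(X|z) = \mathcal{N}(z, 1)$, chosen because its exact marginal $p_\theta(X) = \mathcal{N}(X; 0, 2)$ is known. The proposal $q = \mathcal{N}(m, s^2)$, the parameter values, and the function names are assumptions for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of the univariate Gaussian N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def estimate_evidence(X, m, s, n_samples, rng):
    """Estimate p(X) = E_{z~q}[p(X, z) / q(z|X)] with q = N(m, s^2).

    Hypothetical toy model: p(z) = N(0, 1), p(X|z) = N(z, 1).
    """
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(m, s)
        joint = normal_pdf(z, 0.0, 1.0) * normal_pdf(X, z, 1.0)  # p(z) p(X|z)
        total += joint / normal_pdf(z, m, s * s)                 # divide by q(z|X)
    return total / n_samples


X = 1.5
exact = normal_pdf(X, 0.0, 2.0)  # X is marginally N(0, 2) in this toy model
approx = estimate_evidence(X, m=0.5, s=1.0, n_samples=20000, rng=random.Random(0))
print(exact, approx)
```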
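  14. Deriving the VAE objective function
    • By Jensen's inequality, noting that $f(x) = \log(x)$ is a concave function:
    $\log \int q_\phi(z|X) \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz \geq \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$
    • That is,
    $\log p_\theta(X) \geq \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$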
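The inequality can be checked numerically. For any finite set of positive values, the log of their mean is at least the mean of their logs; this is Jensen's inequality for the concave log applied to an empirical distribution. In the derivation the values play the role of $p_\theta(X, z)/q_\phi(z|X)$ at samples $z \sim q_\phi(z|X)$; here they are just random positive numbers, and `jensen_gap` is a hypothetical helper.

```python
import math
import random


def jensen_gap(values):
    """Return log(mean(values)) - mean(log(values)); Jensen says this is >= 0."""
    mean_value = sum(values) / len(values)
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.log(mean_value) - mean_log


rng = random.Random(0)
weights = [rng.uniform(0.1, 5.0) for _ in range(1000)]
print(jensen_gap(weights))          # strictly positive: the values are not all equal
print(jensen_gap([2.0, 2.0, 2.0]))  # ≈ 0: equality when the values are constant
```

Equality holds only when all values coincide, which in the derivation corresponds to $p_\theta(X, z)/q_\phi(z|X)$ being constant in z, i.e. $q_\phi(z|X) = p_\theta(z|X)$.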
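  16. Deriving the VAE objective function
    • Setting the right-hand side to $\mathcal{L}(\theta, \phi; X)$, we can write
    $\log p_\theta(X) \geq \mathcal{L}(\theta, \phi; X), \quad \mathcal{L}(\theta, \phi; X) = \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$
    • This $\mathcal{L}(\theta, \phi; X)$ is called the ELBO (Evidence Lower BOund), or the variational lower bound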
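A minimal Monte Carlo sketch of the ELBO, $\mathbb{E}_{q_\phi(z|X)}[\log p_\theta(X, z) - \log q_\phi(z|X)]$, on a hypothetical one-dimensional toy model ($p_\theta(z) = \mathcal{N}(0, 1)$, $p_\theta(X|z) = \mathcal{N}(z, 1)$) where the exact $\log p_\theta(X)$ is available for comparison. The Gaussian $q = \mathcal{N}(m, s^2)$, the sample sizes, and the helper names are assumptions for illustration.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of the univariate Gaussian N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def elbo_monte_carlo(X, m, s, n_samples, rng):
    """Estimate ELBO = E_q[log p(X, z) - log q(z|X)] with z ~ q = N(m, s^2).

    Hypothetical toy model: p(z) = N(0, 1), p(X|z) = N(z, 1).
    """
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(m, s)
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(X, z, 1.0)  # log p(z) + log p(X|z)
        total += log_joint - log_normal(z, m, s * s)
    return total / n_samples


def log_evidence(X):
    """Exact log p(X); X is marginally N(0, 2) in this toy model."""
    return log_normal(X, 0.0, 2.0)


X = 1.5
rng = random.Random(0)
print(log_evidence(X))                                       # ≈ -1.828
print(elbo_monte_carlo(X, 0.0, 1.0, 20000, rng))             # lower: q is not the posterior
print(elbo_monte_carlo(X, X / 2, math.sqrt(0.5), 100, rng))  # q = p(z|X): the bound is tight
```

In this toy model the true posterior is $\mathcal{N}(X/2, 1/2)$; choosing it as q makes every sample contribute exactly $\log p_\theta(X)$, which is why the last estimate matches the exact value.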
  17. Deriving the VAE objective function
    • In Japanese the ELBO is sometimes called 変分下限 ("variational lower limit"), but since it is a lower bound rather than a lower limit, I think 変分下界 ("variational lower bound") is the correct term
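  18. Deriving the VAE objective function
    • Now consider the difference between the two sides of the inequality $\log p_\theta(X) \geq \mathcal{L}(\theta, \phi; X)$, using $p_\theta(X, z) = p_\theta(z|X)\, p_\theta(X)$:
    $\log p_\theta(X) - \mathcal{L}(\theta, \phi; X) = \log p_\theta(X) - \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz = \log p_\theta(X) - \int q_\phi(z|X) \log \frac{p_\theta(z|X)\, p_\theta(X)}{q_\phi(z|X)}\, dz$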
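  19. Deriving the VAE objective function
    • Since $\int q_\phi(z|X)\, dz = 1$, the constant $\log p_\theta(X)$ can be moved inside the integral:
    $\log p_\theta(X) - \mathcal{L}(\theta, \phi; X) = \int q_\phi(z|X) \log p_\theta(X)\, dz - \int q_\phi(z|X) \log \frac{p_\theta(z|X)\, p_\theta(X)}{q_\phi(z|X)}\, dz = \int q_\phi(z|X) \log \frac{q_\phi(z|X)\, p_\theta(X)}{p_\theta(z|X)\, p_\theta(X)}\, dz$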
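  20. Deriving the VAE objective function
    $\log p_\theta(X) - \mathcal{L}(\theta, \phi; X) = \int q_\phi(z|X) \log \frac{q_\phi(z|X)\, p_\theta(X)}{p_\theta(z|X)\, p_\theta(X)}\, dz = \int q_\phi(z|X) \log \frac{q_\phi(z|X)}{p_\theta(z|X)}\, dz = D_{KL}\left(q_\phi(z|X) \parallel p_\theta(z|X)\right)$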
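  21. Deriving the VAE objective function
    • From the above,
    $\log p_\theta(X) = \mathcal{L}(\theta, \phi; X) + D_{KL}\left(q_\phi(z|X) \parallel p_\theta(z|X)\right)$
    • The original goal is to find $q_\phi(z|X)$ that approximates $p_\theta(z|X)$
      → it suffices to minimize $D_{KL}(q_\phi(z|X) \parallel p_\theta(z|X))$
    • $\log p_\theta(X)$ is constant for a fixed $\theta$, so
      minimizing $D_{KL}(q_\phi(z|X) \parallel p_\theta(z|X))$ ⟺ maximizing $\mathcal{L}(\theta, \phi; X)$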
  23. Deriving the VAE objective function
    • In $\log p_\theta(X) = \mathcal{L}(\theta, \phi; X) + D_{KL}(q_\phi(z|X) \parallel p_\theta(z|X))$, the sum of the first and second terms on the right-hand side does not change (for a fixed $\theta$)
      → if the second term gets smaller, the first term must get larger
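  24. Deriving the VAE objective function
    • Decomposing the variational lower bound further, using $p_\theta(X, z) = p_\theta(X|z)\, p_\theta(z)$:
    $\mathcal{L}(\theta, \phi; X) = \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz = \int q_\phi(z|X) \log \frac{p_\theta(X|z)\, p_\theta(z)}{q_\phi(z|X)}\, dz = \int q_\phi(z|X) \log p_\theta(X|z)\, dz + \int q_\phi(z|X) \log \frac{p_\theta(z)}{q_\phi(z|X)}\, dz$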
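  25. Deriving the VAE objective function
    $\mathcal{L}(\theta, \phi; X) = \int q_\phi(z|X) \log p_\theta(X|z)\, dz + \int q_\phi(z|X) \log \frac{p_\theta(z)}{q_\phi(z|X)}\, dz = \int q_\phi(z|X) \log p_\theta(X|z)\, dz - \int q_\phi(z|X) \log \frac{q_\phi(z|X)}{p_\theta(z)}\, dz = \underbrace{\int q_\phi(z|X) \log p_\theta(X|z)\, dz}_{\text{likelihood}} - \underbrace{D_{KL}\left(q_\phi(z|X) \parallel p_\theta(z)\right)}_{\text{regularization term}}$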
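The decomposition of the ELBO and the identity $\log p_\theta(X) = \mathcal{L}(\theta, \phi; X) + D_{KL}(q_\phi(z|X) \parallel p_\theta(z|X))$ can both be checked in closed form on a hypothetical one-dimensional toy model, $p_\theta(z) = \mathcal{N}(0, 1)$ and $p_\theta(X|z) = \mathcal{N}(z, 1)$, with a Gaussian $q = \mathcal{N}(m, v)$. In this model the true posterior is $p_\theta(z|X) = \mathcal{N}(X/2, 1/2)$ and $\log p_\theta(X)$ is known, so the sketch computes the ELBO as likelihood term minus regularization term and compares the gap $\log p_\theta(X) - \mathcal{L}$ with $D_{KL}(q \parallel p_\theta(z|X))$. The model, parameter values, and function names are assumptions for illustration.

```python
import math


def gaussian_kl(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians (v = variance)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def elbo_decomposed(X, m, v):
    """ELBO = E_q[log p(X|z)] - KL(q || p(z)) for q = N(m, v).

    Hypothetical toy model: p(z) = N(0, 1), p(X|z) = N(z, 1).
    """
    # E_q[log N(X; z, 1)] = -0.5*log(2*pi) - 0.5*E_q[(X - z)^2]
    expected_log_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((X - m) ** 2 + v)
    return expected_log_lik - gaussian_kl(m, v, 0.0, 1.0)


def log_evidence(X):
    """Exact log p(X); X is marginally N(0, 2) in the toy model."""
    return -0.5 * math.log(2 * math.pi * 2.0) - X ** 2 / 4.0


X = 1.5
for m, v in [(0.0, 1.0), (0.3, 0.2), (X / 2, 0.5)]:
    gap = log_evidence(X) - elbo_decomposed(X, m, v)
    kl_to_posterior = gaussian_kl(m, v, X / 2, 0.5)
    print(f"m={m}, v={v}: gap={gap:.6f}, KL(q || posterior)={kl_to_posterior:.6f}")
```

The gap and the KL divergence to the posterior agree for every q, and both vanish when q equals the posterior $\mathcal{N}(X/2, 1/2)$.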
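  26. Deriving the VAE objective function
    • Maximizing $\mathcal{L}(\theta, \phi; X)$ ⟺ minimizing $-\mathcal{L}(\theta, \phi; X)$, so the loss function is defined as follows:
    $-\mathcal{L}(\theta, \phi; X) = -\int q_\phi(z|X) \log p_\theta(X|z)\, dz + D_{KL}\left(q_\phi(z|X) \parallel p_\theta(z)\right) = \underbrace{D_{KL}\left(q_\phi(z|X) \parallel p_\theta(z)\right)}_{\text{regularization term}} \underbrace{- \mathbb{E}_{q_\phi(z|X)}\left[\log p_\theta(X|z)\right]}_{\text{reconstruction error}}$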
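  27. Deriving the VAE objective function
    $-\mathcal{L}(\theta, \phi; X) = \underbrace{D_{KL}\left(q_\phi(z|X) \parallel p_\theta(z)\right)}_{\text{regularization term}} \underbrace{- \mathbb{E}_{q_\phi(z|X)}\left[\log p_\theta(X|z)\right]}_{\text{reconstruction error}}$
    • If $p_\theta(z)$ is assumed to be Gaussian (and $q_\phi(z|X)$ is Gaussian too, as in the VAE encoder), the regularization term of this loss can be computed analytically; the reconstruction term is usually estimated by sampling z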
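A minimal sketch of that closed form, assuming $p_\theta(z) = \mathcal{N}(0, I)$ and a diagonal Gaussian $q_\phi(z|X) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$: $D_{KL} = \frac{1}{2}\sum_j (\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1)$. The Monte Carlo estimate next to it only serves as a sanity check; the function names and example numbers are hypothetical.

```python
import math
import random


def kl_analytic(mu, var):
    """Closed-form KL(N(mu, diag(var)) || N(0, I))."""
    return 0.5 * sum(m * m + v - math.log(v) - 1.0 for m, v in zip(mu, var))


def kl_monte_carlo(mu, var, n_samples, rng):
    """Sampling estimate of the same KL: E_q[log q(z) - log p(z)], z ~ q."""
    total = 0.0
    for _ in range(n_samples):
        for m, v in zip(mu, var):  # dimensions are independent (diagonal q)
            z = rng.gauss(m, math.sqrt(v))
            log_q = -0.5 * math.log(2 * math.pi * v) - (z - m) ** 2 / (2 * v)
            log_p = -0.5 * math.log(2 * math.pi) - z * z / 2
            total += log_q - log_p
    return total / n_samples


mu, var = [0.5, -1.0], [0.5, 2.0]
print(kl_analytic(mu, var))                              # ≈ 0.875
print(kl_monte_carlo(mu, var, 50000, random.Random(0)))  # close to 0.875
```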
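  28. VAE model architecture (revisited)
    [Figure: x → Encoder → W_μ / W_σ → μ, σ → latent representation z → Decoder → x′]
    • In practice, a technique called the reparameterization trick sits between (μ, σ) and z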
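The idea of the reparameterization trick: instead of sampling $z \sim \mathcal{N}(\mu, \sigma^2)$ directly, sample $\epsilon \sim \mathcal{N}(0, 1)$ and set $z = \mu + \sigma \epsilon$, so that z is a differentiable function of μ and σ and gradients of the reconstruction term can flow back into the Encoder. A minimal one-dimensional sketch that checks the derivatives by finite differences and forms a pathwise gradient estimate; the illustrative objective $\mathbb{E}[z^2]$ (whose true derivative with respect to μ is $2\mu$), the parameter values, and all names are assumptions.

```python
import random


def reparameterize(mu, sigma, eps):
    """z = mu + sigma * eps, with eps ~ N(0, 1) drawn outside this function."""
    return mu + sigma * eps


def finite_diff(f, x, h=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


def pathwise_grad_mu(mu, sigma, n_samples, rng):
    """Reparameterized estimate of d/dmu E_{z ~ N(mu, sigma^2)}[z^2] (illustrative; true value 2*mu)."""
    total = 0.0
    for _ in range(n_samples):
        z = reparameterize(mu, sigma, rng.gauss(0.0, 1.0))
        total += 2.0 * z  # d(z^2)/dz * dz/dmu, and dz/dmu = 1
    return total / n_samples


rng = random.Random(0)
eps = rng.gauss(0.0, 1.0)
mu, sigma = 0.3, 1.2
print(finite_diff(lambda m: reparameterize(m, sigma, eps), mu))  # ≈ 1 (dz/dmu)
print(finite_diff(lambda s: reparameterize(mu, s, eps), sigma))  # ≈ eps (dz/dsigma)
print(pathwise_grad_mu(mu, sigma, 50000, rng))                   # ≈ 0.6
```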
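  29. Model architecture: a slightly more detailed version
    [Figure: BERT encodes [CLS] w1 w2 …; μ and σ are computed via W_E; z is obtained by sampling with the reparameterization trick; z is fed to GPT-2 via W_M / W_D]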
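  30. Model architecture: a slightly more detailed version
    [Figure: BERT encodes [CLS] w1 w2 …; μ and σ are computed via W_E; z is obtained by sampling with the reparameterization trick; z is fed to GPT-2 via W_M / W_D; GPT-2 takes [CLS] w1 w2 … as input and outputs w1 w2 w3 …]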
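As I read the Optimus paper, z is passed to GPT-2 in two ways: as memory (W_M maps z to one extra vector per GPT-2 layer, which that layer can attend to) and as an embedding (W_D maps z to a vector added to the input token embeddings). The sketch below only illustrates these shapes with toy dimensions in plain Python; it is not the official implementation, `inject_latent` and its arguments are hypothetical, and the details should be checked against the paper and its released code.

```python
def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def inject_latent(z, W_M, W_D, token_embeddings, num_layers, hidden):
    """Hypothetical sketch of feeding a latent vector z (size P) into a decoder.

    Memory:    W_M (num_layers*hidden x P) maps z to one extra vector of size
               `hidden` per decoder layer, for that layer to attend to.
    Embedding: W_D (hidden x P) maps z to a vector added to every token embedding.
    """
    h_mem = matvec(W_M, z)
    memory_per_layer = [h_mem[i * hidden:(i + 1) * hidden] for i in range(num_layers)]
    h_emb = matvec(W_D, z)
    conditioned = [[e + d for e, d in zip(tok, h_emb)] for tok in token_embeddings]
    return memory_per_layer, conditioned


# Toy sizes: P = 2 (latent), hidden = 2, num_layers = 2, three input tokens.
z = [1.0, 2.0]
W_M = [[1, 0], [0, 1], [1, 1], [2, 0]]
W_D = [[1, 0], [0, 1]]
tokens = [[0.0, 0.0], [1.0, 1.0], [0.5, -0.5]]
memory, conditioned = inject_latent(z, W_M, W_D, tokens, num_layers=2, hidden=2)
print(memory)       # [[1.0, 2.0], [3.0, 2.0]]
print(conditioned)  # [[1.0, 2.0], [2.0, 3.0], [1.5, 1.5]]
```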
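  31. Evaluation experiments
    • Language Modeling
      • Evaluates whether Optimus can generate sentences correctly
      • Perplexity (PPL) and MI (mutual information) for sentence generation
    • Guided Language Generation
      • Evaluates whether Optimus can correctly generate sentences that follow a given condition
      • Dialogue response generation, response generation in a specific style, and label-conditioned sentence generation
    • Low-resource Language Understanding
      • Examines how useful Optimus is in low-resource settings
      • Evaluates performance on GLUE using sentence embeddings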
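For reference, perplexity is the exponential of the average per-token negative log-likelihood, $\mathrm{PPL} = \exp\left(-\frac{1}{N} \log p(X)\right)$ for a text of N tokens. For a VAE language model the exact $\log p_\theta(X)$ is intractable, so it has to be replaced by an estimate such as the ELBO (which, being a lower bound, yields an upper bound on PPL); which estimate Optimus uses should be checked in the paper. The sketch below only applies the definition; the function name is hypothetical.

```python
import math


def perplexity(total_log_likelihood, num_tokens):
    """PPL = exp(-log p(X) / N), with log p(X) in nats summed over N tokens.

    For an autoregressive LM, log p(X) = sum_i log p(w_i | w_<i). For a VAE LM,
    an estimate of log p(X) (e.g. the ELBO) would be passed in instead.
    """
    return math.exp(-total_log_likelihood / num_tokens)


# A model that assigns probability 1/4 to each of 5 tokens has perplexity 4.
print(perplexity(5 * math.log(0.25), 5))
```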
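  32. Experimental setup
    • Latent dimension: 32
      • Inferred from the public implementation (the paper does not seem to state the latent dimension explicitly…)
    • VAE training data: 1.99 million sentences from English Wikipedia
      • For the sentence-generation tasks, the model is further trained on each dataset for only 1 epoch
    • Various training tricks
      • e.g., increasing β, the weight on the KL term, during training
    • Low-resource Language Understanding uses the Encoder (BERT) representation corresponding to [CLS]
      • So the vector dimension is 768, not 32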
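The slide mentions increasing β, the weight on the KL (regularization) term, during training. A minimal sketch of one such schedule, a linear warm-up, and the resulting β-weighted loss; the linear shape, `warmup_steps`, and `beta_max` are assumptions for illustration and not necessarily the schedule Optimus uses (see the paper and its code for that).

```python
def beta_linear_warmup(step, warmup_steps, beta_max=1.0):
    """Hypothetical schedule: beta rises linearly from 0 to beta_max, then stays there."""
    if warmup_steps <= 0:
        return beta_max
    return beta_max * min(1.0, step / warmup_steps)


def beta_weighted_loss(reconstruction_error, kl, beta):
    """VAE loss with a weighted regularization term: reconstruction + beta * KL."""
    return reconstruction_error + beta * kl


for step in (0, 250, 500, 1000, 2000):
    print(step, beta_linear_warmup(step, warmup_steps=1000))
```

Keeping β small early lets the decoder learn to use z before the KL term pulls $q_\phi(z|X)$ toward the prior, a common way to mitigate posterior collapse (KL vanishing) when the decoder is a strong autoregressive language model such as GPT-2.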
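  33. Evaluation: Guided Language Generation
    • Optimus's latent representations allow arithmetic on sentence representations
      • Sentences are generated from $z_D = z_B - z_A + z_C$
    • How should this result be interpreted…?
    • The demo site introduced in the paper is no longer accessible 😇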
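The arithmetic itself is element-wise subtraction and addition in the latent space; the resulting $z_D$ would then be passed to the decoder to generate a sentence. A trivial sketch with made-up 2-dimensional vectors (real latent vectors are higher-dimensional); `latent_analogy` is a hypothetical helper name.

```python
def latent_analogy(z_a, z_b, z_c):
    """Compute z_D = z_B - z_A + z_C element-wise."""
    return [b - a + c for a, b, c in zip(z_a, z_b, z_c)]


z_a, z_b, z_c = [1.0, 0.0], [1.0, 1.0], [0.0, 2.0]  # made-up latent vectors
print(latent_analogy(z_a, z_b, z_c))  # [0.0, 3.0]
```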
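  34. Evaluation: Guided Language Generation
    • Experiments on three tasks, all with high performance
      • Dialogue response generation
      • Sentence generation in a specific style
      • Conditional generation
    • Conditional generation produces text based on sentiment-classification labels
      • High performance in both the label classification rate of the generated sentences and their diversity
    • See the original paper for detailed experimental settings and task descriptions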
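  36. Evaluation: Low-resource Language Understanding
    • A linear classifier is trained on Optimus's Encoder representations
      • Sentiment classification on the Yelp dataset
      • Observes how performance changes with the number of training examples
    • Optimus achieves relatively high classification performance even with few training examples
      • Its performance rises slightly earlier (with fewer examples)
      • In particular, without fine-tuning it outperforms the original BERT
      • This suggests that VAE training yields a good latent space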
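This probing setup trains only a linear classifier on top of frozen sentence representations. A minimal sketch of such a classifier (logistic regression trained by SGD) in plain Python; the 2-dimensional feature vectors stand in for Encoder representations (768-dimensional [CLS] vectors in the experiment) and are made up, as are the hyperparameters and function names.

```python
import math
import random


def train_logistic_regression(features, labels, epochs=200, lr=0.1, seed=0):
    """Fit w, b of p(y=1|x) = sigmoid(w.x + b) by SGD on frozen feature vectors."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            logit = sum(wj * xj for wj, xj in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            grad = p - y  # derivative of the log loss w.r.t. the logit
            w = [wj - lr * grad * xj for wj, xj in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Predict label 1 if the logit is positive, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Made-up "sentence representations" with binary sentiment labels.
features = [[2.0, 1.0], [1.5, 2.0], [3.0, 0.5], [-2.0, -1.0], [-1.0, -2.5], [-3.0, -0.5]]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(features, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(features, labels)) / len(labels)
print(accuracy)  # 1.0
```

Only w and b are learned; the representations stay fixed, so classification quality reflects how linearly separable the encoder's representation space already is.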