
Deep Learning Book 10, Part 1 / deep learning book 10 vol1

Slides for Chapter 10 (Sequence Modeling: Recurrent and Recursive Nets) of the Deep Learning book.

himkt

December 05, 2017

Transcript

1. RNNs and parameter sharing
   • If we had a separate set of parameters for each time step:
     • we could not handle sequence lengths that never appear in the training data
     • we could not share information across sequences of different lengths
   • When is parameter sharing effective?
     • when a particular piece of information can appear at different positions,
       e.g. "I went to Nepal in 2009" vs. "In 2009, I went to Nepal"
   [Figure: the same state sequence s(0) … s(5), shown folded and unfolded]

2. 10.1 Unfolding computational graphs
   • Consider an RNN with no outputs, as in the figure
   • Unfolding is the reverse of what we just did:
     expand the folded cell into one copy per time step…
   [Figure: folded cell (x → h) and its unfolded version with h(t-1), h(t), h(t+1) and inputs x(t-1), x(t), x(t+1)]

3. 10.1 Unfolding computational graphs
   Definition: $h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)$
   Unfolding at $t = 3$:
   $h^{(3)} = f(h^{(2)}, x^{(3)}; \theta) = f(f(h^{(1)}, x^{(2)}; \theta), x^{(3)}; \theta)$
   The parameters $\theta$ are shared across time steps.
   [Figure: folded and unfolded computational graph]

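As a small illustration (not part of the original deck; `f` below is just an assumed tanh cell), unfolding means applying the same function with the same parameters at every time step:

```python
import numpy as np

def f(h_prev, x_t, theta):
    """One step of h(t) = f(h(t-1), x(t); theta); here an assumed tanh cell."""
    W, U, b = theta
    return np.tanh(W @ h_prev + U @ x_t + b)

def unfold(h0, xs, theta):
    """Unfolded graph: the same f and the same theta at every time step."""
    h, states = h0, []
    for x_t in xs:              # xs = [x(1), ..., x(tau)]
        h = f(h, x_t, theta)
        states.append(h)
    return states               # [h(1), ..., h(tau)]
```
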
4. 10.1 Representing the unfolded RNN
   • The RNN cell at time $t$ can be written with a function $g^{(t)}$ as below
   • Decomposing it into a single function $f$ that does not depend on the time step and is applied at every time step gives the right-hand form
   • $f$ is a function with shared parameters that accepts variable-length input
   $h^{(t)} = g^{(t)}(x^{(t)}, x^{(t-1)}, x^{(t-2)}, \ldots, x^{(2)}, x^{(1)}) = f(h^{(t-1)}, x^{(t)}; \theta)$

5. 10.2 Recurrent Neural Networks
   • Taking the recursive definition and parameter sharing as given, we can design many different kinds of RNNs
   [Figures: unfolded RNN graphs — hidden-to-hidden recurrence, output-to-hidden recurrence (train vs. test time), and a single output at the final time step] (http://www.deeplearningbook.org/lecture_slides.html)

6. The standard RNN
   "Armed with the graph unrolling and parameter sharing ideas of section 10.1, we can design a wide variety of recurrent neural networks."
   [Figure 10.3: RNN with hidden-to-hidden recurrence, unfolded] (http://www.deeplearningbook.org/lecture_slides.html)

7. An RNN whose recurrence goes from the output at the previous time step to the hidden layer
   [Figure: output-to-hidden recurrent connections, unfolded] (http://www.deeplearningbook.org/lecture_slides.html)

8. An RNN that produces a single output at the final time step
   [Figure: sequence input, one output read off the last hidden state] (http://www.deeplearningbook.org/lecture_slides.html)

9. 10.2 How do we train it?
   • How do we compute the gradients?
   • Back-propagation through time (BPTT): cost is linear in the sequence length
   • Run the forward pass, then compute the errors backwards from the end
   (http://www.phontron.com/slides/nlp-programming-ja-08-rnn.pdf)

10. 10.2.1 Teacher forcing and recurrence through the outputs
    [Figure: teacher forcing — at train time the true output y(t) is fed back into the next step; at test time the model's own output o(t) is fed back] (http://www.deeplearningbook.org/lecture_slides.html)

11. 10.2.1 Teacher forcing and recurrence through the outputs
    • Since the dependence on past hidden states is cut, training can be parallelized
    • However, only the class-label output is passed forward, so the model's expressive power drops considerably
    • Wait — at inference time there is no teacher, is there??
      • at training time: the ground-truth outputs
      • at inference time (open loop): the model's own outputs
    • Teacher forcing often gives unstable performance at inference time
    • Apparently this is mitigated by also training on free-running (self-generated) inputs (something like n-best?)

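A minimal sketch of that difference (the helpers `rnn_step` and `output_layer` are assumed, not from the slides): at train time the cell is conditioned on the gold label, at test time on its own previous prediction.

```python
import numpy as np

def decode(x_seq, y_gold, rnn_step, output_layer, teacher_forcing=True):
    """Illustrative closed/open-loop decoding for an output-recurrent RNN.

    rnn_step(prev_out, x_t) -> h_t and output_layer(h_t) -> o_t are assumed
    to be given; y_gold is only consulted when teacher_forcing is True.
    """
    prev_out = np.zeros_like(y_gold[0])
    outputs = []
    for t, x_t in enumerate(x_seq):
        h_t = rnn_step(prev_out, x_t)
        o_t = output_layer(h_t)
        outputs.append(o_t)
        # Train time (teacher forcing): feed back the true label y(t).
        # Test time (open loop): feed back the model's own output o(t).
        prev_out = y_gold[t] if teacher_forcing else o_t
    return outputs
```
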
12. 10.2.2 Computing the gradient in an RNN
    • Compute the gradients with BPTT
    • Differentiate with respect to each node, then with respect to the parameters
    Forward pass (a code sketch follows below):
    $a^{(t)} = b + W h^{(t-1)} + U x^{(t)}$
    $h^{(t)} = \tanh(a^{(t)})$
    $o^{(t)} = c + V h^{(t)}$
    $\hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})$
    [Figure 10.3: "The computational graph to compute the training loss of a recurrent network that maps an input sequence of x values to a corresponding sequence of output o values."] (http://www.deeplearningbook.org/lecture_slides.html)

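A minimal numpy sketch of this forward pass (shapes and variable names are assumptions, not from the slides):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())        # subtract the max for numerical stability
    return e / e.sum()

def rnn_forward(xs, h0, U, W, V, b, c):
    """Vanilla RNN forward pass:
    a(t) = b + W h(t-1) + U x(t); h(t) = tanh(a(t));
    o(t) = c + V h(t); yhat(t) = softmax(o(t))."""
    h = h0
    hs, yhats = [], []
    for x in xs:
        a = b + W @ h + U @ x
        h = np.tanh(a)
        o = c + V @ h
        hs.append(h)
        yhats.append(softmax(o))
    return hs, yhats
```
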
13. Differentiating the loss at the output layer
    [Figure 10.3: unfolded RNN computational graph] (http://www.deeplearningbook.org/lecture_slides.html)

14. Differentiating the loss at the output layer
    Expanding over the classes:
    $L = \sum_t L^{(t)} \;\Rightarrow\; \frac{\partial L}{\partial L^{(t)}} = 1$, where $L^{(t)} = -\sum_c y^{(t)}_c \log \hat{y}^{(t)}_c$
    $\frac{\partial L}{\partial o^{(t)}_i} = \frac{\partial L}{\partial L^{(t)}} \cdot \frac{\partial L^{(t)}}{\partial o^{(t)}_i} = \frac{\partial L}{\partial L^{(t)}} \cdot \sum_c \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c} \cdot \frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i}$

15. Differentiating the loss at the output layer
    $\hat{y}^{(t)}_i = \mathrm{softmax}(o^{(t)})_i = \frac{\exp(o^{(t)}_i)}{\sum_c \exp(o^{(t)}_c)}$
    and
    $\frac{\partial \exp(o^{(t)}_i)}{\partial o^{(t)}_k} = \begin{cases} \exp(o^{(t)}_i) & (i = k) \\ 0 & (\text{otherwise}) \end{cases}$
    so, splitting into cases on $i$:
    $\sum_c \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c} \cdot \frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i} = \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_i} \cdot \frac{\partial \hat{y}^{(t)}_i}{\partial o^{(t)}_i} + \sum_{c \neq i} \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c} \cdot \frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i}$
    Here the two cases $i = c$ and $i \neq c$ are treated separately.

16. Differentiating the loss at the output layer
    For $c = i$ (quotient rule: $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$):
    $\frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_i} \cdot \frac{\partial \hat{y}^{(t)}_i}{\partial o^{(t)}_i} = -\frac{y^{(t)}_i}{\hat{y}^{(t)}_i} \cdot \frac{\exp(o^{(t)}_i)\sum_c \exp(o^{(t)}_c) - \exp(o^{(t)}_i)\exp(o^{(t)}_i)}{\left(\sum_c \exp(o^{(t)}_c)\right)^2}$
    $= -\frac{y^{(t)}_i}{\hat{y}^{(t)}_i} \cdot \frac{\exp(o^{(t)}_i)\left(\sum_c \exp(o^{(t)}_c) - \exp(o^{(t)}_i)\right)}{\left(\sum_c \exp(o^{(t)}_c)\right)^2} = -\frac{y^{(t)}_i}{\hat{y}^{(t)}_i} \cdot \hat{y}^{(t)}_i (1 - \hat{y}^{(t)}_i) = -y^{(t)}_i (1 - \hat{y}^{(t)}_i) = -y^{(t)}_i + \hat{y}^{(t)}_i y^{(t)}_i$
    For $c \neq i$:
    $\frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c} \cdot \frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i} = -\frac{y^{(t)}_c}{\hat{y}^{(t)}_c} \cdot \left(-\frac{\exp(o^{(t)}_i)\exp(o^{(t)}_c)}{\left(\sum_c \exp(o^{(t)}_c)\right)^2}\right) = \frac{y^{(t)}_c}{\hat{y}^{(t)}_c} \cdot \hat{y}^{(t)}_i \hat{y}^{(t)}_c = y^{(t)}_c \cdot \hat{y}^{(t)}_i$

17. Differentiating the loss at the output layer
    Putting this together:
    $\sum_c \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c}\frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i} = \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_i}\frac{\partial \hat{y}^{(t)}_i}{\partial o^{(t)}_i} + \sum_{c \neq i}\frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c}\frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i} = -y^{(t)}_i + \hat{y}^{(t)}_i y^{(t)}_i + \sum_{c \neq i} y^{(t)}_c \hat{y}^{(t)}_i = -y^{(t)}_i + \sum_c y^{(t)}_c \hat{y}^{(t)}_i = \hat{y}^{(t)}_i - y^{(t)}_i \quad \left(\because \sum_c y^{(t)}_c = 1\right)$
    $\Rightarrow \frac{\partial L}{\partial o^{(t)}_i} = \frac{\partial L}{\partial L^{(t)}} \cdot \frac{\partial L^{(t)}}{\partial o^{(t)}_i} = \frac{\partial L}{\partial L^{(t)}} \cdot \sum_c \frac{\partial L^{(t)}}{\partial \hat{y}^{(t)}_c}\frac{\partial \hat{y}^{(t)}_c}{\partial o^{(t)}_i} = \hat{y}^{(t)}_i - y^{(t)}_i$
    In vector form? Using the indicator function, this matches the textbook.

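As a quick numerical sanity check of this result (a sketch, not from the slides), one can compare $\hat{y} - y$ against a finite-difference gradient of the cross-entropy loss:

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def cross_entropy(o, y):
    return -np.sum(y * np.log(softmax(o)))

rng = np.random.default_rng(0)
o = rng.normal(size=5)
y = np.eye(5)[2]                       # one-hot target

analytic = softmax(o) - y              # dL/do = yhat - y
numeric = np.zeros_like(o)
eps = 1e-6
for i in range(len(o)):
    d = np.zeros_like(o); d[i] = eps
    numeric[i] = (cross_entropy(o + d, y) - cross_entropy(o - d, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))   # True
```
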
18. Differentiating the loss at the hidden layer
    [Figure 10.3: unfolded RNN computational graph] (http://www.deeplearningbook.org/lecture_slides.html)

19. Differentiating the loss at the hidden layer
    • The gradient at the hidden layer is the sum of the gradient coming through the output layer at the same time step and the gradient coming through the hidden layer one time step ahead
    $\frac{\partial L}{\partial h^{(t)}_i} = \frac{\partial L}{\partial L^{(t)}} \cdot \left(\sum_u \frac{\partial L^{(t)}}{\partial h^{(t+1)}_u}\frac{\partial h^{(t+1)}_u}{\partial h^{(t)}_i} + \sum_c \frac{\partial L^{(t)}}{\partial o^{(t)}_c}\frac{\partial o^{(t)}_c}{\partial h^{(t)}_i}\right) = \frac{\partial L^{(t)}}{\partial h^{(t+1)}_i}\frac{\partial h^{(t+1)}_i}{\partial h^{(t)}_i} + \frac{\partial L^{(t)}}{\partial o^{(t)}_i}\frac{\partial o^{(t)}_i}{\partial h^{(t)}_i} \quad \left(\because \frac{\partial h^{(t+1)}_u}{\partial h^{(t)}_i} = 0 \;(u \neq i) \;\&\; \frac{\partial o^{(t)}_c}{\partial h^{(t)}_i} = 0 \;(c \neq i)\right)$

20. Differentiating the loss at the hidden layer
    With $h^{(t)}_i = \tanh(a^{(t)}_i)$:
    $\frac{\partial h^{(t+1)}_i}{\partial h^{(t)}_i} = \frac{\partial \tanh(a^{(t+1)}_i)}{\partial h^{(t)}_i} = \left(1 - \tanh^2(a^{(t+1)}_i)\right) \cdot \frac{\partial \left(b_i + (W h^{(t)})_i + (U x^{(t+1)})_i\right)}{\partial h^{(t)}_i} = \left(1 - \tanh^2(a^{(t+1)}_i)\right) \cdot \sum_u W_{iu}$
    $\Rightarrow \frac{\partial L^{(t)}}{\partial h^{(t+1)}_i}\frac{\partial h^{(t+1)}_i}{\partial h^{(t)}_i} = \left(1 - \tanh^2(a^{(t+1)}_i)\right)(\nabla_{h^{(t+1)}} L)_i \sum_u W_{iu}$
    $\Rightarrow \nabla_{h^{(t)}} L = W^{\top}\,\mathrm{diag}\!\left(1 - (h^{(t+1)})^2\right)(\nabla_{h^{(t+1)}} L)$
    Similarly,
    $\frac{\partial o^{(t)}_i}{\partial h^{(t)}_i} = \frac{\partial \left(c_i + (V h^{(t)})_i\right)}{\partial h^{(t)}_i} = \sum_u V_{iu} \;\Rightarrow\; \frac{\partial L}{\partial h^{(t)}_i} = (\nabla_{o^{(t)}} L)_i \sum_u V_{iu} \;\Rightarrow\; \nabla_{h^{(t)}} L = V^{\top}\,\nabla_{o^{(t)}} L$

21. Differentiating the loss at the hidden layer
    Putting it together:
    $\frac{\partial L}{\partial h^{(t)}_i} = \frac{\partial L}{\partial L^{(t)}} \cdot \left(\sum_u \frac{\partial L^{(t)}}{\partial h^{(t+1)}_u}\frac{\partial h^{(t+1)}_u}{\partial h^{(t)}_i} + \sum_c \frac{\partial L^{(t)}}{\partial o^{(t)}_c}\frac{\partial o^{(t)}_c}{\partial h^{(t)}_i}\right)$
    $\Rightarrow \nabla_{h^{(t)}} L = W^{\top}\,\mathrm{diag}\!\left(1 - (h^{(t+1)})^2\right)(\nabla_{h^{(t+1)}} L) + V^{\top}\,\nabla_{o^{(t)}} L$

22. Differentiating the loss with respect to the parameters c and b
    [Figure 10.3: unfolded RNN computational graph — where do c and b enter?] (http://www.deeplearningbook.org/lecture_slides.html)

23. Differentiating the loss with respect to the parameters c and b
    $\frac{\partial L}{\partial c_i} = \sum_t \sum_k \frac{\partial L^{(t)}}{\partial o^{(t)}_k}\frac{\partial o^{(t)}_k}{\partial c_i} = \sum_t \frac{\partial L^{(t)}}{\partial o^{(t)}_i} \quad \left(\because i \neq k \Rightarrow \frac{\partial o^{(t)}_k}{\partial c_i} = 0\right)$
    $\Rightarrow \nabla_c L = \sum_t \nabla_{o^{(t)}} L^{(t)}$
    $\frac{\partial L}{\partial b_i} = \sum_t \sum_c \frac{\partial L^{(t)}}{\partial h^{(t)}_c} \sum_k \frac{\partial h^{(t)}_c}{\partial a^{(t)}_k}\frac{\partial a^{(t)}_k}{\partial b_i} = \sum_t (\nabla_{h^{(t)}} L^{(t)})_i \left(1 - (h^{(t)}_i)^2\right) \quad \left(\because k \neq i \Rightarrow \frac{\partial a^{(t)}_k}{\partial b_i} = 0\right)$
    $\Rightarrow \nabla_b L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right)\nabla_{h^{(t)}} L$

24. Differentiating the loss with respect to the parameter V
    [Figure 10.3: unfolded RNN computational graph] (http://www.deeplearningbook.org/lecture_slides.html)

25. Differentiating the loss with respect to the parameter V
    $\left(\frac{\partial L}{\partial V}\right)_{ij} = \sum_t \sum_c \frac{\partial L}{\partial L^{(t)}}\frac{\partial L^{(t)}}{\partial o^{(t)}_c}\frac{\partial o^{(t)}_c}{\partial V_{ij}} = \sum_t \frac{\partial L^{(t)}}{\partial o^{(t)}_i}\,h^{(t)}_j = \sum_t \frac{\partial L}{\partial o^{(t)}_i}\,h^{(t)}_j$
    $\because \frac{\partial o^{(t)}_c}{\partial V_{ij}} = \frac{\partial \left(c_i + (V h^{(t)})_i\right)}{\partial V_{ij}} = h^{(t)}_j \quad \left(\because (V h^{(t)})_i = \sum_k V_{ik} h^{(t)}_k\right)$
    $\Rightarrow \nabla_V L = \sum_t (\nabla_{o^{(t)}} L)\,h^{(t)\top}$

26. Differentiating the loss with respect to the parameter W
    [Figure 10.3: unfolded RNN computational graph] (http://www.deeplearningbook.org/lecture_slides.html)

27. Differentiating the loss with respect to the parameter W
    By the chain rule:
    $\left(\frac{\partial L}{\partial W}\right)_{ij} = \sum_t \sum_u \frac{\partial L}{\partial h^{(t)}_u}\frac{\partial h^{(t)}_u}{\partial W_{ij}} = \sum_t \frac{\partial L}{\partial h^{(t)}_i}\frac{\partial h^{(t)}_i}{\partial W_{ij}}$
    $\because h^{(t)}_i = \tanh(a^{(t)}_i) \Rightarrow \frac{\partial h^{(t)}_i}{\partial W_{ij}} = \sum_c \frac{\partial h^{(t)}_i}{\partial a^{(t)}_c}\frac{\partial a^{(t)}_c}{\partial W_{ij}} = \frac{\partial h^{(t)}_i}{\partial a^{(t)}_i}\frac{\partial a^{(t)}_i}{\partial W_{ij}} = \left(1 - \tanh^2(a^{(t)}_i)\right)\frac{\partial a^{(t)}_i}{\partial W_{ij}} = \left(1 - (h^{(t)}_i)^2\right) h^{(t-1)}_j$
    $\left(\because (W h^{(t-1)})_i = \sum_k W_{ik} h^{(t-1)}_k \Rightarrow \frac{\partial (W h^{(t-1)})_i}{\partial W_{ij}} = h^{(t-1)}_j\right)$

28. Differentiating the loss with respect to the parameter U
    [Figure 10.3: unfolded RNN computational graph] (http://www.deeplearningbook.org/lecture_slides.html)

29. Differentiating the loss with respect to the parameter U
    $\left(\frac{\partial L}{\partial U}\right)_{ij} = \sum_t \sum_u \frac{\partial L}{\partial h^{(t)}_u}\frac{\partial h^{(t)}_u}{\partial U_{ij}} = \sum_t \frac{\partial L}{\partial h^{(t)}_i}\frac{\partial h^{(t)}_i}{\partial U_{ij}} = \sum_t \frac{\partial L}{\partial h^{(t)}_i}\sum_c \frac{\partial h^{(t)}_i}{\partial a^{(t)}_c}\frac{\partial a^{(t)}_c}{\partial U_{ij}} = \sum_t \frac{\partial L}{\partial h^{(t)}_i}\frac{\partial h^{(t)}_i}{\partial a^{(t)}_i}\frac{\partial a^{(t)}_i}{\partial U_{ij}} = \sum_t \frac{\partial L}{\partial h^{(t)}_i}\left(1 - (h^{(t)}_i)^2\right) x^{(t)}_j$
    $\left(\because (U x^{(t)})_i = \sum_k U_{ik} x^{(t)}_k \Rightarrow \frac{\partial (U x^{(t)})_i}{\partial U_{ij}} = x^{(t)}_j\right)$
    $\Rightarrow \nabla_U L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right)(\nabla_{h^{(t)}} L)\,x^{(t)\top}$

30. Summary of the gradients
    $\nabla_c L = \sum_t \nabla_{o^{(t)}} L^{(t)}$
    $\nabla_b L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right)\nabla_{h^{(t)}} L$
    $\nabla_V L = \sum_t (\nabla_{o^{(t)}} L)\,h^{(t)\top}$
    $\nabla_W L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right)(\nabla_{h^{(t)}} L)\,h^{(t-1)\top}$
    $\nabla_U L = \sum_t \mathrm{diag}\!\left(1 - (h^{(t)})^2\right)(\nabla_{h^{(t)}} L)\,x^{(t)\top}$
    Now the network can be trained! (A code sketch follows below.)

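A compact numpy sketch of BPTT implementing exactly these formulas, building on the `rnn_forward` sketch above (variable names are assumptions, not from the slides):

```python
import numpy as np

def bptt(xs, ys, hs, yhats, h0, U, W, V):
    """BPTT for the vanilla RNN above, following the gradient summary.
    xs: inputs x(t), ys: one-hot targets y(t), hs/yhats: from rnn_forward,
    h0: initial hidden state. Returns gradients for U, W, V, b, c."""
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db, dc = np.zeros(W.shape[0]), np.zeros(V.shape[0])
    da_next = np.zeros(W.shape[0])        # diag(1 - h(t+1)^2) * nabla_{h(t+1)} L
    for t in reversed(range(len(xs))):
        do = yhats[t] - ys[t]             # nabla_{o(t)} L = yhat(t) - y(t)
        dh = V.T @ do + W.T @ da_next     # nabla_{h(t)} L
        da = (1.0 - hs[t] ** 2) * dh      # back through tanh
        dc += do                          # nabla_c L = sum_t nabla_o L
        db += da                          # nabla_b L = sum_t diag(1-h^2) nabla_h L
        dV += np.outer(do, hs[t])         # sum_t (nabla_o L) h(t)^T
        h_prev = hs[t - 1] if t > 0 else h0
        dW += np.outer(da, h_prev)        # sum_t diag(1-h^2)(nabla_h L) h(t-1)^T
        dU += np.outer(da, xs[t])         # sum_t diag(1-h^2)(nabla_h L) x(t)^T
        da_next = da
    return dU, dW, dV, db, dc
```

The returned gradients can then be plugged into any gradient-descent update of the parameters.
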
31. 10.2.4 Sequence modeling conditioned on context with RNNs
    • Does the RNN cell receive only the input value $x^{(t)}$ at each time step?
      => not necessarily; it may also be conditioned on the whole input
    • This feels similar to the discussion for conditional random fields (CRFs)?
    The model can be seen as factorizing the sequence probability as
    $P(y^{(1)}, \ldots, y^{(\tau)} \mid x^{(1)}, \ldots, x^{(\tau)}) = \prod_t P(y^{(t)} \mid x^{(1)}, \ldots, x^{(\tau)})$
    (http://www.deeplearningbook.org/lecture_slides.html)

32. 10.4 Encoder-Decoder Sequence-to-Sequence
    "We have seen in figure 10.9 how an RNN can map a fixed-size vector to a sequence. We have seen in figures 10.3, 10.4, 10.10 and 10.11 how an RNN can map an input sequence to an output sequence of the same length."
    [Figure: encoder RNN over x(1) … x(nx) producing a context C; decoder RNN producing y(1) … y(ny)] (http://www.deeplearningbook.org/lecture_slides.html)

33. 10.4 Encoder-Decoder Sequence-to-Sequence
    • The encoder encodes the input into a context vector
    • The decoder decodes the context vector to produce the output
    • Encoder-decoder models handle variable-length inputs and outputs (seq2seq)
    • This is useful when the input and output sequence lengths differ, as in translation
    • An encoder-decoder model built from RNNs is called an RNN sequence-to-sequence model (see the sketch below)

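A minimal sketch of the idea, with assumed cell and readout functions in the spirit of the earlier sketches (all names here are illustrative, not from the slides):

```python
import numpy as np

def seq2seq(x_seq, n_y, y_dim, h_dim, enc_step, dec_step, readout):
    """Encoder-decoder sketch: enc_step/dec_step/readout are assumed functions
    (e.g. tanh cells and a softmax layer)."""
    # Encoder: fold the variable-length input into a fixed-size context C.
    h = np.zeros(h_dim)
    for x_t in x_seq:
        h = enc_step(h, x_t)
    C = h

    # Decoder: start from C and emit n_y outputs, feeding each one back in,
    # so the output length need not match the input length.
    s, y_prev, ys = C, np.zeros(y_dim), []
    for _ in range(n_y):
        s = dec_step(s, y_prev)
        y_prev = readout(s)
        ys.append(y_prev)
    return ys
```
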
34. 10.4 About the context C
    • C is usually a fixed-length vector
    • In the neural machine translation paper [Bahdanau+, 2015], however…
      • C may instead be a variable-length sequence of vectors
      • an attention mechanism is introduced to make better use of the context
    (https://arxiv.org/pdf/1409.0473.pdf)

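A minimal sketch of additive attention over the encoder states; the scoring network here is a simplified stand-in inspired by the paper, and all parameter names are assumptions:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attention_context(dec_state, enc_states, W_a, U_a, v_a):
    """Per-step context as an attention-weighted sum of encoder states h(j).
    Additive scoring: e_j = v_a . tanh(W_a s + U_a h(j)); alpha = softmax(e);
    C = sum_j alpha_j h(j)."""
    scores = np.array([v_a @ np.tanh(W_a @ dec_state + U_a @ h_j)
                       for h_j in enc_states])
    alphas = softmax(scores)         # attention weights over input positions
    context = sum(a * h_j for a, h_j in zip(alphas, enc_states))
    return context, alphas
```
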
35. 10.5 Deep Recurrent Neural Networks
    [Figure: three ways to make an RNN deep, panels (a), (b), (c)] (http://www.deeplearningbook.org/lecture_slides.html)

36. 10.5 Deep Recurrent Neural Networks
    • There are several ways to stack layers (see the sketch below):
    • (a) simply add another RNN layer on top of an RNN layer
    • (b) feed the RNN layer's output through an MLP and use the result as the input to the next RNN step
    • (c) adds skip connections so that the shortest path between nodes in the network does not become too long

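A minimal sketch of variant (a), stacking two of the tanh cells from the earlier forward-pass sketch (parameter names are assumptions, not from the slides):

```python
import numpy as np

def deep_rnn_forward(xs, params, h_dim):
    """Variant (a): the hidden-state sequence of one layer becomes the input
    sequence of the next. params is a list of (U, W, b) tuples, one per layer."""
    layer_inputs = xs
    for U, W, b in params:
        h = np.zeros(h_dim)
        outputs = []
        for x in layer_inputs:
            h = np.tanh(b + W @ h + U @ x)   # same cell as the single-layer RNN
            outputs.append(h)
        layer_inputs = outputs               # feed hidden states to the next layer
    return layer_inputs                      # hidden states of the top layer
```
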
37. 10.6 Recursive Neural Networks
    [Figure 10.14: "A recursive network has a computational graph that generalizes that of the recurrent network from a chain to a tree."] (http://www.deeplearningbook.org/lecture_slides.html)

38. 10.6 Recursive Neural Networks
    • RNN (recurrent neural network): linear, chain-structured sequence data
    • RNN (recursive neural network): tree-structured data
    [Figures 10.3 and 10.14, chain vs. tree: "A variable-size sequence x(1), x(2), …, x(t) can be mapped to a fixed-size representation (the output o), with a fixed set of parameters."] (http://www.deeplearningbook.org/lecture_slides.html)

39. 10.7 Long-term dependencies
    [Figure 10.15: "When composing many nonlinear functions (like the linear-tanh layer…"; axes: input coordinate vs. projection of output] (http://www.deeplearningbook.org/lecture_slides.html)

40. 10.7 Long-term dependencies
    • Consider the RNN hidden state with recurrence only: $h_t = W h_{t-1}$
    • If the parameter matrix admits an eigendecomposition:
    $h_t = W h_{t-1} = W(W h_{t-2}) = WW(W h_{t-3}) = \cdots = W^t h_0$
    $h_t = W h_{t-1} = (Q\Lambda Q^{-1})(Q\Lambda Q^{-1})\cdots(Q\Lambda Q^{-1})\,h_0 = Q\Lambda^t Q^{-1} h_0 \quad (\text{where } W = Q\Lambda Q^{-1})$

41. 10.7 Long-term dependencies
    • Eigenvalues larger than 1 -> explosion
    • Eigenvalues smaller than 1 -> vanishing
    • Components of $h_0$ that are not aligned with the largest eigenvector are eventually discarded (?)

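A small numpy sketch (eigenvalues chosen purely for illustration) of how repeated multiplication by W amplifies or kills components according to the eigenvalue magnitudes:

```python
import numpy as np

# Build a diagonalizable W with eigenvalues 1.1 (explodes) and 0.9 (vanishes).
rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.normal(size=(2, 2)))[0]   # orthogonal eigenbasis, so Q^{-1} = Q.T
W = Q @ np.diag([1.1, 0.9]) @ Q.T              # W = Q Lambda Q^{-1}

h = Q @ np.array([1.0, 1.0])                   # h0 with coefficient 1 along each eigenvector
for _ in range(50):
    h = W @ h                                  # h_t = W h_{t-1}  =>  h_50 = W^50 h_0

print(Q.T @ h)   # ~[1.1**50, 0.9**50]: one component blows up, the other vanishes
```
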