
20180306_NIPS2017_DeepLearning

yoppe
March 06, 2018


Topics related to Deep Learning that I found interesting while attending NIPS 2017 ( https://nips.cc/Conferences/2017 ).


Transcript

  1. Number of deep-learning-related presentations (counts are rough estimates)
     • Tutorials: 3 of 9
     • Invited talks: 2 of 7
     • Orals: 8 of 41
     • Posters: ~200 of 679
     • Workshops: 5 of 53 (counting "Deep Learning" in the title)
  2. Deep-learning-related presentations: Tutorials (3 of 9)
     • Deep Learning: Practice and Trends
     • Deep Probabilistic Modeling with Gaussian Processes
     • Geometric Deep Learning on Graphs and Manifolds
  3. Deep-learning-related presentations: Invited talks (2 of 7)
     • Deep Learning for Robotics
     • On Bayesian Deep Learning and Deep Bayesian Learning
  4. Deep-learning-related presentations: Orals (8 of 41)
     • TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
     • Train longer, generalize better: closing the generalization gap in large batch training of neural networks
     • End-to-End Differentiable Proving
     • Gradient descent GAN optimization is locally stable
  5. Deep-learning-related presentations: Orals (Cont'd)
     • Imagination-Augmented Agents for Deep Reinforcement Learning
     • Masked Autoregressive Flow for Density Estimation
     • Deep Sets
     • From Bayesian Sparsity to Gated Recurrent Nets
  6. Deep-learning-related presentations: Posters (~200 of 679)
     • Too many to list!
     • Note that the counts reflect my own, biased judgment
     • For details, check the NIPS page: https://nips.cc/Conferences/2017/Schedule?type=Poster
  7. Deep-learning-related presentations: Workshops (5 of 53 with "Deep Learning" in the title)
     • Deep Learning for Physical Sciences
     • Deep Learning at Supercomputer Scale
     • Deep Learning: Bridging Theory and Practice
     • Bayesian Deep Learning
     • Interpreting, Explaining and Visualizing Deep Learning - Now what?
  8. Papers that caught my interest (NIPS 2017)
     • Train longer, generalize better: closing the generalization gap in large batch training of neural networks
       (examines the relation between generalization, learning rate, and batch size; proposes GBN)
     • Exploring Generalization in Deep Learning
       (surveys generalization measures; proposes a PAC-Bayes-based notion of sharpness)
     • Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks
       (interprets SGD as a Langevin equation and analyzes the temperature dynamics)
  9. Related papers (outside NIPS 2017)
     • A Bayesian Perspective on Generalization and Stochastic Gradient Descent
       (derives a relation between the "noise scale" and the learning rate and batch size)
     • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
       (achieves training with large minibatch sizes without losing accuracy)
     • Understanding deep learning requires rethinking generalization
       (argues that a new framework is needed to understand generalization in deep learning, and not just there: even in linear models!)
  10. What follows: these are not NIPS papers, but they are close in content and allow a more systematic understanding, so I explain the following two papers
     • A Bayesian Perspective on Generalization and Stochastic Gradient Descent
     • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
     On generalization measures, I previously gave this talk: https://speakerdeck.com/diracdiego/some-understanding-of-generalization-in-deep-learing
  11. Bayesian evidence (Cont'd). The baseline for comparison is the null model (random guessing over the number of classes). Introducing the evidence ratio against this baseline, the result supports the claim that, independently of the model parametrization, broad minima (small Hessian eigenvalues) generalize better than sharp minima.
  12. Experiment: Bayesian evidence for logistic regression. Logistic regression on MNIST {0,1} classification: 800 train, 10000 test. Left: random labels; right: correct labels. Only for labels that carry meaningful information does the log evidence ratio drop below 0. Ref: https://arxiv.org/abs/1710.06451
  13. Experiment: generalization gap with a NN. Classification with 800 hidden units + ReLU: 1000 train, the rest used for test. SGD w/ momentum 0.9 and learning rate 1.0. The batch size produces a difference in generalization performance (the generalization gap). Ref: https://arxiv.org/abs/1710.06451
  14. The "noise scale" in SGD. To argue within an analytic framework, view SGD as a stochastic differential equation. Noting that what mattered was the difference between the full-batch gradient and the minibatch gradient, write the difference in the gradient-based parameter update accordingly (see the sketch after the next slide).
  15. The "noise scale" in SGD (Cont'd). Compare against the continuous-time stochastic differential equation obtained by treating the update step as a continuous variable (the so-called overdamped Langevin equation), driven by a noise term whose scale governs the fluctuations of the dynamics. Take the continuum approximation of the discrete parameter-update equation and compare the two.
  16. Summary on generalization in deep learning
     • As an optimizer, SGD looks good both theoretically and experimentally
     • Many related papers are appearing and understanding is advancing: the probability of converging to a saddle point is zero [ref], although escaping a saddle-point neighborhood can require exponential time [ref]
     • Understanding of why deep learning works so well also seems set to advance; recently this paper was a hot topic
     Ref: https://arxiv.org/abs/1602.04915, https://arxiv.org/abs/1705.10412, https://arxiv.org/abs/1802.04474
  17. Papers that caught my interest. Game-theoretic analyses, analyses of the gradient flow near equilibria, and more; as regularizers, gradient penalties (double backprop) play a central role
     • Gradient descent GAN optimization is locally stable
     • The Numerics of GANs
     • Improved Training of Wasserstein GANs
     • Stabilizing Training of Generative Adversarial Networks through Regularization
  18. Papers that caught my interest (Cont'd)
     • GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
       (gives the generator and discriminator independent learning rates; converges to a Nash equilibrium)
     • Approximation and Convergence Properties of Generative Adversarial Learning
       (treats various GANs in a more abstract framework; shows convergence properties and the relative strength of the methods)
  19. What follows: I mainly introduce the paper below (quoting some figures from other papers). Other papers make essentially quite similar arguments, but beware that the notation varies considerably
     • Gradient descent GAN optimization is locally stable
     On some of the other papers, I previously gave this talk: https://speakerdeck.com/diracdiego/20180127-nips-paper-reading
  20. Where GAN convergence stands
     • Whether an equilibrium (in the sense of a Nash equilibrium) exists is non-trivial; note also that equivalence to a zero-sum game holds only in specific cases
     • Even if an equilibrium exists, whether training converges to it is non-trivial
     • In practice, training is unstable (sensitive to hyperparameters)
     Here the existence of an equilibrium is assumed, and convergence in its neighborhood is shown
  21. Formulating GAN training. The objective function is given below, where the outer function is concave (the original GAN is a special case); note the range of the discriminator. To argue analytically, the parameter-update equations are expressed in continuous time (see the sketch below).
  22. Approach to showing the stability of GAN training
     • The object of analysis is a nonlinear dynamical system
     • Control theory is useful for stability arguments about the gradient flow
     • A global argument is hard, so restrict to a neighborhood of the equilibrium and linearize. Hartman-Grobman theorem: the system can be linearized near a hyperbolic fixed point
     • The equilibrium is stable if the real parts of the eigenvalues of the Jacobian there are all negative
  23. GAN training stability near the equilibrium. The main object of analysis is the Jacobian of the updates; it suffices to show that the real parts of its eigenvalues are negative. The (1,1) block is negative definite by concavity. From here on, assumptions are introduced to guarantee that the eigenvalue real parts are negative.
  24. GAN training stability near the equilibrium (Cont'd). Four assumptions are imposed (precise statements in the original paper)
     • Assumption 1 and Assumption 2: each a pair of conditions at the equilibrium
     • Assumption 3: certain quantities are locally constant in the discriminator space and in G
     • Assumption 4: an existence condition
  25. GAN training stability near the equilibrium (Cont'd). For the matrix in question, if one block is negative definite and the other has full column rank, the negativity of the eigenvalue real parts can be shown. For the proof, see the original paper (it follows from rearranging the eigenvalue equation). This establishes the stability of GANs around the equilibrium (more precisely, exponential convergence).
  26. Devising a regularizer to improve stability (Cont'd). Since the (2,2) block becomes negative definite, stability improves. One can also show that the preceding argument is not broken as long as the regularization parameter is small (see the sketch below). A similar argument is made in The Numerics of GANs. The gradient penalty introduced here appears, in different contexts, across a range of papers.
  27. Papers that caught my interest: new directions
     • Dynamic Routing Between Capsules
       (proposes a vector generalization of neurons and a mechanism for capturing the relations between them)
     • Deep Sets
       (builds a model that takes sets as input, independent of element order)
     • Bayesian GAN
       (treating the GAN in a Bayesian way removes the need for the usual training tricks)
  28. Papers that caught my interest (Cont'd): developments from a geometric viewpoint
     • Sobolev Training for Neural Networks
       (formulates training so that the derivatives at each layer are also used; applicable to distillation and more)
     • Principles of Riemannian Geometry in Neural Networks
       (formulates networks in Riemannian geometry and expresses backprop as the action of a right Lie group)
     • Riemannian approach to batch normalization
       (formulates BN in a Riemannian-geometry framework)
  29. Papers that caught my interest (Cont'd): further variety
     • Attention Is All You Need
       (achieves strong results with attention alone, with no recurrent structure)
     • Deep Hyperspherical Learning
       (formulates CNN convolutions as operations on the hypersphere; good convergence)
     • GibbsNet: Iterative Adversarial Inference for Deep Graphical Models
       (models the joint probability)
  30. What follows: two papers I personally found interesting
     • Dynamic Routing Between Capsules (CapsNet)
     • Deep Sets
     CapsNet tries to improve pooling into something better; Deep Sets builds a model that can take sets as input
  31. CapsNet: capsule input and output. The output nonlinearity squashes small vectors; the input is a weighted sum over the previous layer's outputs, where the transformation weights are learned by backprop and the coupling coefficients between capsules are determined by some separate mechanism (routing). A sketch follows below.
  32. CapsNet: what were capsules ultimately meant to achieve?
     • The underlying concern is the information lost in pooling
     • Extends routing by maximum value to routing that accounts for the relations between entities
     • The vector extension lets orientation information express the alignment between entities
     • The aim is thereby equivariance rather than invariance
     The routing algorithm is just one particular algorithm. Capsules themselves were introduced in 2011 in this paper: http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf
  33. CapsNet: multi-digit reconstruction experiment. R: reconstruction, L: label, P: prediction. Left: good examples; middle: examples where R deliberately uses a different digit; right: examples where the prediction is wrong.
  34. Deep Sets: model (Cont'd). How to construct a permutation-equivariant NN model: the elements of the input are the objects being permuted, shown for a fixed feature dimension (extensible). A sketch follows below. Ref: https://arxiv.org/abs/1703.06114
  35. Deep Sets: experiments. The interesting point is that new types of experiments become possible
     • computing the sum of the input data
     • estimating the redshift of galaxies from photometric information
     • measuring entropy and similar quantities on data generated from Gaussians
     • point-cloud classification, set expansion, set anomaly detection, ...
  36. Deep Sets: set anomaly detection experiment. Tested on CelebA: 75% accuracy on the test set (6.3% without the permutation-equivariant layers). Ref: https://arxiv.org/abs/1703.06114