TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning • Train longer, generalize better: closing the generalization gap in large batch training of neural networks • End-to-End Differentiable Proving • Gradient descent GAN optimization is locally stable
• Imagination-Augmented Agents for Deep Reinforcement Learning • Masked Autoregressive Flow for Density Estimation • Deep Sets • From Bayesian Sparsity to Gated Recurrent Nets
Learning in the title) • Deep Learning for Physical Sciences • Deep Learning at Supercomputer Scale • Deep Learning: Bridging Theory and Practice • Bayesian Deep Learning • Interpreting, Explaining and Visualizing Deep Learning - Now what?
in large batch training of neural networks: considers the relation between generalization performance, training, and batch size, and proposes GBN (Ghost Batch Normalization; see the sketch after this list) • Exploring Generalization in Deep Learning: examines measures of generalization and proposes a sharpness measure based on a PAC-Bayesian analysis • Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks: interprets SGD as a Langevin equation and analyzes the dynamics of the temperature
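The GBN (Ghost Batch Normalization) idea from "Train longer, generalize better" can be stated concretely: during large-batch training, batch-norm statistics are computed over small "ghost" batches rather than over the full batch. Below is a minimal NumPy sketch of the training-time behaviour only; the function name, the `ghost_size` default, and the omission of running statistics and inference mode are my assumptions, not the authors' code.

```python
import numpy as np

def ghost_batch_norm(x, gamma, beta, ghost_size=32, eps=1e-5):
    """Training-time Ghost Batch Normalization sketch (inference mode omitted).

    x: (N, D) activations from one large batch.
    The batch is split into virtual "ghost" batches of size `ghost_size`,
    and each ghost batch is normalized with its own mean and variance.
    """
    out = np.empty_like(x)
    for start in range(0, x.shape[0], ghost_size):
        chunk = x[start:start + ghost_size]
        mu = chunk.mean(axis=0)
        var = chunk.var(axis=0)
        out[start:start + ghost_size] = gamma * (chunk - mu) / np.sqrt(var + eps) + beta
    return out
```

At inference time one would still use aggregated running statistics; the point is only that the noise in the per-ghost-batch statistics during training mimics that of small-batch training.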
and Stochastic Gradient Descent • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
On generalization measures, I have previously given this talk: https://speakerdeck.com/diracdiego/some-understanding-of-generalization-in-deep-learing
Gradient descent GAN optimization is locally stable • The Numerics of GANs • Improved Training of Wasserstein GANs • Stabilizing Training of Generative Adversarial Networks through Regularization
Converge to a Local Nash Equilibrium: gives the generator and the discriminator independent learning rates and shows convergence to a local Nash equilibrium (see the sketch after this list) • Approximation and Convergence Properties of Generative Adversarial Learning: treats a variety of GANs in a more abstract framework and shows convergence properties and the relative strength of the different methods
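To make the two time-scale update rule concrete, here is a minimal PyTorch sketch, not the authors' code: the only point is that the generator and the discriminator each get their own optimizer with its own step size. The toy models, the synthetic "real" data, and the specific learning rates (a faster discriminator, as in common TTUR configurations) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy GAN on 1-D data, just to show the two-time-scale setup.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

# Separate optimizers with separate learning rates (values are illustrative only).
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.9))
opt_D = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.5, 0.9))

bce = nn.BCEWithLogitsLoss()
real = torch.randn(64, 1) * 0.5 + 2.0   # stand-in "real" samples

# --- one discriminator step ---
fake = G(torch.randn(64, 8)).detach()
loss_D = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# --- one generator step (non-saturating loss) ---
fake = G(torch.randn(64, 8))
loss_G = bce(D(fake), torch.ones(64, 1))
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```

In the paper's analysis the generator is the component on the slower time scale; the sketch shows only the mechanical separation of the two step sizes.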
• Principles of Riemannian Geometry in Neural Networks: formulates neural networks in terms of Riemannian geometry and expresses back prop. as a right Lie group action • Riemannian approach to batch normalization: formulates BN within the framework of Riemannian geometry
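For both Riemannian items, the computational recipe that makes the formulation usable is Riemannian SGD: project the Euclidean gradient onto the tangent space of the manifold, take a step, then retract back onto the manifold. Below is a minimal NumPy sketch for a weight vector constrained to the unit sphere; the choice of the sphere (rather than the Grassmann manifold used for the scale-invariant BN weights) and the simple normalization retraction are my simplifications for illustration.

```python
import numpy as np

def riemannian_sgd_step_sphere(w, grad, lr):
    """One Riemannian SGD step for a weight vector living on the unit sphere.

    grad is the ordinary Euclidean gradient of the loss with respect to w.
    """
    w = w / np.linalg.norm(w)                  # make sure we start on the sphere
    tangent_grad = grad - np.dot(grad, w) * w  # project out the radial component
    w_new = w - lr * tangent_grad              # step in the tangent space
    return w_new / np.linalg.norm(w_new)       # retraction: renormalize onto the sphere
```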