
The evolution of CNN

yoppe
June 27, 2017


1. Deep Learning & CNN: # of papers

   arXiv papers including "deep learning" ("CNN") in titles or abstracts in Computer Science (source: https://arxiv.org/ on 2017-06-26):

   year             2010  2011  2012  2013  2014  2015  2016   2017
   "deep learning"     1     1     0    13    74   293   653    476
   "CNN"               6     7    22    89   188   651  1,304  1,147
2. The evolution of CNN (*this is a very limited chronology)

   • 1980  Neocognitron: http://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf
   • 1998  LeNet-5: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
   • 2012  AlexNet: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
   • 2013  Network in Network: https://arxiv.org/abs/1312.4400
   • 2014  VGG: https://arxiv.org/abs/1409.1556
   • 2015  Inception(V3): https://arxiv.org/abs/1512.00567
   • 2015  ResNet: https://arxiv.org/abs/1512.03385
   • 2016  SqueezeNet: https://arxiv.org/abs/1602.07360
   • 2016  ENet: https://arxiv.org/abs/1606.02147
   • 2017  Deep Complex Networks: https://arxiv.org/abs/1705.09792
3. Neocognitron

   source: http://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf

   • Prototype of the CNN
   • Hierarchical structure
     • S-cells (convolution): feature extraction
     • C-cells (avg. pooling): robustness to positional deviation
   • Trained by a self-organizing procedure, NOT by backpropagation
4. LeNet-5

   source: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

   • Convolution: feature extraction
   • Subsampling (avg. pooling): positional invariance, size reduction (see the sketch below)
   • Non-linearity: sigmoid, tanh
   [figure: a 4×4 feature map reduced to 2×2 by average-pooling subsampling]
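To make these ingredients concrete, here is a minimal LeNet-5-style sketch in PyTorch: convolution, average-pooling subsampling, and tanh, with layer sizes roughly following the paper's C1/S2/C3/S4/C5/F6 scheme for 1×32×32 inputs. It omits details such as the trainable subsampling coefficients and the RBF output layer.

```python
import torch
import torch.nn as nn

# Minimal LeNet-5-style network (sketch): conv -> avg pool -> tanh, twice,
# then fully connected layers. Assumes 1x32x32 inputs as in the paper.
model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # C1: 6 feature maps, 28x28
    nn.AvgPool2d(2),                  # S2: subsampling to 14x14
    nn.Tanh(),
    nn.Conv2d(6, 16, kernel_size=5),  # C3: 16 feature maps, 10x10
    nn.AvgPool2d(2),                  # S4: subsampling to 5x5
    nn.Tanh(),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),       # C5
    nn.Tanh(),
    nn.Linear(120, 84),               # F6
    nn.Tanh(),
    nn.Linear(84, 10),                # 10 digit classes
)

x = torch.randn(1, 1, 32, 32)
print(model(x).shape)  # torch.Size([1, 10])
```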
5. AlexNet

   source: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks

   • ReLU: keeps the gradient alive
   • Max pooling: works better than average pooling (reproduced below)
   • Dropout: improves generalization
   • GPU computation: accelerates training
   [figure: the same 4×4 feature map reduced to 2×2 by max pooling: [[2, 8, -2, 3], [1, 7, 2, 1], [-1, 2, 9, 2], [-2, 0, 3, 3]] → [[8, 3], [2, 9]]]
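The slide's 4×4 max-pooling example is easy to check; a minimal PyTorch comparison of max vs. average pooling:

```python
import torch
import torch.nn.functional as F

# The 4x4 feature map from the slide, shaped (batch, channels, H, W).
x = torch.tensor([[ 2.,  8., -2.,  3.],
                  [ 1.,  7.,  2.,  1.],
                  [-1.,  2.,  9.,  2.],
                  [-2.,  0.,  3.,  3.]]).reshape(1, 1, 4, 4)

print(F.max_pool2d(x, kernel_size=2))  # [[8., 3.], [2., 9.]] as on the slide
print(F.avg_pool2d(x, kernel_size=2))  # the mean of each 2x2 window instead
```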
6. Network In Network

   source: https://arxiv.org/abs/1312.4400

   • MLP after convolution: efficient non-linear combinations of feature maps
   • Global Average Pooling: one feature map for each class, no Fully Connected layers (see the sketch below)
   • Small model size: 29 MB for ImageNet
   [figure: stacked mlpconv (MLP) blocks followed by Global Average Pooling and softmax]
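"MLP after convolution" amounts to stacking 1×1 convolutions on top of an ordinary convolution, so a small MLP is applied at every spatial position; Global Average Pooling then collapses each per-class feature map to a single score. A minimal sketch, with illustrative channel counts (not the paper's):

```python
import torch
import torch.nn as nn

num_classes = 10  # illustrative

# mlpconv block: a spatial convolution followed by 1x1 convolutions,
# i.e. a per-pixel MLP over the feature maps.
mlpconv = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=1),           # per-pixel MLP layer
    nn.ReLU(),
    nn.Conv2d(64, num_classes, kernel_size=1),  # one feature map per class
    nn.ReLU(),
)

x = torch.randn(1, 3, 32, 32)
maps = mlpconv(x)               # (1, num_classes, 32, 32)
logits = maps.mean(dim=(2, 3))  # Global Average Pooling: no FC layers needed
print(logits.shape)             # torch.Size([1, 10])
```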
7. VGG

   source: https://arxiv.org/abs/1409.1556

   • Deep model built from basic building blocks
     • convolution
     • max pooling
     • activation (ReLU)
     • Fully Connected
     • softmax
   • Sequences of small convolutions: 3×3 spatial convolutions (see the sketch below)
   • Relatively many parameters
     • large channel counts at the early stages
     • many Fully Connected layers
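Two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution while using fewer weights (2·3² = 18 vs. 5² = 25 per channel pair). A sketch of one VGG-style block, with illustrative channel counts:

```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch):
    # Two 3x3 convolutions (same receptive field as one 5x5 conv,
    # fewer weights, one extra non-linearity), then pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),  # halves spatial resolution
    )

# Stacking blocks doubles channels while halving resolution, as in VGG.
features = nn.Sequential(vgg_block(3, 64), vgg_block(64, 128))

x = torch.randn(1, 3, 32, 32)
print(features(x).shape)  # torch.Size([1, 128, 8, 8])
```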
8. InceptionV3

   source: https://arxiv.org/abs/1512.00567

   • Inception module (sketched below)
     • parallel operations and concatenation: capture different features efficiently
     • mainly 3×3 convolutions: coming from the VGG architecture
     • 1×1 convolutions: reduce the number of channels
   • Global Average Pooling and Fully Connected: balance accuracy and model size
   • Good performance!
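A heavily simplified inception-style module, just to show the parallel-branches-then-concatenate pattern and the 1×1 channel reduction. The branch widths here are made up, and the real InceptionV3 modules are more elaborate (e.g. factorized 1×7/7×1 convolutions):

```python
import torch
import torch.nn as nn

class SimpleInception(nn.Module):
    """Parallel branches whose outputs are concatenated along channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 32, kernel_size=1)  # 1x1 only
        self.branch2 = nn.Sequential(                       # 1x1 reduce, then 3x3
            nn.Conv2d(in_ch, 48, kernel_size=1), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3, padding=1),
        )
        self.branch3 = nn.Sequential(                       # pool, then 1x1
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Each branch sees the same input; concatenation mixes their features.
        return torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)],
                         dim=1)

x = torch.randn(1, 192, 28, 28)
print(SimpleInception(192)(x).shape)  # torch.Size([1, 128, 28, 28])
```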
9. ResNet

   source: https://arxiv.org/abs/1512.03385

   • Residual structure (sketched below)
     • shortcut (by-pass) connections: keep the gradient alive
     • 1×1 convolutions: reduce the number of channels
   • Very deep models: 152 layers in total; variants with more than 1,000 layers
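A minimal bottleneck-style residual block: the 1×1 convolutions squeeze and then restore the channel count, and the shortcut adds the input back so gradients can flow through the identity path. Channel sizes are illustrative:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual bottleneck: 1x1 reduce -> 3x3 -> 1x1 restore, plus shortcut."""
    def __init__(self, ch, mid):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(),
            nn.Conv2d(mid, mid, 3, padding=1), nn.BatchNorm2d(mid), nn.ReLU(),
            nn.Conv2d(mid, ch, 1), nn.BatchNorm2d(ch),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # The identity shortcut keeps the gradient alive even when the body
        # contributes little; this is what makes very deep stacks trainable.
        return self.relu(self.body(x) + x)

x = torch.randn(1, 256, 14, 14)
print(Bottleneck(256, 64)(x).shape)  # torch.Size([1, 256, 14, 14])
```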
10. SqueezeNet

   source: https://arxiv.org/abs/1602.07360

   • Fire module: squeeze channels to reduce computational costs (see the sketch below)
     [figure: 55×55×96 → 1×1 squeeze → 55×55×16 → 1×1 expand (55×55×64) and 3×3 expand (55×55×64) → concat → 55×55×128]
   • Deep compression: lightens the model via sparse weights, weight quantization, and Huffman coding
   • Small model: with 6-bit data, the model size is 0.47 MB!
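A sketch of the fire module using the slide's channel sizes (96 → squeeze to 16 → expand to 64 + 64): the 1×1 squeeze layer cuts the channel count before the more expensive expand layers, whose 1×1 and 3×3 outputs are concatenated.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet fire module: 1x1 squeeze, then parallel 1x1/3x3 expand."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3,
                                 padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        s = self.relu(self.squeeze(x))  # few channels -> cheap expand layers
        return torch.cat([self.relu(self.expand1(s)),
                          self.relu(self.expand3(s))], dim=1)

x = torch.randn(1, 96, 55, 55)
print(Fire(96, 16, 64)(x).shape)  # torch.Size([1, 128, 55, 55])
```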
11. ENet

   source: https://arxiv.org/abs/1606.02147

   • Real-time segmentation model (input: 3 × 512 × 512)
     • downsampling at the early stages
     • asymmetric encoder-decoder structure (see the sketch below)
     • PReLU
     • small model, ~1 MB
   • The encoder can be used as a CNN on its own
   • Global Max Pooling
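A toy sketch of the asymmetric idea only, NOT ENet's actual bottleneck design: a heavier encoder that downsamples aggressively at the early stages, and a much lighter decoder that only upsamples back to full resolution, with PReLU activations. num_classes is illustrative:

```python
import torch
import torch.nn as nn

num_classes = 12  # illustrative segmentation label count

# Heavier encoder: early downsampling keeps all later layers cheap.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.PReLU(),    # /2
    nn.Conv2d(16, 64, 3, stride=2, padding=1), nn.PReLU(),   # /4
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.PReLU(),  # /8
)

# Much lighter decoder: just upsamples the encoder features back.
decoder = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2), nn.PReLU(),
    nn.ConvTranspose2d(64, 16, kernel_size=2, stride=2), nn.PReLU(),
    nn.ConvTranspose2d(16, num_classes, kernel_size=2, stride=2),
)

x = torch.randn(1, 3, 512, 512)   # input 3 x 512 x 512 as on the slide
print(decoder(encoder(x)).shape)  # torch.Size([1, 12, 512, 512])
```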
12. Deep Complex Networks

   source: https://arxiv.org/abs/1705.09792

   • Complex-valued structure (sketched below)
     • complex convolution
     • complex batch normalization
   • Advantages of complex values
     • biological & signal-processing aspects: can express both firing rate & relative timing; detailed description of objects
     • parameter efficient: up to 2^(depth) times more efficient than real-valued networks
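Complex convolution can be expressed with two real-valued convolutions via the complex multiplication rule (W_r + i W_i)(x_r + i x_i) = (W_r x_r - W_i x_i) + i (W_r x_i + W_i x_r). A minimal sketch of that rule; the paper's complex weight initialization and complex batch normalization are omitted:

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution built from two real convolutions (sketch)."""
    def __init__(self, in_ch, out_ch, kernel_size, padding=0):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)  # real part of W
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)  # imaginary part of W

    def forward(self, x_r, x_i):
        # (W_r + i W_i)(x_r + i x_i) = (W_r x_r - W_i x_i) + i (W_r x_i + W_i x_r)
        out_r = self.conv_r(x_r) - self.conv_i(x_i)
        out_i = self.conv_r(x_i) + self.conv_i(x_r)
        return out_r, out_i

x_r, x_i = torch.randn(1, 3, 8, 8), torch.randn(1, 3, 8, 8)
out_r, out_i = ComplexConv2d(3, 8, kernel_size=3, padding=1)(x_r, x_i)
print(out_r.shape, out_i.shape)  # torch.Size([1, 8, 8, 8]) twice
```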
13. [Review Papers]
   • On the Origin of Deep Learning: https://arxiv.org/abs/1702.07800
   • Recent Advances in Convolutional Neural Networks: https://arxiv.org/abs/1512.07108
   • Understanding Convolutional Neural Networks: http://davidstutz.de/wordpress/wp-content/uploads/2014/07/seminar.pdf
   • An Analysis of Deep Neural Network Models for Practical Applications: https://arxiv.org/abs/1605.07678

   [Slides & Web pages]
   • Recent Progress on CNNs for Object Detection & Image Compression: https://berkeley-deep-learning.github.io/cs294-131-s17/slides/sukthankar-UC-Berkeley-InvitedTalk-2017-02.pdf
   • CS231n: Convolutional Neural Networks for Visual Recognition: http://cs231n.github.io/convolutional-networks/

   [Blog posts]
   • Training ENet on ImageNet: https://culurciello.github.io/tech/2016/06/20/training-enet.html
   • Neural Network Architectures: https://medium.com/towards-data-science/neural-network-architectures-156e5bad51ba