Deep Networks and Kernel Methods

Edgar Marca
Grupo de Reconocimiento de Patrones e Inteligencia Artificial Aplicada, PUC Lima, Perú
June 18th, 2015

Table of Contents
Image Classification Problem
Deep Networks
  Convolutional Neural Networks
  Software
  How to start
Kernel Methods
  SVM
  The Kernel Trick
  History of Kernel Methods
  Software
  How to start
Kernels and Deep Learning
  Convolutional Kernel Networks
  Deep Fried Convnets

Image Classification Problem
Figure: http://www.image-net.org/

Deep Networks

Human-level control through deep reinforcement learning
▶ Volodymyr Mnih et al., Human-level control through deep reinforcement learning.

Convolutional Neural Networks

Software
▶ Torch7: http://torch.ch/
▶ Caffe: http://caffe.berkeleyvision.org/
▶ Minerva: https://github.com/dmlc/minerva
▶ Theano: http://deeplearning.net/software/theano/

How to start I
▶ Deep Learning Course by Nando de Freitas: https://www.youtube.com/watch?v=PlhFWT7vAEw&list=PLjK8ddCbDMphIMSXn-w1IjyYpHU3DaUYw
▶ Alex Smola Lecture on Deep Networks: https://www.youtube.com/watch?v=xZzZb7wZ6eE
▶ Convolutional Neural Networks for Visual Recognition: http://vision.stanford.edu/teaching/cs231n/
▶ Deep Learning, Spring 2015: http://cilvr.cs.nyu.edu/doku.php?id=courses:deeplearning2015:start

How to start II
▶ Deep Learning for Natural Language Processing: http://cs224d.stanford.edu/
▶ Applied Deep Learning for Computer Vision with Torch: http://torch.ch/docs/cvpr15.html
▶ Deep Learning, an MIT Press book in preparation: http://www.iro.umontreal.ca/~bengioy/dlbook/
▶ Reading List: http://deeplearning.net/reading-list/

Kernel Methods

Linear Support Vector Machine
Figure: Linear Support Vector Machine, showing the hyperplanes $\langle w, x\rangle + b = 1$, $\langle w, x\rangle + b = -1$ and $\langle w, x\rangle + b = 0$, together with the margin.

Linear SVM: Primal Problem
Given a linearly separable training set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\} \subset \mathbb{R}^n \times \{+1, -1\}$, we can compute the maximum-margin decision surface $\langle w^*, x\rangle = b^*$ by solving the convex program

$$
(P)\quad
\begin{cases}
\min_{w,b}\ \phi(w, b) = \frac{1}{2}\langle w, w\rangle \\
\text{subject to } \langle w, y_i x_i\rangle \ge 1 + y_i b, \quad \text{where } (x_i, y_i) \in D \subset \mathbb{R}^n \times \{-1, +1\}.
\end{cases}
\tag{1}
$$

1. The objective function does not depend on b.
2. The displacement b appears in the constraints.
3. The number of constraints is equal to the number of training points.
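The three remarks about the primal problem can be checked on a small example. The sketch below is only an illustration: the toy data set, the candidate pair (w, b) and the helper names are assumptions made for this example and do not come from the slides.

```python
# Hypothetical toy data in R^2 with labels in {+1, -1} (chosen for illustration only).
D = [((2.0, 0.0), +1), ((3.0, 1.0), +1), ((0.0, 0.0), -1), ((-1.0, 2.0), -1)]


def inner(u, v):
    """Standard inner product <u, v> on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def objective(w):
    """Primal objective phi(w, b) = 1/2 <w, w>; b does not appear in it."""
    return 0.5 * inner(w, w)


def is_feasible(w, b, data):
    """Check <w, y_i x_i> >= 1 + y_i b for every training point (one constraint per point)."""
    return all(inner(w, [y * xi for xi in x]) >= 1 + y * b for x, y in data)


# Candidate hyperplane <w, x> = b, picked by hand for this toy set.
w_candidate, b_candidate = (1.0, 0.0), 1.0

print(is_feasible(w_candidate, b_candidate, D))  # does the candidate satisfy all constraints?
print(objective(w_candidate))                    # value of 1/2 <w, w> for the candidate
```

Shrinking w lowers the objective but eventually violates a constraint, which is the trade-off the convex program resolves.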

Linear SVM: Dual Problem

$$
(DP)\quad
\begin{cases}
\max_{\alpha}\ h(\alpha) = \max_{\alpha} \left( \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle \right) \\
\text{subject to } \sum_{i=1}^{l} \alpha_i y_i = 0, \quad \alpha_i \ge 0 \text{ for } i = 1, \ldots, l.
\end{cases}
$$

The offset $b^*$ is computed from $w^*$ as follows:

$$
b^+ = \min\{\langle w^*, x\rangle \mid (x, y) \in D,\ y = +1\}, \qquad
b^- = \max\{\langle w^*, x\rangle \mid (x, y) \in D,\ y = -1\}.
$$

Then $b^* = \frac{b^+ + b^-}{2}$. The training vectors associated with $\alpha_i > 0$ are called support vectors.
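As a rough sketch of how the dual quantities fit together, the code below evaluates h(α), recovers w∗ and computes b∗ from b+ and b−. It relies on the standard relation w∗ = Σ αi yi xi, which this slide does not state explicitly. The toy data and the vector alpha were worked out by hand for this example and are assumptions, not output of a solver.

```python
# Hypothetical toy data in R^2 and a hand-derived dual solution for it (assumptions).
D = [((2.0, 0.0), +1), ((3.0, 1.0), +1), ((0.0, 0.0), -1), ((-1.0, 2.0), -1)]
alpha = [0.5, 0.0, 0.5, 0.0]


def inner(u, v):
    """Standard inner product <u, v> on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def dual_objective(alpha, data):
    """h(alpha) = sum_i alpha_i - 1/2 sum_i sum_j alpha_i alpha_j y_i y_j <x_i, x_j>."""
    quad = sum(
        ai * aj * yi * yj * inner(xi, xj)
        for ai, (xi, yi) in zip(alpha, data)
        for aj, (xj, yj) in zip(alpha, data)
    )
    return sum(alpha) - 0.5 * quad


def recover_w(alpha, data):
    """w* = sum_i alpha_i y_i x_i (standard relation, assumed here)."""
    n = len(data[0][0])
    return [sum(a * y * x[k] for a, (x, y) in zip(alpha, data)) for k in range(n)]


def recover_b(w, data):
    """b* = (b+ + b-) / 2 with b+ and b- defined as on the slide."""
    b_plus = min(inner(w, x) for x, y in data if y == +1)
    b_minus = max(inner(w, x) for x, y in data if y == -1)
    return (b_plus + b_minus) / 2


w_star = recover_w(alpha, D)
b_star = recover_b(w_star, D)
support_vectors = [x for a, (x, _) in zip(alpha, D) if a > 0]
print(w_star, b_star, support_vectors, dual_objective(alpha, D))
```

Only the two points with αi > 0 contribute to w∗, which is why they are the support vectors.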

Linear Support Vector Machine

$$
\hat{f}(x) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i^* y_i \langle x_i, x\rangle - b^* \right)
$$
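The decision function uses the training points only through inner products, which a few lines of code make visible. This is a minimal sketch under assumptions: the toy data, the multipliers and the offset are hand-derived values for illustration, and the convention sign(0) = +1 is a choice made here, not something the slides specify.

```python
# Hypothetical training set with hand-derived multipliers and offset (assumptions).
D = [((2.0, 0.0), +1), ((3.0, 1.0), +1), ((0.0, 0.0), -1), ((-1.0, 2.0), -1)]
alpha_star = [0.5, 0.0, 0.5, 0.0]
b_star = 1.0


def inner(u, v):
    """Standard inner product <u, v> on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def f_hat(x):
    """f(x) = sign(sum_i alpha_i* y_i <x_i, x> - b*), with sign(0) taken as +1."""
    score = sum(a * y * inner(xi, x) for a, (xi, y) in zip(alpha_star, D)) - b_star
    return 1 if score >= 0 else -1


print([f_hat(x) for x, _ in D])            # predictions on the training points
print(f_hat((2.5, 0.5)), f_hat((-0.5, 1.0)))  # predictions on two new points
```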

The Kernel Trick

Motivation
▶ How can we separate data that is not linearly separable?
▶ How can we reuse algorithms that work for linearly separable data and depend only on inner products?

R to R2 Case
How to separate two classes?
Figure: Separating the two classes of points by transforming the points into a higher dimensional space where the data is separable. The map $\phi : \mathbb{R} \to \mathbb{R}^2$ is $\phi(x) = (x, x^2)$.
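The effect of the map ϕ(x) = (x, x²) can be reproduced on a handful of points. In this sketch the four sample points, the brute-force threshold test and the hand-picked pair (w, b) are all assumptions made for illustration; only ϕ itself comes from the slide.

```python
def phi(x):
    """Feature map from the slide: phi(x) = (x, x^2), from R to R^2."""
    return (x, x * x)


# Hypothetical 1-D sample: outer points are class +1, inner points are class -1.
points = [(-2.0, +1), (-1.0, -1), (1.0, -1), (2.0, +1)]


def separable_by_threshold(data):
    """Brute-force test: can a single threshold (either orientation) split the classes in R?"""
    xs = sorted(x for x, _ in data)
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    return any(
        all((s if x > t else -s) == y for x, y in data)
        for t in thresholds
        for s in (+1, -1)
    )


def classify_in_feature_space(x, w=(0.0, 1.0), b=2.5):
    """Linear rule sign(<w, phi(x)> - b) in R^2; w and b are picked by hand."""
    z = phi(x)
    return 1 if w[0] * z[0] + w[1] * z[1] - b > 0 else -1


print(separable_by_threshold(points))                              # separable in R?
print(all(classify_in_feature_space(x) == y for x, y in points))   # separable in R^2?
```

No threshold works on the line, while a horizontal line in the (x, x²) plane splits the classes.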

R2 to R3 Case
Figure: Data which is not linearly separable.

R2 to R3 Case: A simulation
Figure: SVM with polynomial kernel visualization.
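The slides do not say which polynomial map the visualization uses. As an assumed example, the sketch below takes the homogeneous degree-2 map Φ(x1, x2) = (x1², √2·x1·x2, x2²), for which ⟨Φ(s), Φ(t)⟩ = ⟨s, t⟩², and shows that points inside and outside the unit circle are split by a plane in R3. The sample points and the plane (w, b) are hypothetical.

```python
import math


def inner(u, v):
    """Standard inner product <u, v> on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def phi(x):
    """Assumed degree-2 feature map from R^2 to R^3."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(s, t):
    """Homogeneous polynomial kernel of degree 2: k(s, t) = <s, t>^2."""
    return inner(s, t) ** 2


# Hypothetical samples: class -1 inside the unit circle, class +1 outside it.
inside = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3)]
outside = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]

# Plane in feature space corresponding to the circle x1^2 + x2^2 = 1 (picked by hand).
w, b = (1.0, 0.0, 1.0), 1.0


def classify(x):
    """Linear rule in R^3 applied to the lifted point phi(x)."""
    return 1 if inner(w, phi(x)) - b > 0 else -1


s, t = (0.5, 0.0), (-1.5, 1.5)
print(poly_kernel(s, t), inner(phi(s), phi(t)))  # the two values agree up to rounding
print([classify(x) for x in inside + outside])
```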

Idea
Figure: φ is a non-linear mapping from the input space to the feature space; the points labeled + and − are sent to φ(+) and φ(−).

Non Linear Support Vector Machine
Now we can use a non-linear function φ to map the data from the input space to a higher dimensional space.

$$
\hat{f}(x) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i^* y_i \langle \varphi(x_i), \varphi(x)\rangle - b^* \right)
$$

Definition 3.1 (Kernel)
Let X be a non-empty set. A function $k : X \times X \to \mathbb{K}$ is called a kernel on X if and only if there exist a Hilbert space H and a mapping $\Phi : X \to H$ such that for all $s, t \in X$

$$
k(t, s) := \langle \Phi(t), \Phi(s)\rangle_H \tag{2}
$$

The function Φ is called a feature mapping and H a feature space of k.

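Example 3.2
Consider $X = \mathbb{R}$ and the function k defined by

$$
k(s, t) = st = \left\langle \begin{bmatrix} \frac{s}{\sqrt{2}} \\ \frac{s}{\sqrt{2}} \end{bmatrix}, \begin{bmatrix} \frac{t}{\sqrt{2}} \\ \frac{t}{\sqrt{2}} \end{bmatrix} \right\rangle
$$

where the feature mappings are $\Phi(s) = s$ and $\tilde{\Phi}(s) = \begin{bmatrix} \frac{s}{\sqrt{2}} \\ \frac{s}{\sqrt{2}} \end{bmatrix}$, and the feature spaces are $H = \mathbb{R}$ and $\tilde{H} = \mathbb{R}^2$ respectively.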
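Example 3.2 says that one kernel can have more than one feature map. A quick numerical check, written as a minimal sketch with hypothetical function names and sample values, is:

```python
import math


def k(s, t):
    """Kernel from Example 3.2: k(s, t) = s * t on X = R."""
    return s * t


def phi(s):
    """First feature map, into H = R (stored as a 1-tuple)."""
    return (s,)


def phi_tilde(s):
    """Second feature map, into H~ = R^2: (s / sqrt(2), s / sqrt(2))."""
    return (s / math.sqrt(2), s / math.sqrt(2))


def inner(u, v):
    """Standard inner product <u, v> on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Arbitrary sample pairs used only to compare the three expressions.
pairs = [(1.0, 2.0), (-3.0, 0.5), (0.0, 4.0), (2.5, -2.5)]
for s, t in pairs:
    print(k(s, t), inner(phi(s), phi(t)), inner(phi_tilde(s), phi_tilde(t)))
```

The last two columns match the first up to floating-point rounding, so both maps represent the same kernel.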

Non Linear Support Vector Machines
Using the kernel trick we can replace $\langle \varphi(x_i), \varphi(x)\rangle$ by a kernel $k(x_i, x)$.

$$
\hat{f}(x) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i^* y_i k(x_i, x) - b^* \right)
$$
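To make the substitution concrete, the sketch below evaluates the kernelized decision function without ever building a feature vector. It assumes the kernel k(s, t) = st + s²t², which equals ⟨ϕ(s), ϕ(t)⟩ for ϕ(x) = (x, x²). The training points, the multipliers and the offset were derived by hand for this toy set and are assumptions, not solver output.

```python
def kernel(s, t):
    """k(s, t) = s*t + s^2 * t^2, i.e. <phi(s), phi(t)> for phi(x) = (x, x^2)."""
    return s * t + (s * s) * (t * t)


# Hypothetical 1-D training set; alpha_star and b_star are hand-derived for it.
train = [(-2.0, +1), (-1.0, -1), (1.0, -1), (2.0, +1)]
alpha_star = [1 / 9, 1 / 9, 1 / 9, 1 / 9]
b_star = 5 / 3


def f_hat(x, k=kernel):
    """f(x) = sign(sum_i alpha_i* y_i k(x_i, x) - b*), with sign(0) taken as +1."""
    score = sum(a * y * k(xi, x) for a, (xi, y) in zip(alpha_star, train)) - b_star
    return 1 if score >= 0 else -1


print([f_hat(x) for x in (-3.0, 0.0, 3.0)])  # predictions for three new points
```

Passing a different function as `k` changes the feature space while the rest of the classifier stays the same.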

History of Kernel Methods

Timeline
Table: Timeline of Support Vector Machines Algorithm Development
1909 • Mercer Theorem: James Mercer, "Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations".
1950 • Moore-Aronszajn Theorem: Nachman Aronszajn, "Theory of Reproducing Kernels".
1964 • Geometrical interpretation of kernels as inner products in a feature space: Aizerman, Braverman and Rozonoer, "Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning".
1964 • Original SVM algorithm: Vladimir Vapnik and Alexey Chervonenkis, "A Note on One Class of Perceptrons".

Timeline (continued)
1965 • Cover's Theorem: Thomas Cover, "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition".
1992 • Support Vector Machines: Bernhard Boser, Isabelle Guyon and Vladimir Vapnik, "A Training Algorithm for Optimal Margin Classifiers".
1995 • Soft Support Vector Machines: Corinna Cortes and Vladimir Vapnik, "Support Vector Networks".

Software
▶ LibSVM: https://www.csie.ntu.edu.tw/~cjlin/libsvm/
▶ SVMLight: http://svmlight.joachims.org/
▶ Scikit Learn: http://scikit-learn.org/stable/modules/svm.html

How to start
▶ Introduction to Support Vector Machines: https://beta.oreilly.com/learning/intro-to-svm
▶ Lutz H. Hamel, Knowledge Discovery with Support Vector Machines.
▶ John Shawe-Taylor and Nello Cristianini, Kernel Methods for Pattern Analysis.

Kernels and Deep Learning

▶ Julien Mairal et al., Convolutional Kernel Networks.
▶ Zichao Yang et al., Deep Fried Convnets.

Convolutional Kernel Networks

Deep Fried Convnets
▶ Quoc Viet Le et al., Fastfood: Approximate Kernel Expansions in Loglinear Time.

Questions?

Thanks
