tanimutomo
May 17, 2019

Before we begin: The mathematical building blocks of neural networks

These slides summarize Chapter 2 of "Deep Learning with Python" by Francois Chollet.
A PDF of the book is available here (http://faculty.neu.edu.cn/yury/AAI/Textbook/Deep%20Learning%20with%20Python.pdf).
The Japanese edition is "PythonとKerasによるディープラーニング" (https://amzn.to/2oImjwS).
The slides explain the concepts of neural networks, tensor data, and optimization with many figures.

Transcript

  1. This chapter covers
    • A first example of a neural network
    • Tensors and tensor operations
    • How neural networks learn via backpropagation and gradient descent
    Our goal in this chapter will be to build your intuition about these notions without getting overly technical.
  2. This chapter covers
    • A first example of a neural network
    • Tensors and tensor operations
    • How neural networks learn via backpropagation and gradient descent
    Our goal in this chapter will be to build your intuition about these notions without getting overly technical.
  3. MNIST Dataset
    • MNIST (Modified National Institute of Standards and Technology)
    • Classify grayscale images of handwritten digits (28 x 28 pixels) into 10 categories (0 through 9)
    • 60000 training images + 10000 test images
    • [Figure: sample 28 x 28 digit images (data) with their labels]
  4. Load MNIST Dataset
    • [Code screenshot, annotated: import Keras (a Python library)]
    • The model is trained on the training dataset and then evaluated on the test dataset
  5. The Flow of Machine Learning
    • Training: 1. Input, 2. Calculate, 3. Output, 4. Feedback & Revision (the model's output is compared with the label, e.g. Label = 5)
    • Test: Input, Calculate, Output (e.g. Label = 5, Output = 5: Correct!)
  6. Example of Model
    • Define the neural network model
    • A model consists of many layers, which perform calculations on the data
    • Layers extract representations from the data
    • [Code screenshot, annotated: import models and layers from Keras (a Python library); define the model; add the calculation methods (…]
  7. Configuration for training the model
    • A loss function
        • A measure of how poorly the model predicts
        • The model is trained to minimize the loss function
    • Optimizer
        • An algorithm that optimizes the model
        • Determines how feedback is applied to the model
    • Metrics
        • Accuracy
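
To make the difference between a loss and a metric concrete, here is a minimal NumPy sketch. The slide does not name a specific loss, so the use of cross-entropy here is an assumption, and the predicted probabilities are made-up illustration values.

```python
import numpy as np

# Hypothetical predicted class probabilities for 3 samples over 3 classes.
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3]])
# True class index of each sample.
y_true = np.array([0, 1, 2])

# Loss (assumed: cross-entropy): mean negative log-probability
# that the model assigned to the true class.
true_class_probs = y_pred[np.arange(len(y_true)), y_true]
loss = -np.mean(np.log(true_class_probs))

# Metric: fraction of samples whose most probable class is the true one.
accuracy = np.mean(np.argmax(y_pred, axis=1) == y_true)

print(f"loss = {loss:.4f}, accuracy = {accuracy:.2f}")
```

The loss changes smoothly as the probabilities change, which is what the optimizer needs; accuracy only changes when a prediction flips, so it is reported as a metric rather than minimized directly.
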
  8. The training flow
    • [Figure: training flow diagram: model output 6 for an input with Label = 8, loss function, feedback by optimizer to the weights]
  9. This chapter covers
    • A first example of a neural network
    • Tensors and tensor operations
    • How neural networks learn via backpropagation and gradient descent
    Our goal in this chapter will be to build your intuition about these notions without getting overly technical.
  10. Data representations for NN
    • All current machine-learning systems use tensors as their basic data structure
    • A tensor is a container of numbers
    • Tensors are a generalization of matrices to an arbitrary number of dimensions
    • In the context of tensors, a dimension is often called an axis
    • [Figure: 0D tensors, 1D tensors, 2D tensors, 3D tensors, 4D tensors, …]
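
As a small illustration of tensors with different numbers of axes, the following NumPy sketch builds one tensor of each rank from 0D to 3D and prints its `ndim` and `shape`. The values themselves are arbitrary.

```python
import numpy as np

scalar = np.array(12)                  # 0D tensor: a single number
vector = np.array([12, 3, 6, 14])      # 1D tensor: one axis
matrix = np.array([[5, 78, 2],
                   [6, 79, 3]])        # 2D tensor: rows and columns
cube = np.zeros((2, 3, 4))             # 3D tensor: a stack of matrices

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of axes; shape is the size along each axis.
    print(name, "ndim =", t.ndim, "shape =", t.shape)
```
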
  11. Let's try NumPy data operations
    • Go to https://paiza.io/en
    • How to use Python
        • Python is a programming language widely used for numerical computation
        • NumPy is a tensor operation library for Python
        • print( … ) : writes … to standard output
    • How to use NumPy
        • First, “import numpy as np”
        • x = np.array([…]) : creates x, which is the … array
  12. Batch Sampling
    • A batch is a set of data samples that are fed to the model and processed together
    • [Figure: a batch of size 4 goes through 1. Input, 2. Calculate, 3. Output, 4. Feedback & Revision; outputs [6, 2, 3, 4] against Labels = [8, 3, 6, 4]]
  13. Create the batch
    • Create the batch samples using the slicing operation
    • All batches are the same size
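
Slicing a dataset into batches can be sketched as follows. The array `data` is a hypothetical stand-in for a real dataset, and `get_batch` is an illustrative helper, not a library function.

```python
import numpy as np

# Hypothetical stand-in for a dataset: 12 samples of 28 x 28 values each.
data = np.arange(12 * 28 * 28).reshape(12, 28, 28)
batch_size = 4

def get_batch(data, n, batch_size):
    """Return the n-th batch by slicing along the first (samples) axis."""
    return data[batch_size * n : batch_size * (n + 1)]

first = get_batch(data, 0, batch_size)
second = get_batch(data, 1, batch_size)
print(first.shape, second.shape)
```

With 12 samples and a batch size of 4, the dataset divides into three batches of equal size.
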
  14. Real world examples of tensor data
    • Vector data
        • 2D tensor of shape (samples, features)
    • Timeseries data or Sequence data
        • 3D tensor of shape (samples, timesteps, features)
    • Images
        • 4D tensor of shape (samples, height, width, channels)
    • Videos
        • 5D tensor of shape (samples, frames, height, width, channels)
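
The shapes above can be tried out with placeholder arrays. All sizes below are arbitrary choices for illustration, except that 1440 timesteps and 3 features are picked to match one value per minute of a day with Max, Min, and Now prices.

```python
import numpy as np

vector_data = np.zeros((100, 8))            # (samples, features)
timeseries = np.zeros((5, 1440, 3))         # (samples, timesteps, features)
images = np.zeros((16, 28, 28, 1))          # (samples, height, width, channels)
videos = np.zeros((2, 10, 32, 32, 3))       # (samples, frames, height, width, channels)

for name, t in [("vector_data", vector_data), ("timeseries", timeseries),
                ("images", images), ("videos", videos)]:
    print(f"{name}: {t.ndim}D tensor, shape {t.shape}")
```

In every case the first axis is the samples axis, which is why batches can always be cut by slicing along it.
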
  15. Timeseries data or Sequence data
    • The dataset of stock prices
    • [Figure: one table per date, stacked along the Date axis (the same table is shown five times); axes: Time, Date, Features]

             0:00  0:01  0:02  0:03  0:04  0:05  0:06  …  23:59
        Max     2     9     7    56     8     6     8  …      4
        Min     0     1     5    98     6     4     3  …      9
        Now     6     7     9     6     4    67    98  …      1
  16. Videos
    • Samples x Frames x Height x Width x Channels
    • [Figure: stacks of frames, one stack per sample]
  17. This chapter covers
    • A first example of a neural network
    • Tensors and tensor operations
    • How neural networks learn via backpropagation and gradient descent
    Our goal in this chapter will be to build your intuition about these notions without getting overly technical.
  18. Tensor Operations
    • [Figure: training flow diagram: model output 6 for an input with Label = 8, loss function, feedback by optimizer to the weights]
  19. The detail of Dense layer
    • A Dense layer is defined by the following calculation
    • Output is the larger of 0 and (input) * W + b, i.e. relu(dot(W, input) + b)
    • [Figure: worked examples of (input) * W + b, e.g. 0.01*3+2 and 0.2*0.01+0.2]
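
A minimal NumPy sketch of this calculation follows. It assumes each input is stored as a row of a batch, so the product is written `dot(x, W)`; the function name `dense` and all numeric values are illustrative.

```python
import numpy as np

def dense(x, W, b):
    """Compute relu(dot(x, W) + b) for a batch x of shape (samples, in_features)."""
    return np.maximum(0.0, np.dot(x, W) + b)

x = np.array([[1.0, 2.0],
              [-1.0, 0.5]])            # 2 samples, 2 input features
W = np.array([[1.0, -1.0, 0.5],
              [0.0, 2.0, -1.0]])       # maps 2 input features to 3 units
b = np.array([0.5, -1.0, 0.0])         # one bias per unit

out = dense(x, W, b)
print(out)
```

Negative pre-activations are clipped to 0 by the "larger of 0 and ..." step, so the output contains no negative values.
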
  20. Element-wise operations
    • The ReLU operation and addition are element-wise operations
    • Implementation based on native Python
    • Implementation based on NumPy
    • In NumPy, these operations are implemented in Fortran or C (via BLAS), which is much faster than native Python
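
The two implementation styles can be compared in a short sketch. The loop-based functions below are illustrative reconstructions (the slide's own code is not in the transcript), and they are checked against NumPy's built-in element-wise operations.

```python
import numpy as np

def naive_relu(x):
    """Element-wise ReLU on a 2D array using plain Python loops."""
    assert x.ndim == 2
    out = x.copy()                      # avoid overwriting the input
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = max(out[i, j], 0)
    return out

def naive_add(x, y):
    """Element-wise addition of two 2D arrays of the same shape."""
    assert x.ndim == 2 and x.shape == y.shape
    out = x.copy()
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] += y[i, j]
    return out

x = np.array([[-1.0, 2.0], [3.0, -4.0]])
y = np.array([[10.0, 20.0], [30.0, 40.0]])

# The NumPy equivalents are one-liners and run in compiled code.
print(np.array_equal(naive_relu(x), np.maximum(x, 0)))
print(np.array_equal(naive_add(x, y), x + y))
```
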
  21. Broadcast Operation
    • Makes it possible to add two tensors whose shapes differ
    • New axes are added to the smaller tensor to match the number of axes of the larger tensor
    • The smaller tensor is repeated alongside these new axes to match the full shape of the larger tensor
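
Broadcasting can be sketched in NumPy by comparing the automatic result with an explicit "add an axis, then repeat" version. The values are arbitrary.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])         # shape (2, 3)
y = np.array([10.0, 20.0, 30.0])        # shape (3,)

# NumPy broadcasts y across the rows of x automatically.
z = x + y

# The same result spelled out: add a new leading axis to y,
# then repeat y along that axis to match the shape of x.
y_expanded = np.repeat(y[np.newaxis, :], x.shape[0], axis=0)   # shape (2, 3)
z_explicit = x + y_expanded

print(z)
print(np.array_equal(z, z_explicit))
```

In practice NumPy does not materialize the repeated copy; the explicit version is only there to show what the operation means.
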
  22. Dot Operation (tensor product)
    • The most common and useful tensor operation
    • [Figure: vector dot product and matrix dot product, each shown as a NumPy operation and as a mathematical operation]
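
The vector and matrix dot products can be written out with explicit loops and checked against `np.dot`. The helper names below are illustrative, and the inputs are arbitrary.

```python
import numpy as np

def naive_vector_dot(x, y):
    """Sum of element-wise products of two 1D arrays of equal length."""
    assert x.ndim == 1 and y.ndim == 1 and x.shape[0] == y.shape[0]
    z = 0.0
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

def naive_matrix_dot(x, y):
    """Entry (i, j) is the dot product of row i of x and column j of y."""
    assert x.ndim == 2 and y.ndim == 2 and x.shape[1] == y.shape[0]
    z = np.zeros((x.shape[0], y.shape[1]))
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            z[i, j] = naive_vector_dot(x[i, :], y[:, j])
    return z

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

print(naive_vector_dot(v, w), np.dot(v, w))
print(naive_matrix_dot(A, B))
```
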
  23. Tensor reshaping
    • Changes the shape of a tensor
    • A commonly used special case of reshaping is “transposition”
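
A short NumPy sketch of reshaping and transposition, using arbitrary values:

```python
import numpy as np

x = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])              # shape (3, 2)

as_column = x.reshape((6, 1))           # same 6 values, one per row
as_2x3 = x.reshape((2, 3))              # same 6 values, read in the same order
transposed = np.transpose(x)            # rows and columns exchanged

print(as_column.shape, as_2x3.shape, transposed.shape)
print(as_2x3)
print(transposed)
```

Note that `x.reshape((2, 3))` and `np.transpose(x)` have the same shape here but different contents: `reshape` keeps the reading order of the elements, while transposition turns `x[i, j]` into `x[j, i]`.
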
  24. Geometric Interpretation of Deep Learning
    • A neural network is a chain of simple tensor operations, each of which is just a geometric transformation of the input data
    • What a neural network (or any other machine-learning model) is meant to do is figure out a transformation of the paper ball that would uncrumple it, so as to make the two classes cleanly separable again.
    • Uncrumpling paper balls is what machine learning is about: finding neat representations for complex, highly folded data manifolds.
  25. This chapter covers
    • A first example of a neural network
    • Tensors and tensor operations
    • How neural networks learn via backpropagation and gradient descent
    Our goal in this chapter will be to build your intuition about these notions without getting overly technical.
  26. Optimization
    • [Figure: training flow diagram: model output 6 for an input with Label = 8, loss function, feedback by optimizer to the weights]
  27. Update the weights
    • Weight matrices are initially filled with small random values
    • There’s no reason to expect that relu(dot(W, input) + b), when W and b are random, will yield any useful representations
    • Starting from these random values, the weights are adjusted gradually so that the model predicts accurately
    • The weights are the trainable parameters; this process is called “training” or the “training loop”
  28. The flow of training
    • Draw a batch of training samples x and corresponding targets y
    • Run the network on x to obtain predictions y_pred
    • Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y
    • Update all weights of the network in a way that slightly reduces the loss on this batch
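
The four steps above can be sketched as a tiny training loop. Everything specific here is an assumption made for illustration: the model is a single linear unit rather than a neural network, the data follow y = 3x + 2, the loss is mean squared error, and the learning rate and batch size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: targets follow y = 3x + 2 exactly.
x_all = rng.uniform(-1.0, 1.0, size=(256, 1))
y_all = 3.0 * x_all + 2.0

# Weights start as small random values.
W = rng.normal(0.0, 0.1, size=(1, 1))
b = np.zeros(1)
learning_rate = 0.1
batch_size = 32

losses = []
for step in range(500):
    # 1. Draw a batch of training samples x and corresponding targets y.
    idx = rng.integers(0, len(x_all), size=batch_size)
    x, y = x_all[idx], y_all[idx]
    # 2. Run the model on x to obtain predictions y_pred.
    y_pred = x @ W + b
    # 3. Compute the loss on the batch (mean squared error).
    error = y_pred - y
    losses.append(np.mean(error ** 2))
    # 4. Update the weights in the direction that reduces the loss.
    grad_W = 2.0 * x.T @ error / batch_size
    grad_b = 2.0 * np.mean(error, axis=0)
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print("W =", W[0, 0], "b =", b[0], "last loss =", losses[-1])
```

After enough iterations W and b approach the values used to generate the data, and the loss on each batch approaches zero.
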
  29. Overview of the optimization
    • [Figure: training flow diagram: model output 6 for an input with Label = 8, loss function, feedback by optimizer to the weights]
  30. Overview of the optimization
    • 1. Predict (Calculation): the model outputs 6 for an input with Label = 8
    • 2. Calculate the loss: Loss function: |6 - 8| = 2 (measure of mismatch)
    • 3. Update the weights: W: 1 => 2, b: 2 => 3
  31. How to update the weights
    • Update the weights using the “gradient” of the loss with regard to the network’s coefficients
    • [Figure: a differentiable loss function]
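
A minimal sketch of a gradient-based update on a single coefficient follows. The quadratic loss, its minimum at w = 3, and the learning rate are all hypothetical choices; the point is only the update rule "move against the gradient".

```python
def loss(w):
    """A differentiable toy loss with its minimum at w = 3 (hypothetical)."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
history = [loss(w)]
for _ in range(50):
    # Step in the direction opposite to the gradient, scaled by the learning rate.
    w -= learning_rate * gradient(w)
    history.append(loss(w))

print("w =", w, "loss =", history[-1])
```

Because the gradient points toward increasing loss, subtracting a small multiple of it lowers the loss a little at every step.
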
  32. Optimizer
    • An optimizer is a method (a plan) for updating the weights
    • Examples: SGD, AdaGrad, RMSProp, …
  33. Global and Local Minimum
    • The purpose of training is to bring the loss to its global minimum
    • However, with a simple optimizer, the model’s weights may converge to a local minimum
  34. Momentum
    • An optimization technique devised to avoid converging to a local minimum
    • When updating the weights, an optimizer with momentum takes the history of past updates into account
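
One common way to write a momentum update is sketched below. This is an assumed, simplified formulation on a hypothetical one-dimensional quadratic loss, not the exact rule of any particular library: the `velocity` variable accumulates past updates, so each step depends on the update history as well as on the current gradient.

```python
def gradient(w):
    """Derivative of a toy loss (w - 3)^2 (hypothetical)."""
    return 2.0 * (w - 3.0)

w = 0.0
velocity = 0.0           # running summary of past updates
learning_rate = 0.1
momentum = 0.9           # how much of the previous velocity is kept

for _ in range(200):
    # Blend the previous update with the current gradient step.
    velocity = momentum * velocity - learning_rate * gradient(w)
    w += velocity

print("w =", w)
```

On this simple loss there is only one minimum, so the sketch shows the update rule itself; the benefit described on the slide, rolling past a shallow local minimum, comes from the velocity carrying the weights forward even where the current gradient is small.
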
  35. Look back at our first example
    • Training loop (the configurations for optimizing the model)
    • fit
        • The method that starts iterating on the training data in mini-batches of 128 samples, 5 times over
    • epoch
        • Each iteration over all the training data
    • After these 5 epochs,
        • The network will have performed 2,345 gradient updates (469 per epoch)
        • The loss of the network will be sufficiently low that the network will be capable of classifying handwritten digits with high accuracy
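
The update counts quoted above can be checked with a few lines of arithmetic, using the 60000 MNIST training images, the batch size of 128, and the 5 epochs mentioned in the slides.

```python
import math

num_training_samples = 60000   # MNIST training images
batch_size = 128
epochs = 5

# One gradient update per mini-batch; the last batch of an epoch may be partial.
updates_per_epoch = math.ceil(num_training_samples / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)   # 469 2345
```
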