ゼロから作るDeep Learning Reading Group: Chapter 4, Neural Network Training

A summary of the contents of Chapter 4, "Neural Network Training".

Takuya Kitamura

August 06, 2017

Transcript

1. Target book: ゼロから作るDeep Learning (Deep Learning from Scratch) 〜Theory and Implementation of Deep Learning with Python〜
   ▸ By Koki Saitoh
   ▸ O'Reilly Japan
   ▸ https://www.oreilly.co.jp/books/9784873117584/
   Chapter 1: Introduction to Python / Chapter 2: Perceptrons / Chapter 3: Neural Networks / Chapter 4: Neural Network Training / Chapter 5: Backpropagation / Chapter 6: Techniques for Training / Chapter 7: Convolutional Neural Networks / Chapter 8: Deep Learning
   ← This session covers Chapter 4!
2. 4.1 Learning from Data
   ▸ A neural network can learn from data
   ▸ The values of the weight parameters can be determined automatically from data
   ▸ When analyzing image data: extract features → learn the patterns in those features
   ▸ Feature: a transformer designed to accurately extract the essential (important) data from the input data (input images)
   ▸ Both training data (teacher data) and test data are required
   ▸ To solve problems in a general way, the data used for training and the data used for evaluation must be prepared separately
   ▸ A state in which the model has learned one particular dataset too closely is called overfitting

   [Diagram: three approaches to getting from an image to an answer]
   ▸ Conventional programming: image → human-designed algorithm → answer
   ▸ Machine learning: image → human-designed features (SIFT, HOG, etc.) → machine learning (SVM, KNN, etc.) → answer
   ▸ Deep learning: image → neural network (end-to-end machine learning) → answer
3. 4.2 Loss Functions
   Loss function: a metric for measuring how well training is going.
   The smaller the value the function returns, the better the fit.
   ▸ Representative loss functions
   ▸ Sum of squared errors: E = (1/2) Σ_k (y_k − t_k)²
   ▸ Cross-entropy error: E = −Σ_k t_k log y_k

   def mean_squared_error(y, t):
       return 0.5 * np.sum((y - t)**2)

   def cross_entropy_error(y, t):
       delta = 1e-7  # avoid log(0)
       return -np.sum(t * np.log(y + delta))

   # y is the neural network's output
   # t is the teacher (label) data

   ▸ Why the loss function, rather than recognition accuracy, is used as the training metric
   ▸ Training is the task of searching for parameters (weights and biases) that make the loss function as small as possible
   ▸ As the clue for finding better parameters, we use the slope (derivative) of the loss function when a parameter is varied
   ▸ Example: how does the loss function change when one weight parameter is changed slightly?
   ▸ If recognition accuracy were used as the metric, its derivative would be 0 at almost every point

   [Diagram: loss function vs. recognition accuracy as a training signal]
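   A quick worked example may help here. This is a sketch not on the slide; the three-class output vector and one-hot label below are made up for illustration, and it reuses the two functions just defined:

   import numpy as np

   def mean_squared_error(y, t):
       return 0.5 * np.sum((y - t)**2)

   def cross_entropy_error(y, t):
       delta = 1e-7  # avoid log(0)
       return -np.sum(t * np.log(y + delta))

   # Hypothetical softmax output for a 3-class problem, with its one-hot label
   y = np.array([0.1, 0.3, 0.6])   # network is fairly confident in class 2
   t = np.array([0.0, 0.0, 1.0])   # correct class is 2

   print(mean_squared_error(y, t))    # ≈ 0.13
   print(cross_entropy_error(y, t))   # ≈ 0.51, i.e. -log(0.6)

   y_bad = np.array([0.6, 0.3, 0.1])  # confident in the wrong class
   print(cross_entropy_error(y_bad, t))  # ≈ 2.30, i.e. -log(0.1): a larger loss

   Both losses shrink as the output approaches the label, which is exactly the property the parameter search relies on.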
4. 4.2 Loss Functions 〜Mini-Batch Learning〜
   ▸ Mini-batch learning
   ▸ Training means computing the loss function over the training data and searching for the parameters that make its value as small as possible
   ▸ When there are many training samples, accuracy is evaluated by the average of the loss function over them
   ▸ Mini-batch learning: since computing over all the training data takes too long, a subset is sampled from the data and evaluated as an "approximation" of the whole

   def cross_entropy_error(y, t):
       if y.ndim == 1:
           t = t.reshape(1, t.size)
           y = y.reshape(1, y.size)
       batch_size = y.shape[0]
       return -np.sum(t * np.log(y + 1e-7)) / batch_size  # 1e-7 avoids log(0)
       # Version for labels that are not one-hot encoded:
       # return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

   # load_mnist, init_network, and predict come from earlier chapters' code
   (x_train, t_train), (x_test, t_test) = \
       load_mnist(normalize=True, one_hot_label=True)
   network = init_network()

   train_size = x_train.shape[0]
   batch_size = 10

   # Randomly pick batch_size numbers from 0 up to (but not including) train_size
   batch_mask = np.random.choice(train_size, batch_size)
   x_batch = x_train[batch_mask]
   t_batch = t_train[batch_mask]

   y_batch = predict(network, x_batch)
   r = cross_entropy_error(y_batch, t_batch)
5. 4.3 Numerical Differentiation
   ▸ What is a derivative?
   ▸ It expresses the amount of change "at a given instant"
   ▸ By shrinking the interval over which change is measured toward zero, we obtain the instantaneous amount of change (the slope, or gradient)
   ▸ What is numerical differentiation?
   ▸ Computing the derivative from a tiny finite difference
   ▸ Used when a computer approximates the derivative from numeric values
   ▸ Analytic differentiation is the symbolic kind taught in school
   ▸ Example: the derivative of y = x² is dy/dx = 2x, so the derivative of y at x = 2 is 4
   ▸ Numerical differentiation has error compared with the analytic derivative
   ▸ Because h is made as small as possible but can never be exactly 0
   ▸ To reduce this error, compute the difference of the function f at (x + h) and (x − h) (the central difference)
   ▸ Implementation of numerical differentiation using the central difference:

   def numerical_diff(f, x):
       h = 1e-4  # 0.0001
       return (f(x+h) - f(x-h)) / (2*h)
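   To see why the central difference helps, the sketch below (an illustration not on the slide) compares the one-sided forward difference against the central difference for y = x² at x = 2, where the analytic answer is exactly 4:

   def f(x):
       return x**2

   h = 1e-4
   forward = (f(2 + h) - f(2)) / h           # forward difference: (f(x+h) - f(x)) / h
   central = (f(2 + h) - f(2 - h)) / (2*h)   # central difference

   print(forward)  # 4.0001...: error on the order of h (1e-4)
   print(central)  # 4.0000...: error on the order of h^2, limited mostly by floating point

   The forward difference carries an error proportional to h, while the central difference cancels that term and leaves an error proportional to h², which is why the slide's numerical_diff uses it.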
6. 4.3 Numerical Differentiation 〜Implementation〜
   ▸ Example implementation of numerical differentiation:

   import numpy as np
   import matplotlib.pylab as plt

   # The function 0.01x^2 + 0.1x
   def function_1(x):
       return 0.01*x**2 + 0.1*x

   def numerical_diff(f, x):
       h = 1e-4  # 0.0001
       return (f(x+h) - f(x-h)) / (2*h)

   # Array of x values from 0 to 20 in steps of 0.1
   x = np.arange(0.0, 20.0, 0.1)

   # Numerical derivative at x = 5
   d = numerical_diff(function_1, 5)
   print(d)

   # -- Plotting below --
   # Plot the curve 0.01x^2 + 0.1x
   y = function_1(x)
   plt.plot(x, y)

   # Function that returns the tangent line implied by the derivative d
   def tangent_line(f, x, d):
       y = f(x) - d*x
       return lambda t: d*t + y

   # Plot the tangent line given by the numerical derivative
   y2 = tangent_line(function_1, 5, d)(x)
   plt.plot(x, y2)
   plt.xlabel("x")
   plt.ylabel("f(x)")
   plt.show()
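   For reference (not shown on the slide): the analytic derivative of 0.01x² + 0.1x is 0.02x + 0.1, which equals 0.2 at x = 5, so print(d) should output a value extremely close to 0.2, and the plotted tangent should touch the curve at x = 5.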
7. 4.3 Numerical Differentiation 〜Partial Derivatives〜
   ▸ Partial derivatives
   ▸ The derivatives of a function of several variables
   ▸ Example: for f(x0, x1) = x0² + x1², the derivative with respect to x0 is written ∂f/∂x0 and the derivative with respect to x1 is written ∂f/∂x1
   ▸ A partial derivative gives the slope with respect to each variable separately
   ▸ Fix every variable except the one being differentiated at a specific value, redefine the function accordingly, and apply numerical differentiation to the redefined function
   ▸ Partial derivative with respect to x0 at x0 = 3, x1 = 4:

   def function_tmp1(x0):
       return x0*x0 + 4.0**2.0

   d0 = numerical_diff(function_tmp1, 3.0)

   ▸ Partial derivative with respect to x1 at x0 = 3, x1 = 4:

   def function_tmp2(x1):
       return 3.0**2.0 + x1*x1

   d1 = numerical_diff(function_tmp2, 4.0)
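   As a check (printing is not on the slide): the analytic answers are ∂f/∂x0 = 2·3 = 6 and ∂f/∂x1 = 2·4 = 8, and the numerical results should be extremely close to them:

   print(d0)  # ≈ 6.0
   print(d1)  # ≈ 8.0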
8. 4.4 Gradients 〜The Gradient Method〜
   Gradient method: the general name for algorithms that use information about a function's gradient to search for a solution in an optimization problem.
   ▸ The gradient method (gradient descent) gradually reduces the value of a function by repeatedly stepping in the gradient direction toward the optimum
   ▸ From the current position, move a fixed distance in the gradient direction.
   At the new position, compute the gradient again and step in that direction.
   Repeating this moves the point along the gradient.
   ▸ Expressed as formulas, the gradient method for f(x0, x1) is:
   ▸ x0 ← x0 − η ∂f/∂x0,  x1 ← x1 − η ∂f/∂x1
   ▸ η is the size of the update; in a neural network it is the learning rate
   ▸ In the case of a neural network…
   ▸ The gradient is used to search for the parameter values (weights and biases) at which the loss function takes its minimum
   ▸ Caution!
   ▸ The gradient only indicates the direction that most reduces the function's value at each point
   ▸ There is no guarantee that what the gradient points toward is really the function's minimum, or that it is truly the direction one should move in
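   As a concrete check of the update rule (an illustration, not on the slide): for f(x0, x1) = x0² + x1², the gradient is (2x0, 2x1). Starting from (−3.0, 4.0) with η = 0.1, one update gives

   x0 ← −3.0 − 0.1 × (2 × −3.0) = −2.4
   x1 ←  4.0 − 0.1 × (2 ×  4.0) =  3.2

   so the point moves from (−3.0, 4.0) toward the minimum at (0, 0). The implementation on the next slide repeats exactly this step 100 times.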
9. 4.4 Gradients 〜Implementing the Gradient Method〜

   import numpy as np

   def numerical_gradient(f, x):
       h = 1e-4  # 0.0001
       grad = np.zeros_like(x)  # generate an array with the same shape as x

       for idx in range(x.size):
           tmp_val = x[idx]

           # compute f(x+h)
           x[idx] = tmp_val + h
           fxh1 = f(x)

           # compute f(x-h)
           x[idx] = tmp_val - h
           fxh2 = f(x)

           grad[idx] = (fxh1 - fxh2) / (2*h)
           x[idx] = tmp_val  # restore the original value

       return grad

   def gradient_descent(f, init_x, lr=0.01, step_num=100):
       x = init_x
       for i in range(step_num):
           grad = numerical_gradient(f, x)
           x -= lr * grad
       return x

   def function_2(x):
       return x[0]**2 + x[1]**2

   init_x = np.array([-3.0, 4.0])
   mx = gradient_descent(function_2, init_x=init_x, lr=0.1, step_num=100)
   print(mx)
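   For reference (a note, not on the slide): print(mx) outputs a point extremely close to [0, 0], with components on the order of 1e−10, matching the true minimum of f(x0, x1) = x0² + x1².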
10. 4.4 Gradients 〜Gradients for a Neural Network〜
   ▸ The gradient of a neural network is the gradient of the loss function with respect to the weight parameters
   ▸ Example implementation of the gradient computation for a one-layer-only neural network (see the code below)

   import sys, os
   sys.path.append(os.pardir)
   import numpy as np
   from common.functions import softmax, cross_entropy_error
   from common.gradient import numerical_gradient

   # A one-layer neural network
   class simpleNet:
       def __init__(self):
           self.W = np.random.randn(2, 3)  # initialize with a Gaussian distribution

       # Prediction for the single layer
       def predict(self, x):
           return np.dot(x, self.W)

       # Compute the value of the loss function
       def loss(self, x, t):
           z = self.predict(x)
           y = softmax(z)
           loss = cross_entropy_error(y, t)
           return loss

   # Create an instance
   net = simpleNet()

   # Run a prediction with the input vector [0.6, 0.9]
   x = np.array([0.6, 0.9])
   p = net.predict(x)
   np.argmax(p)  # index of the maximum value

   # Compute the loss given the correct label
   t = np.array([0, 0, 1])  # correct label
   loss = net.loss(x, t)

   # Compute the gradient
   f = lambda w: net.loss(x, t)
   # numerical_gradient expects a one-argument function f(x);
   # f exists only to match that expected signature
   dW = numerical_gradient(f, net.W)
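   A note on the result (not on the slide): dW has the same shape as W, here (2, 3), and each element is the partial derivative of the loss with respect to the corresponding weight, i.e. how much the loss changes when that single weight is nudged. These are exactly the values the update step on the following slides consumes.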
11. 4.5 Implementing the Training Algorithm
   ▸ The training steps of a neural network (stochastic gradient descent, SGD); a minimal sketch of the loop follows this list
   ▸ Step 1 (mini-batch)
   ▸ Randomly select a subset of the training data
   ▸ Step 2 (compute gradients)
   ▸ To reduce the loss function on the mini-batch, compute the gradient of each weight parameter
   ▸ Step 3 (update parameters)
   ▸ Update the weight parameters by a tiny amount in the gradient direction
   ▸ Step 4 (repeat)
   ▸ Repeat steps 1, 2, and 3
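   The four steps map one-to-one onto a short loop. This is a hedged sketch assuming x_train and t_train are already loaded and that network provides a params dict and a numerical_gradient(x, t) method, as in the TwoLayerNet class on the next slide (slides 12 and 13 give the full implementation):

   import numpy as np

   train_size = x_train.shape[0]
   batch_size = 100
   learning_rate = 0.1

   for i in range(10000):
       # Step 1 (mini-batch): randomly select a subset of the training data
       batch_mask = np.random.choice(train_size, batch_size)
       x_batch = x_train[batch_mask]
       t_batch = t_train[batch_mask]

       # Step 2 (compute gradients) on the mini-batch
       grad = network.numerical_gradient(x_batch, t_batch)

       # Step 3 (update parameters) by a small step against the gradient
       for key in ('W1', 'b1', 'W2', 'b2'):
           network.params[key] -= learning_rate * grad[key]

       # Step 4 (repeat): the enclosing for-loop repeats steps 1 to 3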
12. 4.5 Implementing the Training Algorithm 〜A Two-Layer Neural Network Class〜

   import sys, os
   sys.path.append(os.pardir)
   import numpy as np
   from common.functions import *
   from common.gradient import numerical_gradient

   class TwoLayerNet:
       # input_size: number of neurons in the input layer
       # hidden_size: number of neurons in the hidden layer
       # output_size: number of neurons in the output layer
       def __init__(self, input_size, hidden_size, output_size,
                    weight_init_std=0.01):
           # Weight initialization:
           # a dictionary holding the network's parameters;
           # weights are initialized with Gaussian random numbers, biases with 0
           self.params = {}
           self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
           self.params['b1'] = np.zeros(hidden_size)
           self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
           self.params['b2'] = np.zeros(output_size)

       # Run recognition (inference); x: image data
       def predict(self, x):
           W1, W2 = self.params['W1'], self.params['W2']
           b1, b2 = self.params['b1'], self.params['b2']
           a1 = np.dot(x, W1) + b1
           z1 = sigmoid(a1)      # apply the sigmoid function in inference
           a2 = np.dot(z1, W2) + b2
           return softmax(a2)    # apply the softmax function at the output layer

       # Compute the loss function; x: image data, t: teacher data
       def loss(self, x, t):
           y = self.predict(x)
           # cross-entropy error as the loss function
           return cross_entropy_error(y, t)

       # Compute recognition accuracy
       def accuracy(self, x, t):
           y = self.predict(x)
           y = np.argmax(y, axis=1)
           t = np.argmax(t, axis=1)
           accuracy = np.sum(y == t) / float(x.shape[0])
           return accuracy

       # Compute the gradients with respect to the weight parameters;
       # x: image data, t: teacher data
       def numerical_gradient(self, x, t):
           loss_W = lambda W: self.loss(x, t)
           grads = {}  # dictionary holding the gradients
           grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
           grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
           grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
           grads['b2'] = numerical_gradient(loss_W, self.params['b2'])
           return grads
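   A quick shape check (a sketch, not on the slide; the book walks through a similar example) using dummy input, assuming the imports and TwoLayerNet definition above:

   network = TwoLayerNet(input_size=784, hidden_size=100, output_size=10)
   print(network.params['W1'].shape)  # (784, 100)
   print(network.params['b1'].shape)  # (100,)
   print(network.params['W2'].shape)  # (100, 10)
   print(network.params['b2'].shape)  # (10,)

   x = np.random.rand(100, 784)  # 100 dummy "images"
   y = network.predict(x)
   print(y.shape)  # (100, 10): one 10-class probability vector per image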
13. 4.5 Implementing the Training Algorithm 〜Mini-Batch Training〜

   import sys, os
   sys.path.append(os.pardir)
   import numpy as np
   from dataset.mnist import load_mnist
   from two_layer_net import TwoLayerNet

   # Load the MNIST data
   (x_train, t_train), (x_test, t_test) = \
       load_mnist(normalize=True, one_hot_label=True)

   # Hyperparameters
   iters_num = 10000  # number of updates by the gradient method
   train_size = x_train.shape[0]
   batch_size = 100   # mini-batch size
   learning_rate = 0.1
   train_loss_list = []  # record of the loss after each update

   network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

   for i in range(iters_num):
       # With a batch size of 100, pull 100 random samples (image data
       # and correct-label data) from the 60,000 training samples each iteration
       batch_mask = np.random.choice(train_size, batch_size)
       x_batch = x_train[batch_mask]
       t_batch = t_train[batch_mask]

       # Compute the gradient on the mini-batch of 100 samples
       grad = network.numerical_gradient(x_batch, t_batch)

       # Update the parameters
       for key in ('W1', 'b1', 'W2', 'b2'):
           network.params[key] -= learning_rate * grad[key]

       # Over the 10,000 updates, compute the loss on the training
       # mini-batch after each update and append it to the list
       loss = network.loss(x_batch, t_batch)
       train_loss_list.append(loss)
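   One practical caveat (a note, not on the slide, though the book discusses it): network.numerical_gradient evaluates the loss twice for every parameter, roughly 2 × (784·50 + 50 + 50·10 + 10) ≈ 80,000 forward passes per update, so this loop is extremely slow. Chapter 5 replaces it with backpropagation, which computes the same gradients far faster.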
14. 4.5 Implementing the Training Algorithm 〜Evaluation with Test Data〜
   ▸ Measuring accuracy only on the training data says nothing about generalization
   ▸ Evaluate on test data as well, to confirm the network also recognizes data other than the training data correctly
   ▸ This checks whether overfitting is occurring
   ▸ Epoch
   ▸ One epoch corresponds to the number of iterations needed to use up all the training data once
   ▸ Evaluation with test data
   ▸ Once per epoch, compute the recognition accuracy on all training data and on all test data, record the results, and check them
   ▸ See the implementation below

   ...
   # Hyperparameters
   iters_num = 10000  # number of updates by the gradient method
   train_size = x_train.shape[0]
   batch_size = 100   # mini-batch size
   learning_rate = 0.1

   train_acc_list = []  # record of accuracy on the training data
   test_acc_list = []   # record of accuracy on the test data

   # Number of iterations per epoch
   iter_per_epoch = max(train_size / batch_size, 1)

   network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

   for i in range(iters_num):
       ...
       # Update the parameters after each gradient computation
       for key in ('W1', 'b1', 'W2', 'b2'):
           network.params[key] -= learning_rate * grad[key]

       # Compute recognition accuracy once per epoch
       if i % iter_per_epoch == 0:
           # accuracy on the training data
           train_acc = network.accuracy(x_train, t_train)
           # accuracy on the test data
           test_acc = network.accuracy(x_test, t_test)
           train_acc_list.append(train_acc)
           test_acc_list.append(test_acc)
           print("train acc, test acc | " + str(train_acc) + ", " + str(test_acc))
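   A quick check of the epoch arithmetic (not on the slide): with 60,000 training samples and batch_size = 100, iter_per_epoch = 60000 / 100 = 600, so accuracy is logged every 600 updates. Over iters_num = 10,000 updates that produces 17 log lines (at i = 0, 600, ..., 9600), i.e. roughly 16 full epochs, which is enough to see the training and test accuracy curves tracking each other when no overfitting occurs.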