Machine Learning with Python

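These are the Day 3 materials for the Python course held at G's Academy Tokyo. Day 3 covers machine learning with Python and includes an exercise in which you build an app that recognizes handwritten characters using the SVM algorithm.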
Yohei Munesada

February 22, 2017

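  machine learning is … ?
    Recognizing patterns in large amounts of data:
      What are the conditions for tomorrow to be sunny?
      What pattern gets someone judged as good-looking?
      Under what conditions will Disneyland be crowded tomorrow?

    Using those patterns to predict unknown data:
      That movement... could it be a suspicious person?
      How many customers will visit the store this weekend?
      How much electricity will be used next month?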
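  regression and classification
    Machine learning models are broadly divided into regression and classification.

    Regression: a model that predicts a numeric value from the input data.
      Examples: predicting a user's purchase amount, predicting a user's smartphone usage time.

    Classification: a model that predicts a category from the input data. Also called a classifier.
      Examples: whether a user will make a purchase, whether an image contains a cat, which digit a handwritten number is.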
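To make the distinction concrete, here is a minimal sketch in plain Python. The data, the helper names `fit_line` and `nearest_mean_label`, and the nearest-mean rule are all assumptions chosen for illustration, not methods used in this course. The point is only that regression returns a number while classification returns a category.

```python
# Toy data (hypothetical): minutes of browsing, purchase amount, bought or not.
minutes = [1.0, 2.0, 3.0, 4.0]
amounts = [10.0, 20.0, 30.0, 40.0]   # numeric target  -> regression
bought = ["no", "no", "yes", "yes"]  # category target -> classification


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    a = covariance / variance
    return a, mean_y - a * mean_x


def nearest_mean_label(xs, labels, x_new):
    """Classify x_new as the class whose mean input value is closest."""
    means = {}
    for label in set(labels):
        members = [x for x, l in zip(xs, labels) if l == label]
        means[label] = sum(members) / len(members)
    return min(means, key=lambda label: abs(means[label] - x_new))


a, b = fit_line(minutes, amounts)
predicted_amount = a * 5.0 + b                               # a number
predicted_label = nearest_mean_label(minutes, bought, 5.0)   # a category
print(predicted_amount, predicted_label)
```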
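  supervised and unsupervised
    Machine learning models can also be broadly divided into supervised and unsupervised.

    Supervised: learns from data given in advance (training data) and makes predictions based on it.
      Examples: linear regression, logistic regression, SVM, neural networks, decision trees, etc.

    Unsupervised: derives some underlying structure from unknown data, without any data given in advance.
      Examples: clustering, principal component analysis, etc.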
  4 types of machine learning

                        Supervised    Unsupervised
      Regression            A              B
      Classification        C              D

    There are other approaches as well, such as reinforcement learning and genetic algorithms.
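  6.  NumPy
    Handles matrices and vectors, and processes them at C-language speed.

        import numpy as np

        X = np.array([
            [5, 0, 1],
            [1, 6, 0],
            [0, 0, 1]
        ])
        X.shape                  # (3, 3)
        X.T                      # transpose of X
        E = np.eye(X.shape[0])   # 3x3 identity matrix
        X.dot(E)                 # matrix product; equal to X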
  7.  Matplotlib
    A library for 2D plotting.

        import numpy as np
        import matplotlib.pyplot as plt

        # Plot one period of the sine curve using 200 sample points.
        x = np.linspace(-np.pi, np.pi, 200)
        plt.plot(x, np.sin(x))
        plt.show()
    re: 4 types of machine learning
  10.  Support Vector Machine
    A supervised algorithm that solves classification problems.
    [Figure, built up over slides 10 to 13: a scatter plot of two classes of points with a separating boundary]
    Key idea: margin maximization. The boundary is placed as far as possible from the nearest points of each class.
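The idea of margin maximization can be sketched in a few lines of plain Python. This is an illustration only: it compares two hand-picked candidate boundaries on hypothetical 2D points, whereas an actual SVM solves an optimization problem over all possible boundaries. The function name `margin` and the data are assumptions.

```python
import math


def margin(w, b, points, labels):
    """Smallest signed distance from any point to the line w.x + b = 0.

    labels are +1 or -1. A positive result means every point lies on its
    correct side of the line; a larger value means a wider margin.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical toy data: two points per class.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [-1, -1, +1, +1]

# Two candidate vertical boundaries, written as (w, b): x = 1 and x = 2.
candidates = {
    "x = 1": ((1.0, 0.0), -1.0),
    "x = 2": ((1.0, 0.0), -2.0),
}
margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Both candidates separate the classes, but the boundary halfway between them leaves more room on each side, which is the one a margin-maximizing method would prefer.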
  14.  the features of MNIST
    What are the features of an MNIST image?
    Each pixel is one feature (a 28 × 28 image gives 784 features).
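As a small sketch of "one pixel, one feature": a 28 × 28 image stored as a nested list can be flattened row by row into a single feature vector. The image contents here are hypothetical, and the 28 × 28 size is taken from the PGM header used later in this deck.

```python
ROWS, COLS = 28, 28

# Hypothetical image: black background with a single bright pixel.
image = [[0] * COLS for _ in range(ROWS)]
image[3][5] = 255

# Flatten row by row: pixel (r, c) becomes feature number r * COLS + c.
features = [pixel for row in image for pixel in row]

print(len(features))            # number of features per image
print(features[3 * COLS + 5])   # the bright pixel
```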
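  15.  binary data with Python
    Binary data is handled with the struct module (standard library).

        import struct

        # ">II" means two big-endian unsigned 32-bit integers (8 bytes in total).
        with open("file.binary", "rb") as f:
            num1, num2 = struct.unpack(">II", f.read(8))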
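A round trip through `struct.pack` and `struct.unpack` shows what the `">II"` format string means without needing a file on disk. The two numbers are example values chosen for illustration.

```python
import struct

# Pack two unsigned 32-bit integers in big-endian order (8 bytes).
raw = struct.pack(">II", 2049, 60000)
print(raw.hex())

# Unpack them again; the format string must match the byte layout.
num1, num2 = struct.unpack(">II", raw)

# The same bytes read as little-endian ("<") give different numbers.
wrong1, wrong2 = struct.unpack("<II", raw)
print(num1, num2, wrong1, wrong2)
```

Getting the byte order wrong does not raise an error; it silently produces wrong values, so an unexpected header value is a useful hint that the format string is off.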
  16.  Label data

        import struct
        import gzip

        # Read MNIST `label`.
        fpath = "./data_mnist/train-labels-idx1-ubyte.gz"
        with gzip.open(fpath, "rb") as f:
            # Header: magic number and item count, both big-endian uint32.
            magic_number, img_count = struct.unpack(">II", f.read(8))
            labels = []
            for i in range(img_count):
                # Each label is a single unsigned byte.
                label = str(struct.unpack("B", f.read(1))[0])
                labels.append(label)

        # Write as csv.
        outpath = './csv/train-labels.csv'
        with open(outpath, "w") as f:
            f.write("\n".join(labels))
  17.  Image data

        import struct
        import gzip

        # Read MNIST `images`.
        fpath = "./data_mnist/train-images-idx3-ubyte.gz"
        with gzip.open(fpath, "rb") as f:
            # Header: magic number (unused), image count, then rows and cols.
            _, img_count = struct.unpack(">II", f.read(8))
            rows, cols = struct.unpack(">II", f.read(8))
            images = []
            for i in range(img_count):
                # One image is rows * cols bytes; iterating over bytes yields ints.
                binary = f.read(rows * cols)
                images.append(",".join([str(b) for b in binary]))

        # Write as csv.
        outpath = './csv/train-images.csv'
        with open(outpath, "w") as f:
            f.write("\n".join(images))
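The same parsing logic can be tried without downloading MNIST by building a tiny file with the same layout in memory. This is a sketch under assumptions: `build_idx3` and `read_idx3` are hypothetical helper names, and the two 2 × 2 "images" stand in for real 28 × 28 digits. The value 2051 is the magic number of the MNIST image files.

```python
import gzip
import io
import struct


def build_idx3(images, rows, cols):
    """Build gzip-compressed bytes laid out like an MNIST image file."""
    header = struct.pack(">IIII", 2051, len(images), rows, cols)
    body = b"".join(bytes(img) for img in images)
    return gzip.compress(header + body)


def read_idx3(blob):
    """Parse that layout and return the magic number and one CSV line per image."""
    with gzip.open(io.BytesIO(blob), "rb") as f:
        magic, count = struct.unpack(">II", f.read(8))
        rows, cols = struct.unpack(">II", f.read(8))
        lines = []
        for _ in range(count):
            pixels = f.read(rows * cols)  # bytes: iterating yields ints
            lines.append(",".join(str(p) for p in pixels))
    return magic, lines


# Two hypothetical 2x2 images.
blob = build_idx3([[0, 255, 10, 20], [1, 2, 3, 4]], rows=2, cols=2)
magic, lines = read_idx3(blob)
print(magic, lines)
```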
  18.  output as images
    Write a few records out as images to confirm that the CSV result is correct.

        with open("./csv/train-images.csv") as f:
            images = f.read().split("\n")

        for i, image in enumerate(images[:10]):
            with open("./image/%d.pgm" % i, "w") as out:
                # Plain PGM ("P2") header: width 28, height 28, max gray value 255.
                s = "P2 28 28 255\n"
                s += " ".join(image.split(","))
                out.write(s)
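The plain PGM format is just text: a `P2` marker, the width and height, the maximum gray value, then the pixel values separated by whitespace. A minimal sketch of a reusable helper follows; the name `to_pgm` and the 2 × 2 sample are assumptions for illustration, and the length check is an addition that is not in the slide code.

```python
def to_pgm(csv_line, width, height, maxval=255):
    """Return plain-PGM ("P2") text for one comma-separated row of pixels."""
    pixels = csv_line.split(",")
    if len(pixels) != width * height:
        raise ValueError("expected %d pixels, got %d" % (width * height, len(pixels)))
    header = "P2\n%d %d\n%d\n" % (width, height, maxval)
    # One image row per text line keeps the file easy to inspect by eye.
    rows = [" ".join(pixels[r * width:(r + 1) * width]) for r in range(height)]
    return header + "\n".join(rows) + "\n"


pgm_text = to_pgm("0,255,128,64", width=2, height=2)
print(pgm_text)
```

Checking the pixel count up front catches a truncated or empty CSV line before it turns into an image file that viewers refuse to open.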
  19.  use Support Vector Machine
    Now let's do pattern recognition with SVM.

        from sklearn import svm

        # Load training data (first 500 records).
        with open("./csv/train-images.csv") as f:
            images = f.read().split("\n")[:500]
        with open("./csv/train-labels.csv") as f:
            labels = f.read().split("\n")[:500]

        # Convert data: scale pixel values (0-255) into the range [0, 1).
        images = [[int(i)/256 for i in image.split(",")] for image in images]
        labels = [int(l) for l in labels]

        # Use SVM.
        clf = svm.SVC()
        clf.fit(images, labels)
  20.  evaluate the model
    Use test data to make predictions with the trained model and evaluate its accuracy.

        from sklearn import metrics

        test_images = ...  # loading code omitted (500 records are read this time)
        test_labels = ...  # loading code omitted (500 records are read this time)

        # Predict.
        predict = clf.predict(test_images)

        # Show results.
        ac_score = metrics.accuracy_score(test_labels, predict)
        print("Accuracy:", ac_score)
        cl_report = metrics.classification_report(test_labels, predict)
        print(cl_report)
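To see what the accuracy figure means, it can be computed by hand: it is the fraction of predictions that match the true label. The sketch below also computes per-class recall, one of the columns shown in a classification report. The helper names and the label lists are hypothetical and are not results from the model in this deck.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def per_class_recall(true_labels, predicted_labels):
    """For each true class, the fraction of its samples predicted correctly."""
    recall = {}
    for cls in sorted(set(true_labels)):
        total = sum(1 for t in true_labels if t == cls)
        hits = sum(
            1 for t, p in zip(true_labels, predicted_labels) if t == cls and p == cls
        )
        recall[cls] = hits / total
    return recall


# Hypothetical labels for illustration only.
y_true = [0, 1, 1, 2, 2, 2, 3, 3]
y_pred = [0, 1, 7, 2, 2, 1, 3, 3]
acc = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
print(acc)
print(recall)
```

A single accuracy number can hide a class that is recognized poorly, which is why the per-class breakdown is worth reading alongside it.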
  21.  how to improve the model
    Increase the training data from 500 to 5,000 records.

        from sklearn import svm

        # Load training data (first 5000 records).
        with open("./csv/train-images.csv") as f:
            images = f.read().split("\n")[:5000]
        with open("./csv/train-labels.csv") as f:
            labels = f.read().split("\n")[:5000]

        # Convert data: scale pixel values (0-255) into the range [0, 1).
        images = [[int(i)/256 for i in image.split(",")] for image in images]
        labels = [int(l) for l in labels]

        # Use SVM.
        clf = svm.SVC()
        clf.fit(images, labels)
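  22.  save the training result
    Save the training result so that it can later be used from a web server.

        # Save the training result.
        # Note: scikit-learn 0.23 and later no longer include sklearn.externals.joblib;
        # with those versions, use `import joblib` instead.
        from sklearn.externals import joblib
        joblib.dump(clf, "./result/svm.pkl")

    Output file: result/svm.pkl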
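The save-then-load pattern itself can be tried with the standard library alone. This sketch swaps in `pickle` for joblib so that it runs without scikit-learn, and uses a plain dict as a hypothetical stand-in for a trained model; the file name and the dict contents are assumptions.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: plain parameters in a dict.
model = {"weights": [0.5, -1.25, 2.0], "bias": 0.1, "classes": [0, 1]}

with tempfile.TemporaryDirectory() as tmpdir:
    pklfile = os.path.join(tmpdir, "model.pkl")

    # Training program: write the object to disk once.
    with open(pklfile, "wb") as f:
        pickle.dump(model, f)

    # Another program (for example a web server): load it without retraining.
    with open(pklfile, "rb") as f:
        restored = pickle.load(f)

print(restored == model, restored is model)
```

The restored object is an equal copy, not the same object. As with any pickle-based file, only load files from a source you trust, since loading can execute arbitrary code.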
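  23.  use the model
    Use the training result saved earlier from a different program.

        from os import path
        from sklearn.externals import joblib

        # Load the training result.
        pklfile = path.join("result", "svm.pkl")
        clf = joblib.load(pklfile)

        # Predict. `data` is one sample: a list of pixel features
        # prepared the same way as the training data.
        predict = clf.predict([data])
        number = str(predict.tolist()[0])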
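Whatever program calls `clf.predict`, the input has to be prepared the same way as the training data: the same number of features, scaled with the same division by 256. A minimal sketch of such a preparation step follows. The helper name `to_features`, the comma-separated payload format, and the 784-pixel length (28 × 28) are assumptions for illustration.

```python
def to_features(csv_pixels, expected=784):
    """Turn "0,255,..." into floats scaled like the training data (value / 256)."""
    values = [int(v) for v in csv_pixels.split(",")]
    if len(values) != expected:
        raise ValueError("expected %d pixels, got %d" % (expected, len(values)))
    if any(v < 0 or v > 255 for v in values):
        raise ValueError("pixel values must be in 0..255")
    return [v / 256 for v in values]


# Hypothetical request payload: a blank image with one bright pixel at the end.
payload = ",".join(["0"] * 783 + ["255"])
data = to_features(payload)
print(len(data), data[-1])
```

The resulting `data` list is in the shape that a call like `clf.predict([data])` expects for a single sample.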