Slides from "Machine Learning with scikit-learn" @ SciPy 2016

Machine Learning with scikit-learn -- Slides from SciPy 2016 in Austin, Texas.

This tutorial aims to provide an introduction to machine learning and scikit-learn "from the ground up". We will start with core concepts of machine learning, some example uses of machine learning, and how to implement them using scikit-learn. Going in detail through the characteristics of several methods, we will discuss how to pick an algorithm for your application, how to set its parameters, and how to evaluate performance.

Sebastian Raschka

July 12, 2016

Transcript

  1. Machine Learning with scikit-learn

    Scientific Computing with Python (SciPy 2016) • Austin, Texas • July 11-17, 2016
    Sebastian Raschka & Andreas Mueller
  2. Links

    Contact info:
    Sebastian Raschka: [email protected], http://sebastianraschka.com, @rasbt
    Andreas Mueller: [email protected], http://amueller.github.io, @amuellerml
    Tutorial material on GitHub: https://github.com/amueller/scipy-2016-sklearn
  3. Tutorial Setup I

    a) Fork the repository (if you haven't done so yet).
    b) Sync an older fork:
        $ git remote add upstream https://github.com/amueller/scipy-2016-sklearn.git
        $ git fetch upstream
        $ git checkout master
        $ git merge upstream/master
  4. Our Agenda

    - Morning Session: 8:00 AM - 12:00 PM (Room 105)
    - Afternoon Session: 1:30 PM - 5:30 PM (Room 105)
  5. Morning Session (8:00 AM - 12:00 PM)

    01 Introduction to machine learning with sample applications
    02 Scientific Computing Tools for Python: NumPy, SciPy, and matplotlib
    03 Data formats, preparation, and representation
    04 Supervised learning: Training and test data
    05 Supervised learning: Estimators for classification
    06 Supervised learning: Estimators for regression analysis
    07 Unsupervised learning: Unsupervised Transformers
    08 Unsupervised learning: Clustering
    09 The scikit-learn estimator interface
    10 Preparing a real-world dataset (titanic)
    11 Working with text data via the bag-of-words model
    12 Application: SMS spam classification
  6. Afternoon Session (1:30 PM - 5:30 PM)

    13 Cross-Validation
    14 Model complexity and grid search for adjusting hyperparameters
    15 Scikit-learn Pipelines
    16 Supervised learning: Performance metrics for classification
    17 Supervised learning: Linear Models
    18 Supervised learning: Support Vector Machines
    19 Supervised learning: Decision trees and random forests, ensemble methods
    20 Supervised learning: Feature selection
    21 Unsupervised learning: Hierarchical and density-based clustering algorithms
    22 Unsupervised learning: Non-linear dimensionality reduction
    23 Supervised learning: Out-of-core learning
  7. Morning Session: 01 Introduction to machine learning with sample applications
  8. What is Machine Learning?

    "Traditional" programming: Inputs (observations) + Program (written by the programmer) → Computer → Outputs
    Machine Learning: Inputs (observations) + Outputs → Computer → Program
    "Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed." -- Arthur Samuel (1959)
  9. Examples of Machine Learning

    Image credits:
    - https://flic.kr/p/5BLW6G [CC BY 2.0]
    - http://commons.wikimedia.org/wiki/File:Netflix_logo.svg [public domain]
    - By Steve Jurvetson [CC BY 2.0]
    - http://commons.wikimedia.org/wiki/File:American_book_company_1916._letter_envelope-2.JPG#filelinks [public domain]
    And many, many more …
  10. 3 Types of Learning

    - Supervised: learning from labeled data, e.g., spam classification
    - Unsupervised: discovering structure in unlabeled data, e.g., document clustering
    - Reinforcement: learning by "doing" with delayed reward, e.g., chess computer
  11. IRIS Data Representation

    Rows are instances (samples, observations), columns are features (attributes, dimensions), and the last column holds the classes (targets).

           sepal_length  sepal_width  petal_length  petal_width  class
    1      5.1           3.5          1.4           0.2          setosa
    2      4.9           3.0          1.4           0.2          setosa
    …      …             …            …             …            …
    50     6.4           3.2          4.5           1.5          versicolor
    …      …             …            …             …            …
    150    5.9           3.0          5.1           1.8          virginica

    Source: https://archive.ics.uci.edu/ml/datasets/Iris
  12. Morning Session: 02 Scientific Computing Tools for Python: NumPy, SciPy, and matplotlib
  13. NumPy Arrays

    - Built around a C array with pointers to a contiguous data buffer of values
    - Linear algebra functions
    - Fancy indexing
    - …

        >>> import numpy
        >>> ary = numpy.array([7, 8, 9, 10, 11])
        >>> ary[[2, 4]]
        array([ 9, 11])
        >>> lst = list([7, 8, 9, 10, 11])
        >>> lst[[2, 4]]
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        TypeError: list indices must be integers or slices, not list

    Image source: "Why Python is Slow: Looking Under the Hood" by Jake VanderPlas, http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
  14. SciPy Sparse Matrices

    List of Lists (LIL) example:

        >>> from scipy import sparse
        >>> mtx = sparse.lil_matrix([[0, 1, 2, 0],
        ...                          [3, 0, 1, 0],
        ...                          [1, 0, 0, 1]])
        >>> print(mtx)
          (0, 1)    1
          (0, 2)    2
          (1, 0)    3
          (1, 2)    1
          (2, 0)    1
          (2, 3)    1
        >>> print(mtx.toarray())
        [[0 1 2 0]
         [3 0 1 0]
         [1 0 0 1]]
  15. Matplotlib

        >>> import matplotlib.pyplot as plt
        >>> import numpy as np
        >>>
        >>> mu, sigma = 200, 25
        >>> x = mu + sigma*np.random.randn(10000)
        >>> # density=True replaces normed=1, which newer Matplotlib releases no longer accept
        >>> plt.hist(x, 20, density=True,
        ...          histtype='stepfilled',
        ...          facecolor='b',
        ...          alpha=0.75)
        >>> plt.show()
  16. Morning Session: 03 Data formats, preparation, and representation
  17. Morning Session: 04 Supervised learning: Training and test data
  18. Training & Test Data

    All data is split into training data and test data. Typical ratios:
    - 75% : 25%
    - 2/3 : 1/3
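The 75% : 25% split above could be sketched as follows. This is a minimal sketch that assumes the Iris data from the earlier slide and the `train_test_split` helper from `sklearn.model_selection`; `random_state=0` is an arbitrary choice for reproducibility and is not taken from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 25% of the 150 samples as test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

print(X_train.shape, X_test.shape)
```

Passing `test_size=1/3` instead would give the 2/3 : 1/3 ratio.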
  19. Stratification

    Non-stratified split:
    - training set → 38 x Setosa, 28 x Versicolor, 34 x Virginica
    - test set → 12 x Setosa, 22 x Versicolor, 16 x Virginica
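For contrast with the non-stratified counts above, a stratified split could look like the sketch below. It assumes the same 100/50 split of the Iris data and uses the `stratify` argument of `train_test_split`; the random seed is an arbitrary assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions (nearly) equal in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=50, stratify=y, random_state=0)

train_counts = np.bincount(y_train)  # samples per class in the training set
test_counts = np.bincount(y_test)    # samples per class in the test set
print(train_counts, test_counts)
```

With three equally sized classes, the per-class counts in each subset should differ by at most one.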
  20. K-Nearest Neighbors

    k=5
    Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch03/images/03_20.png
  21. Morning Session: 05 Supervised learning: Estimators for classification
  22. Supervised Workflow

    Training: Training Data + Training Labels → Model
    Generalization: Test Data → Model → Prediction; Prediction + Test Labels → Evaluation
    - Fit model on all data after evaluation
  23. Supervised Workflow

    Training: Training Data + Training Labels → Model
        estimator.fit(X_train, y_train)
    Generalization: Test Data → Model → Prediction; Prediction + Test Labels → Evaluation
        estimator.predict(X_test)
        estimator.score(X_test, y_test)
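The three calls in the workflow above could be put together as in the following sketch. The choice of `KNeighborsClassifier` with `n_neighbors=5` and of the Iris data is an assumption made for illustration; any scikit-learn classifier follows the same `fit` / `predict` / `score` pattern.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

estimator = KNeighborsClassifier(n_neighbors=5)
estimator.fit(X_train, y_train)              # training
y_pred = estimator.predict(X_test)           # prediction on unseen data
accuracy = estimator.score(X_test, y_test)   # evaluation (mean accuracy)
print(accuracy)
```

For classifiers, `score` reports the fraction of test samples whose predicted label matches the true label.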
  24. Morning Session: 06 Supervised learning: Estimators for regression analysis
  25. Linear Regression

    y = coef_[0]*X[0] + intercept_
    Plot axes: X[0] (feature variable), y (target variable)
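The relation between the fitted attributes and the prediction can be checked with a small sketch. The synthetic data (a line with slope 3 and intercept 2 plus noise) is an assumption for illustration, and `X[0]` on the slide is read here as the first feature column, `X[:, 0]`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y = 3*x + 2 plus small noise
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)

# Reproduce predict() by hand from the fitted attributes
y_manual = model.coef_[0] * X[:, 0] + model.intercept_
print(model.coef_[0], model.intercept_)
```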
  26. Morning Session: 07 Unsupervised learning: Unsupervised Transformers
  27. Unsupervised Transformers

    Training Data → Model → New View; Test Data → Model → New View
    ① transformer.fit(X_train)
    ② X_train_transf = transformer.transform(X_train)
    ③ X_test_transf = transformer.transform(X_test)
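The three steps above might look like this with a concrete transformer. `StandardScaler` and the Iris data are assumptions chosen for illustration; the slide does not name a specific transformer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

transformer = StandardScaler()
transformer.fit(X_train)                          # 1. learn mean/std from training data only
X_train_transf = transformer.transform(X_train)   # 2. apply to the training data
X_test_transf = transformer.transform(X_test)     # 3. reuse the same statistics on test data

print(X_train_transf.mean(axis=0))
```

Fitting on the training data only means the test data never influences the learned statistics.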
  28. Morning Session: 08 Unsupervised learning: Clustering
  29. Morning Session: 09 The scikit-learn estimator interface
  30. Morning Session: 10 Preparing a real-world dataset (titanic)
  31. Continuous & Categorical Features

    - Continuous: e.g., sepal width in cm [3.4, 4.7, …]
    - Categorical, ordinal: e.g., ratings [satisfied, neutral, unsatisfied]
    - Categorical, nominal: e.g., colors [red, green, blue, …]
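One common way to turn the two categorical kinds above into numbers is sketched below. The slide does not prescribe an encoding, so the rank mapping for the ordinal feature and the one-hot columns for the nominal feature are assumptions, and the sample values are made up for illustration.

```python
import numpy as np

# Ordinal feature: the categories have an order, so map them to ranks
rating_order = {"unsatisfied": 0, "neutral": 1, "satisfied": 2}
ratings = ["satisfied", "neutral", "unsatisfied", "neutral"]
ratings_encoded = np.array([rating_order[r] for r in ratings])

# Nominal feature: no order, so use one indicator column per category
colors = ["red", "green", "blue", "green"]
categories = sorted(set(colors))  # ['blue', 'green', 'red']
one_hot = np.array([[int(c == cat) for cat in categories] for c in colors])

print(ratings_encoded)
print(one_hot)
```

Mapping colors to 0, 1, 2 instead would suggest an order that the data does not have.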
  32. Morning Session: 11 Working with text data via the bag-of-words model
  33. Bag of Words

    D1: "Each state has its own laws."
    D2: "Every country has its own culture."

    V = {each: 1, state: 1, has: 2, its: 2, own: 2, laws: 1, every: 1, country: 1, culture: 1}
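The counts in V above can be reproduced with a few lines of plain Python. This is a minimal sketch; the tokenizer (lowercasing and keeping alphabetic runs only) is a simplifying assumption, not the tutorial's implementation.

```python
import re
from collections import Counter

docs = ["Each state has its own laws.",
        "Every country has its own culture."]

def tokenize(text):
    # Lowercase and keep alphabetic runs only (a simplifying assumption)
    return re.findall(r"[a-z]+", text.lower())

# Corpus-wide term counts, matching V on the slide
vocab_counts = Counter(token for doc in docs for token in tokenize(doc))

# One count vector per document, with columns in sorted vocabulary order
vocabulary = sorted(vocab_counts)
vectors = [[tokenize(doc).count(term) for term in vocabulary] for doc in docs]

print(dict(vocab_counts))
print(vocabulary)
print(vectors)
```

Each document becomes a fixed-length vector of term counts, which is the representation a classifier can consume.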
  34. Morning Session: 12 Application: SMS spam classification
  35. Afternoon Session: 13 Cross-Validation
  36. Holdout Evaluation I

    1. Split the data and labels into training data/labels and test data/labels.
    2. Learning algorithm + hyperparameter values + training data and labels → Model
  37. Holdout Evaluation II

    3. Model + test data → Prediction; Prediction vs. test labels → Performance
    4. Learning algorithm + hyperparameter values + all data and labels → Final Model
    This work by Sebastian Raschka is licensed under a Creative Commons Attribution 4.0 International License.
  38. Holdout Validation I

    1. Split the data and labels into training, validation, and test sets.
    2. Learning algorithm + several sets of hyperparameter values + training data and labels → one Model per setting
  39. Holdout Validation II

    3. Each Model + validation data → Prediction; Prediction vs. validation labels → Performance; the best Model identifies the best hyperparameter values
    4. Learning algorithm + best hyperparameter values + training and validation data and labels → Model
  40. Holdout Validation III

    5. Model + test data → Prediction; Prediction vs. test labels → Performance
    6. Learning algorithm + best hyperparameter values + all data and labels → Final Model
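The six steps of holdout validation could be sketched as below. The k-nearest neighbors classifier, the candidate values for `n_neighbors`, the split sizes, and the Iris data are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1. Split into training, validation, and test sets
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

# 2.-3. Fit one model per hyperparameter value and compare on the validation set
candidates = [1, 3, 5, 7, 9]
valid_scores = {}
for k in candidates:
    model_k = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    valid_scores[k] = model_k.score(X_valid, y_valid)
best_k = max(valid_scores, key=valid_scores.get)

# 4. Refit with the best value on training + validation data
model = KNeighborsClassifier(n_neighbors=best_k).fit(X_rest, y_rest)

# 5. Estimate generalization performance once, on the test set
test_score = model.score(X_test, y_test)

# 6. Fit the final model on all data
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X, y)
print(best_k, test_score)
```

The test set is touched only in step 5, so the reported score is not biased by the hyperparameter search.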
  41. K-fold Cross-Validation

    K iterations (K folds), 1st to 5th: in each iteration one fold is the validation fold and the remaining folds are training folds.
    Per iteration: Learning algorithm + hyperparameter values + training fold data and labels → Model; Model + validation fold data → Prediction; Prediction vs. validation fold labels → Performance_i
    Performance = \frac{1}{10} \sum_{i=1}^{10} Performance_i
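The averaging in the formula above can be sketched with `cross_val_score`. The classifier, the Iris data, and the use of a shuffled `StratifiedKFold` with 10 folds are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# K = 10 folds: each sample is used for validation exactly once
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=cv)

# Average of the per-fold performances
mean_score = scores.mean()
print(scores.shape, mean_score)
```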
  42. K-fold Cross-Validation Pipeline I

    1. Split the data and labels into training data/labels and test data/labels.
    2. Learning algorithm + several sets of hyperparameter values + training data and labels → one Model per setting
    3. Learning algorithm + best hyperparameter values + training data and labels → Model
  43. K-fold Cross-Validation Pipeline II

    3. Learning algorithm + best hyperparameter values + training data and labels → Model
    4. Model + test data → Prediction; Prediction vs. test labels → Performance
    5. Learning algorithm + best hyperparameter values + all data and labels → Final Model
  44. Nested CV

    Outer loop (1st to 5th iteration): outer training fold and outer validation fold, giving Performance_1 … Performance_5
    Outer: Performance = \frac{1}{10} \sum_{i=1}^{10} Performance_i
    Inner loop (on an outer training fold): inner training fold and inner validation fold, giving Performance_{5,1} and Performance_{5,2}; used to pick the best learning algorithm and the best hyperparameter values
    Inner: Performance = \frac{1}{2} \sum_{j=1}^{2} Performance_{5,j}
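Nested cross-validation can be sketched by passing a `GridSearchCV` object (the inner loop) to `cross_val_score` (the outer loop). The SVM estimator, the parameter grid, the fold counts (2 inner, 5 outer), and the Iris data are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop (2 folds): choose hyperparameters on each outer training fold
inner_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
search = GridSearchCV(SVC(kernel="rbf"),
                      param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
                      cv=inner_cv)

# Outer loop (5 folds): score the tuned model on data the search never saw
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer_scores = cross_val_score(search, X, y, cv=outer_cv)

print(outer_scores, outer_scores.mean())
```

Because the search is refit inside every outer fold, the outer scores estimate the performance of the whole tuning procedure rather than of one fixed setting.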
  45. Afternoon Session: 14 Model complexity and grid search for adjusting hyperparameters
  46. Learning Curves

    Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch06/images/06_04.png
  47. Grid Search

    Plot axes: gamma parameter, C parameter
    Source: http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html
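A grid search over the two parameters named above might be set up as in this sketch. The candidate values, the RBF-kernel `SVC`, the 5-fold cross-validation, and the Iris data are assumptions for illustration, not values from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values on a log scale (illustrative, not from the slides)
param_grid = {"C": [0.01, 0.1, 1, 10, 100],
              "gamma": [0.001, 0.01, 0.1, 1, 10]}

# Every (C, gamma) pair is scored with 5-fold cross-validation
grid = GridSearchCV(SVC(kernel="rbf"), param_grid=param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_, grid.best_score_)
```

The mean validation scores in `grid.cv_results_` form the kind of C-by-gamma table that a heatmap like the one referenced above visualizes.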
  48. Afternoon Session: 15 Scikit-learn Pipelines
  49. Pipelines

    pipe = make_pipeline(T1(), T2(), Classifier())

    pipe.fit(X, y):
        T1.fit(X, y);  X1 = T1.transform(X)
        T2.fit(X1, y); X2 = T2.transform(X1)
        Classifier.fit(X2, y)

    pipe.predict(X'):
        X'1 = T1.transform(X')
        X'2 = T2.transform(X'1)
        y' = Classifier.predict(X'2)
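The chaining shown above can be checked against a concrete pipeline. In this sketch `StandardScaler`, `PCA`, and `LogisticRegression` stand in for T1, T2, and Classifier; these choices and the Iris data are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# T1 = StandardScaler, T2 = PCA, Classifier = LogisticRegression
pipe = make_pipeline(StandardScaler(), PCA(n_components=2),
                     LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)
y_pipe = pipe.predict(X_test)

# The same prediction written out step by step, as in the diagram
t1, t2, clf = (step for _, step in pipe.steps)
y_manual = clf.predict(t2.transform(t1.transform(X_test)))
print(np.array_equal(y_pipe, y_manual))
```

Since the pipeline is itself an estimator, it can be passed to cross-validation or grid search so that the transformers are refit on each training fold.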
  50. Pipelines & Cross Validation

    Training Data + Training Labels → Feature Extraction → Scaling → Feature Selection → Model, with all steps inside Cross Validation
  51. Afternoon Session: 16 Supervised learning: Performance metrics for classification
  52. Receiver Operating Characteristic

    Image source: http://scikit-learn.org/stable/_images/plot_roc_001.png
  53. Afternoon Session: 17 Supervised learning: Linear Models
  54. Linear models for regression

    y_pred = x_test[0] * coef_[0] + ... + x_test[n_features-1] * coef_[n_features-1] + intercept_
    Plot: x[0] vs. y, with slope (= Δy/Δx)
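The weighted sum above can be evaluated by hand and compared with `predict`. The noise-free synthetic data with three features and the test sample are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data with three features (assumed for illustration)
rng = np.random.RandomState(42)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0

model = LinearRegression().fit(X, y)

# Weighted sum of the features of one test sample, plus the intercept
x_test = np.array([0.2, -1.0, 3.0])
y_pred = sum(x_test[j] * model.coef_[j] for j in range(len(x_test)))
y_pred += model.intercept_

print(y_pred, model.predict(x_test.reshape(1, -1))[0])
```

The explicit sum is equivalent to the dot product `x_test @ model.coef_ + model.intercept_`.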
  55. Afternoon Session: 18 Supervised learning: Support Vector Machines
  56. Support Vector Machines

    Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch03/images/03_07.png
  57. Afternoon Session: 19 Supervised learning: Decision trees and random forests, ensemble methods
  58. Bagging Example

    Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch07/images/07_08.png
  59. Afternoon Session: 20 Supervised learning: Feature selection
  60. Afternoon Session: 21 Unsupervised learning: Hierarchical and density-based clustering algorithms
  61. Afternoon Session: 22 Unsupervised learning: Non-linear dimensionality reduction
  62. Afternoon Session: 23 Supervised learning: Out-of-core learning