
scipy2013

Talk from SciPy 2013 "A Gentle Introduction to Machine Learning"

Kyle Kastner

July 13, 2013

Transcript

  1. A Gentle Introduction To Machine Learning
    Kyle Kastner
    Southwest Research Institute (SwRI)
    University of Texas - San Antonio (UTSA)
  2. Outline
    • Why Use Machine Learning?
    • Workflow
    • Resources
    • Final Comments
    http://isl.ce.sharif.edu/
  3. Why Use Machine Learning?
    • Drowning in data
    • Computers are cheap, humans are expensive
    • Psychic superpowers (sometimes)
    http://blog.thepertgroup.com
  4. Types of Problems
    • Regression (Supervised)
      ◦ Predict housing prices
    • Classification (Supervised)
      ◦ Handwritten digit recognition
    • Clustering (Unsupervised)
      ◦ Document tagging
    http://2.bp.blogspot.com
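
Clustering is the one problem type here that never looks at labels. A minimal sketch, assuming a recent scikit-learn: load the iris data, throw the species labels away, and let k-means group the flowers. Choosing `n_clusters=3` is an assumption that happens to match the number of iris species; on truly unlabeled data you would have to pick it yourself.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load iris but discard the labels: clustering is unsupervised.
X, _ = load_iris(return_X_y=True)

# Assumed number of clusters (3); n_init and random_state make the run repeatable.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(cluster_ids[:10])              # cluster index assigned to the first 10 flowers
print(kmeans.cluster_centers_.shape)  # one 4-feature center per cluster
```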
  5. Data
    • from sklearn import datasets
    • Iris, Digits are excellent for classification
    • Boston for regression
    • Any classification dataset (sans labels) for clustering
    • Also very good for generating synthetic data
    http://en.memory-alpha.org
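
A short sketch of loading the built-in datasets and generating a synthetic one, assuming a recent scikit-learn. One substitution: `load_boston` was removed in scikit-learn 1.2, so the sketch uses the built-in diabetes dataset as the regression example instead. The `make_classification` parameters are arbitrary illustrative choices.

```python
from sklearn.datasets import load_diabetes, load_digits, load_iris, make_classification

iris = load_iris()          # 150 flowers, 4 features, 3 classes
digits = load_digits()      # 8x8 handwritten digits flattened to 64 features
# Stand-in for Boston housing, which newer scikit-learn no longer ships.
diabetes = load_diabetes()  # 442 patients, 10 features, continuous target

print(iris.data.shape, digits.data.shape, diabetes.data.shape)

# Synthetic two-class problem (parameters chosen only for illustration).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
print(X.shape, y.shape)
```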
  6. Preprocessing
    • Separate training and test data
    • Normalize by subtracting the mean and dividing by the standard deviation
    • Use Principal Component Analysis (PCA) to reduce dimensions while keeping structure
    • Use PCA to plot N-dimensional data in 2D or 3D
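
The preprocessing steps map directly onto scikit-learn building blocks. A minimal sketch, assuming a recent scikit-learn and the digits data: split first, learn the mean and standard deviation from the training split only, then project to two principal components for plotting. The 25% test fraction is an arbitrary choice.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Hold out 25% of the samples before fitting anything.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize each feature: subtract its mean, divide by its standard deviation.
# Statistics come from the training split and are reused on the test split.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Project the 64-dimensional digits onto 2 principal components (e.g. for a scatter plot).
pca = PCA(n_components=2).fit(X_train_std)
X_train_2d = pca.transform(X_train_std)

print(X_train_2d.shape)
print(pca.explained_variance_ratio_)  # fraction of variance kept by each component
```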
  7. Linear Regression
    • Find the "best fit" line
    • Outliers will greatly affect (least-squares) results
    • Perform regression in a different basis
    • Basis can be Fourier, polynomial, wavelet, etc.
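
"Regression in a different basis" means expanding the input into new features and then fitting an ordinary linear model on them. A sketch, assuming a recent scikit-learn and made-up cubic data: the same `LinearRegression` is fitted once on raw `x` and once on the polynomial basis `[1, x, x^2, x^3]` produced by `PolynomialFeatures`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (illustrative only): a noisy cubic curve.
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(-3, 3, size=100))
y = 0.5 * x**3 - x + rng.normal(scale=1.0, size=x.shape)
X = x[:, np.newaxis]  # scikit-learn expects a 2-D feature matrix

# Straight-line fit in the original basis.
line = LinearRegression().fit(X, y)

# Still a linear model, but fitted in a cubic polynomial basis.
cubic = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, y)

print("line  R^2:", line.score(X, y))
print("cubic R^2:", cubic.score(X, y))
```

Swapping `PolynomialFeatures` for a different transformer (for example, sine/cosine features) gives a Fourier-style basis with the same linear model underneath.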
  8. Logistic Regression
    • Simple method for classification
    • Fits a linear model to the log-odds of each class, giving a linear boundary between classes
    • Can be very powerful, especially after PCA
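
A sketch of "logistic regression after PCA", assuming a recent scikit-learn. Standardization, PCA, and the classifier are chained in one pipeline so every step is fitted on the training split only. Keeping 30 components is an arbitrary illustrative choice, not a recommendation from the talk.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize, compress 64 pixels to 30 principal components (assumed value),
# then fit a multiclass logistic regression on the reduced features.
clf = make_pipeline(StandardScaler(), PCA(n_components=30), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print(clf.predict_proba(X_test[:1]).round(3))  # class probabilities for one test image
```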
  9. Support Vector Machine (SVM)
    • A margin parameter (C in scikit-learn) sets the "allowed error" to account for class overlap; smaller C allows more error
    • Boundaries use a semi-arbitrary "kernel" function
      ◦ Linear, polynomial, RBF, sigmoid, wavelet
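
A sketch of trying kernels and margin settings with `SVC`, assuming a recent scikit-learn and the iris data. The grid of `C` values is arbitrary. `SVC` ships linear, polynomial, RBF, and sigmoid kernels; a wavelet kernel is not built in and would have to be supplied as a custom callable, which this sketch does not attempt.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# C controls the soft margin: small C tolerates more training errors (wider margin),
# large C penalizes them more heavily (narrower margin).
results = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    for C in [0.1, 1.0, 10.0]:
        scores = cross_val_score(SVC(kernel=kernel, C=C), X, y, cv=5)
        results[(kernel, C)] = scores.mean()
        print(f"{kernel:8s} C={C:<5} mean accuracy={scores.mean():.3f}")
```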
  10. Resources
    • Scikit-learn documentation and examples
      ◦ The infamous cheat sheet
    • Coursera courses
      ◦ Andrew Ng's Machine Learning
    • Pattern Recognition and Machine Learning
      ◦ Christopher M. Bishop
  11. Final Comments
    • Machine learning is a spectrum
    • Data preprocessing is vital
    • Prefer simple models to complex ones
    • Use sklearn
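
One way to act on "preprocessing is vital" and "prefer simple models" together: keep scaling inside a pipeline so cross-validation refits it on each fold, then compare a simpler and a more flexible model on equal footing. A sketch, assuming a recent scikit-learn; the breast-cancer dataset and the two candidate models are illustrative choices, not ones named in the talk.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Scaling lives inside each pipeline, so it is re-fitted on every training fold
# and never sees the held-out fold.
models = {
    "logistic regression (simpler)": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RBF-kernel SVM (more flexible)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0)),
}

cv_means = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

If the two scores are comparable, the rule of thumb from the slide suggests keeping the simpler model.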
  12. Bonus: Trends in Machine Learning
    • Deep networks
    • Generative models
    • Unsupervised learning from YouTube data
    • Text-to-speech
    • Image object recognition
    • Google+ untagged image search