[StartupCourse/18] Discover Machine Learning

Want to predict the future?

Machine learning is not that complex!

Let's discover it!

Fabien Vauchelles

April 12, 2016

Transcript

  1. BACK TO MACHINE LEARNING If you know the passenger list:

    • Gender
    • Age
    • Ticket class
    • Did the passenger survive?

    You can build a Decision Tree for this supervised problem!
  2. A feature is a column in a dataset. The age of the passengers is a feature!
  3. DATA ANALYSIS / DISTRIBUTION

    [Box plot: quantile marks at 5%, 25%, 50%, 75% and 95% delimit Quantile 1 to Quantile 4; an outlier is plotted as a separate point.]
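
The box plot above can be approximated in a few lines. This is a minimal sketch, not the deck's own method: the sample values are hypothetical, it uses the quartiles (25%, 50%, 75%) rather than the slide's 5% and 95% marks, and the 1.5 × IQR outlier rule is an assumption (a common convention the slide does not state).

```python
import statistics

# Hypothetical sample with one extreme value.
values = [10, 20, 20, 30, 30, 40, 50, 1000]

# Quartile cut points (25%, 50%, 75%) that bound the box of a box plot.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Assumed outlier rule: anything further than 1.5 * IQR from the box.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("quartiles:", q1, q2, q3)
print("outliers:", outliers)
```

With this sample, only the value 1000 falls outside the fences.
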
  4. DATA ANALYSIS / MISSING VALUES Choose the replacement value carefully!

    Original:       10  NaN  20  NaN  30
    Filled with 0:  10    0  20    0  30
    Mean = 12. BAD!
  5. DATA ANALYSIS / MISSING VALUES Fill missing values with the median (20):

    Original:            10  NaN  20  NaN  30
    Filled with median:  10   20  20   20  30
    Mean = 20. GOOD!
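
The two missing-value slides above can be reproduced with the standard library. This is a small sketch using the slides' own numbers; the helper name `fill_missing` is hypothetical.

```python
import math
import statistics

raw = [10, math.nan, 20, math.nan, 30]

def fill_missing(values, replacement):
    """Return a copy of values with every NaN replaced by replacement."""
    return [replacement if math.isnan(v) else v for v in values]

# The median is computed on the known values only.
known = [v for v in raw if not math.isnan(v)]
median = statistics.median(known)

mean_zero = statistics.mean(fill_missing(raw, 0))
mean_median = statistics.mean(fill_missing(raw, median))

print("filled with 0, mean:", mean_zero)
print("filled with median, mean:", mean_median)
```

Filling with 0 drags the mean down to 12, while filling with the median keeps it at 20, the mean of the known values.
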
  6. DATA ANALYSIS / REMOVE OUTLIERS Use the median (30):

    Original:               10  20  30  1000  50
    Outlier set to median:  10  20  30    30  50
    Mean = 28. GOOD!
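
A possible way to automate the outlier slide above, assuming its values are 10, 20, 30, 1000 and 50. The detection rule (anything larger than `max_ratio` times the median) is a hypothetical threshold chosen for this sketch; the slide does not say how the outlier is detected.

```python
import statistics

def replace_outliers_with_median(values, max_ratio=10):
    """Replace values larger than max_ratio * median with the median.

    The max_ratio threshold is an illustrative assumption, not a standard rule.
    """
    median = statistics.median(values)
    return [median if v > max_ratio * median else v for v in values]

values = [10, 20, 30, 1000, 50]
cleaned = replace_outliers_with_median(values)

print(cleaned)
print("mean:", statistics.mean(cleaned))
```

The median of the raw values is 30, so 1000 is replaced by 30 and the mean becomes 28, as on the slide.
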
  7. REGRESSION Find the house price:

    Surface (m2)  Rooms  Bedrooms  Garden (m2)  Price (€)
    200           5      2         200          500 000
    100           3      1         0            200 000
    300           5      2         300          800 000
    150           4      2         100          300 000
    200           4      1         200          ?
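
To give a feel for regression on the table above, here is a deliberately simplified sketch: it fits a straight line to the price using the surface only and ignores rooms, bedrooms and garden. That simplification is an assumption made to keep the example short; a real model would use all the features.

```python
# Known rows from the table: (surface in m2, price in euros).
known = [(200, 500_000), (100, 200_000), (300, 800_000), (150, 300_000)]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(known)

# Estimate the missing price for the 200 m2 house in the last row.
predicted_price = slope * 200 + intercept
print(round(predicted_price))
```

With these four rows the one-feature fit estimates roughly 488 600 € for the last house.
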
  8. ALGORITHMS

    Families: Decision Tree, Deep Learning, Regression, Clustering, Bayesian, NLP.

    Examples: Linear Regression, Logistic Regression, Convolutional Neural Network, Deep Boltzmann Machine, Recurrent Neural Network, Gaussian Naive Bayes, Multinomial Naive Bayes, Bayesian Network, k-Means, k-Medians, Hierarchical Clustering, Perceptron, Random Forest, Gradient Boosting, XGBoost, TF-IDF, Word2Vec.
  9. DECISION TREES / SPLIT

    Candidate questions: Passenger with class 3? Adult passenger? Male passenger?
    Outcomes: Alive, Dead.
  10. DECISION TREES / SPLIT

    Passenger Class 3? (YES / NO): one branch ends in DEAD.
  11. DECISION TREES / SPLIT

    Passenger Class 3? (YES / NO): one branch ends in DEAD, the other asks:
      Adult? (YES / NO): one branch ends in DEAD.
  12. DECISION TREES / SPLIT

    Passenger Class 3? (YES / NO): one branch ends in DEAD, the other asks:
      Adult? (YES / NO): one branch ends in DEAD, the other asks:
        Male? (YES / NO): the branches end in ALIVE and DEAD.
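
The split slides above pick one question at a time. The sketch below shows one common way to choose the first question: compute the Gini impurity left after each YES/NO split and keep the lowest. Both the criterion and the eight passengers are assumptions made for illustration; the deck names neither a criterion nor this data.

```python
# Hypothetical passengers: (is_class_3, is_adult, is_male, survived).
passengers = [
    (True,  True,  True,  False),
    (True,  True,  False, False),
    (True,  False, True,  False),
    (False, True,  True,  False),
    (False, True,  False, True),
    (False, False, True,  True),
    (False, False, False, True),
    (True,  False, False, False),
]
FEATURES = {"class 3": 0, "adult": 1, "male": 2}

def gini(rows):
    """Gini impurity of the survival label in rows (0 means pure)."""
    if not rows:
        return 0.0
    p = sum(r[3] for r in rows) / len(rows)
    return 2 * p * (1 - p)

def split_impurity(rows, index):
    """Weighted Gini impurity after a YES/NO split on one feature."""
    yes = [r for r in rows if r[index]]
    no = [r for r in rows if not r[index]]
    return (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)

scores = {name: split_impurity(passengers, i) for name, i in FEATURES.items()}
best = min(scores, key=scores.get)

print(scores)
print("first split:", best)
```

On this toy data the "class 3" question gives the purest groups, so it would be asked first; the same procedure is then repeated inside each branch.
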
  13. CLUSTERING / DISTANCE

    Manhattan: $d(A,B) = |X_B - X_A| + |Y_B - Y_A|$
    Euclidean: $d(A,B) = \sqrt{(X_B - X_A)^2 + (Y_B - Y_A)^2}$
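
Both distances from the slide above translate directly into code. The points A and B below are hypothetical.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences between two 2D points."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def euclidean(a, b):
    """Straight-line distance between two 2D points."""
    return math.sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2)

A = (1, 2)
B = (4, 6)

print(manhattan(A, B))  # 7
print(euclidean(A, B))  # 5.0
```
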
  14. PRECISION

                     Prediction: Positive  Prediction: Negative
    Reality: True    True positive         False negative
    Reality: False   False positive        True negative

    Precision = True positive / (True positive + False positive)
  15. RECALL

                     Prediction: Positive  Prediction: Negative
    Reality: True    True positive         False negative
    Reality: False   False positive        True negative

    Recall = True positive / (True positive + False negative)
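
Precision and recall, as defined on the two slides above, can be computed from a pair of label lists. The labels below are hypothetical, and `confusion_counts` is a helper name introduced for this sketch.

```python
def confusion_counts(reality, prediction):
    """Count (TP, FP, FN, TN) from two parallel lists of booleans."""
    tp = fp = fn = tn = 0
    for real, pred in zip(reality, prediction):
        if pred and real:
            tp += 1
        elif pred and not real:
            fp += 1
        elif real:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical ground truth and model output.
reality    = [True, True, True,  False, False, False, False, True]
prediction = [True, True, False, True,  False, False, False, False]

tp, fp, fn, tn = confusion_counts(reality, prediction)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("precision:", precision)
print("recall:", recall)
```

Here the model finds 2 of the 4 real positives (recall 0.5) and 2 of its 3 positive predictions are correct (precision about 0.67).
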
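  16. TRAIN & TEST

    [Diagram: the dataset (X, y) is split at random into TRAIN (70%) and TEST (30%); the y_predict obtained on the test X is compared (Δ) with the test y.]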
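
The train/test slide above can be sketched as follows. The data, the fixed seed and the one-parameter "model" (a single ratio y/x learned from the training rows) are all hypothetical, chosen only to show the 70/30 random split and the comparison between y_predict and y.

```python
import random

def train_test_split(rows, test_ratio=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]

# Hypothetical dataset of (X, y) pairs where y = 2 * X.
rows = [(x, 2 * x) for x in range(1, 11)]
train, test = train_test_split(rows)

# "Train": estimate one ratio y / x from the training rows only.
ratio = sum(y for _, y in train) / sum(x for x, _ in train)

# "Test": predict on unseen X and measure the mean absolute gap to y.
y_predict = [ratio * x for x, _ in test]
delta = sum(abs(p - y) for p, (_, y) in zip(y_predict, test)) / len(test)

print(len(train), len(test))
print("delta:", delta)
```

Because the toy data follows y = 2 * X exactly, the gap on the test rows is zero; on real data, this gap is what tells you how well the model generalizes.
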
  17. RESOURCES

    • Coursera Machine Learning: https://www.coursera.org/learn/machine-learning
    • Fondamentaux et Etudes de cas, Eric Biernat: http://www.amazon.fr/dp/2212142439
    • Meetup Machine Learning Paris: http://www.meetup.com/fr-FR/Paris-Machine-learning-applications-group/
    • Le Meilleur Data Scientist de France: http://www.meetup.com/fr-FR/FrenchData/events/228508819/