“Introduction” to scikit-learn THIS PROBLEM WOULD BEST BE ADDRESSED WITH MACHINE LEARNING… @earino #dsla
Reducing the number of random variables under consideration: feature selection / feature extraction. PCA - Principal Component Analysis from sklearn import decomposition ! Transforms a set of variables into a smaller set of uncorrelated variables. iris example
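A minimal sketch of the iris example: PCA squeezes the four iris measurements down to two uncorrelated components. (The variable names here are illustrative, not from the talk.)

```python
from sklearn import datasets, decomposition

# Load iris: 150 samples, 4 correlated features each.
iris = datasets.load_iris()
X = iris.data

# Project onto the 2 directions of greatest variance.
pca = decomposition.PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # 150 samples, now 2 features
print(pca.explained_variance_ratio_)    # variance captured per component
```

The two components capture most of the variance in the original four features, which is why the 2-D projection is still useful for plotting and downstream models.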
You have data, with no labels… You ask yourself: is there any structure in this data? Congratulations, you may be asking about clustering. Clustering is a great way to “explore” data… from sklearn.cluster import KMeans handwritten digits example
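A quick sketch of the handwritten-digits example: KMeans groups the digit images into 10 clusters without ever seeing the labels. (Parameter choices here are illustrative.)

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# 1797 images of handwritten digits, each flattened to 64 pixel values.
digits = load_digits()
X = digits.data

# Ask for 10 clusters -- one per digit, though KMeans doesn't know that.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Each cluster center is itself a 64-pixel "average digit".
print(kmeans.cluster_centers_.shape)
```

Plotting the cluster centers as 8x8 images is a nice way to see that the structure KMeans found really does correspond to digit shapes.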
Classification is what most people think of when referring to Machine Learning… You have “labeled” data… examples: • fraudulent/valid • woman/man ! Doesn’t have to be binary labels… is_car/is_truck/is_bicycle You want to predict the category of new data… classifiers comparison example
from sklearn.tree import DecisionTreeClassifier from sklearn.ensemble import RandomForestClassifier from sklearn.naive_bayes import GaussianNB from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
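A minimal sketch of a classifiers comparison: fit each of the four classifiers above on a train split of iris and score it on held-out data. (The 70/30 split and `random_state` are illustrative choices; note that `sklearn.lda.LDA` from older versions now lives in `sklearn.discriminant_analysis`.)

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

classifiers = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "naive bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
}

# Same fit/score API for every estimator -- that's the scikit-learn win.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```

Because every estimator shares the `fit`/`predict`/`score` interface, swapping classifiers is a one-line change.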
Regression is what everyone wishes they could do. If you could predict a quantity you could: •predict the stock market •predict currency fluctuations •be rich beyond your wildest dreams Cut it out. You can’t (really) do that. You can… •predict click-through probability •predict housing sale prices
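A hedged sketch of what realistic regression looks like, using the bundled diabetes dataset as a stand-in for something like housing prices (the dataset choice and split are illustrative, not from the talk):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Predict a continuous target (disease progression) from 10 features.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = LinearRegression().fit(X_train, y_train)

# R^2 on held-out data: well above chance, nowhere near perfect --
# which is exactly the point about realistic expectations.
r2 = reg.score(X_test, y_test)
print(f"R^2: {r2:.2f}")
```

The modest R² is the honest takeaway: useful predictions of a quantity, not a crystal ball.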