Application | Data | Learned model
Self-driving cars | Terrain data (slope, roughness, etc.) | Function mapping terrain to speed
Price prediction engine | Customer and market attributes, past prices | Function mapping customer and market attributes to prices
Gene sequence identification | Large amounts of genome data | Clusters of recurring gene sequence patterns
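As a toy illustration of what "learned model" means in the first row, the sketch below fits a straight line from a single terrain feature (slope) to speed with ordinary least squares in plain Python. The data points, the units (percent slope, km/h) and the linear form are hypothetical assumptions chosen for illustration; a real self-driving system would likely use many more features and a far richer model.

```python
# Hypothetical toy data: road slope (percent) -> speed (km/h).
# These numbers are illustrative only, not real terrain measurements.
slopes = [0.0, 5.0, 10.0, 15.0]
speeds = [100.0, 90.0, 80.0, 70.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


a, b = fit_line(slopes, speeds)


def predict_speed(slope):
    """The learned model: a function mapping terrain (here only slope) to speed."""
    return a * slope + b


print(a, b)               # -2.0 100.0
print(predict_speed(20))  # 60.0
```

The "learning" here is just choosing `a` and `b` from the data; the resulting `predict_speed` function is the learned model that is then applied to new terrain.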
Supervised (labeled examples):
Rain | Wind | Umbrella?
heavy | light | yes
none | light | no
light | strong | no
light | light | yes
none | strong | no

Unsupervised (no labels), user tastes:
User 1 likes The Clash
User 23 likes Die Ärzte
User 42 likes Helene Fischer
User 77 likes The Sex Pistols
User 99 likes Heino
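For the supervised example, a decision tree is one possible learner; the slide does not name an algorithm, so this choice is an assumption. The sketch below is a minimal ID3-style tree in plain Python, trained on exactly the five labeled rows of the umbrella table: at each node it splits on the attribute with the highest information gain and recurses until the labels are pure.

```python
import math
from collections import Counter

# The labeled training data from the umbrella table.
DATA = [
    {"rain": "heavy", "wind": "light", "umbrella": "yes"},
    {"rain": "none", "wind": "light", "umbrella": "no"},
    {"rain": "light", "wind": "strong", "umbrella": "no"},
    {"rain": "light", "wind": "light", "umbrella": "yes"},
    {"rain": "none", "wind": "strong", "umbrella": "no"},
]


def entropy(rows, target):
    """Shannon entropy (in bits) of the target labels in rows."""
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def info_gain(rows, attr, target):
    """Reduction in entropy obtained by splitting rows on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - remainder


def majority(rows, target):
    """Most common label in rows."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]


def build_tree(rows, attrs, target):
    """ID3-style tree: a leaf is a label, an inner node is (attr, branches, fallback label)."""
    labels = {r[target] for r in rows}
    if len(labels) == 1:
        return labels.pop()
    if not attrs:
        return majority(rows, target)
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    branches = {
        value: build_tree([r for r in rows if r[best] == value], rest, target)
        for value in {r[best] for r in rows}
    }
    return (best, branches, majority(rows, target))


def predict(tree, example):
    """Walk the tree; attribute values never seen in training fall back to the node's majority label."""
    while isinstance(tree, tuple):
        attr, branches, fallback = tree
        tree = branches.get(example[attr], fallback)
    return tree


tree = build_tree(DATA, ["rain", "wind"], "umbrella")
print(tree[0])                                            # rain
print(predict(tree, {"rain": "light", "wind": "light"}))  # yes
```

On this data the tree splits on rain first and then, for light rain only, on wind. It answers "yes" for heavy rain with strong wind only because the single heavy-rain example had light wind; with five examples, predictions for unseen combinations are guesses, which is one reason to evaluate on held-out test data.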
RECOMMENDATION)
▸ Tutorial for the “Kaggle Titanic Competition” (using R): http://trevorstephens.com/post/72916401642/titanic-getting-started-with-r
▸ Online courses (MOOCs):
  ▸ Udacity: Intro to Machine Learning: https://www.udacity.com/course/intro-to-machine-learning--ud120 (excellent introduction to applied ML using scikit-learn and Python)
  ▸ Coursera: Machine Learning: https://www.coursera.org/learn/machine-learning (friendly introduction to the theory behind common ML algorithms)
▸ Book: Abu-Mostafa, Magdon-Ismail, Lin: Learning From Data - A Short Course (AMLbook.com) (good introduction to the more academic perspective on ML, with its notation and vocabulary)
▸ Toolkits:
  ▸ Scikit-Learn (Python, great online documentation): http://scikit-learn.org/stable/
  ▸ stats package (R, pre-installed; many simple ML algorithms). Examples: http://www.statmethods.net/stats/regression.html
1. Understand the problem and its context
2. Understand, clean and prepare the data
3. For supervised learning: split the data into training and test sets
4. Evaluate different algorithms with their default parameters
5. Optimize the parameters and compute the results
6. Interpret and present the results
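Steps 3 and 4 of this workflow can be sketched in plain Python. This is a minimal illustration under stated assumptions: the toy data set, the 25% test fraction, the seeds, and the two candidate algorithms (a majority-class baseline and 1-nearest-neighbor) are hypothetical choices, and `split_train_test` is a hand-written helper, not a library function. In practice, scikit-learn's `train_test_split` (in `sklearn.model_selection`) covers the split step.

```python
import random
from collections import Counter


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Step 3 (hypothetical helper): shuffle a copy of the data and hold out a fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline(train):
    """Trivial reference model: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor(train):
    """1-NN: predict the label of the closest training point (squared Euclidean distance)."""
    def predict(x):
        _, label = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
        return label
    return predict


def accuracy(model, rows):
    """Fraction of (features, label) pairs the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Hypothetical toy data: 2-D points labeled 1 if x0 + x1 > 1, else 0.
rng = random.Random(42)
data = []
for _ in range(40):
    point = (rng.random(), rng.random())
    data.append((point, int(point[0] + point[1] > 1)))

train, test = split_train_test(data, test_fraction=0.25, seed=1)

# Step 4: fit each candidate with its default settings and score it on the held-out test set.
for name, fit in [("majority baseline", majority_baseline), ("1-nearest-neighbor", nearest_neighbor)]:
    model = fit(train)
    print(f"{name}: test accuracy = {accuracy(model, test):.2f}")
```

Scoring on the held-out test set rather than the training set matters here: 1-nearest-neighbor always scores perfectly on its own training points, so only the test accuracy says anything about unseen data. The majority baseline shows whether an algorithm learns anything beyond the label frequencies.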