tools, re-running ($$) - more algos, more tools (OS/commercial?) - (even) more tuning of parameters - BaaS? crowdsourcing (data, tools/tuning)? - other ML problems (recsys, NLP…)
X (n*p), y (n) score: f(x’) algos: k-NN, LR, NB, RF, GBM, SVM, NN, DL… goal: max accuracy measure (on new data) f ∈ F(θ) min θ ( L(y, f(x,θ)) + R(θ) ) on train set evaluate on separate test set /cross validation
- R, Python etc. interfaces, easy to use API - open source - advisors: Hastie, Tibshirani - Java, but C-style memalloc, by Java gurus - distributed, “big data”
- R, Python etc. interfaces, easy to use API - open source - advisors: Hastie, Tibshirani - Java, but C-style memalloc, by Java gurus - distributed, “big data” - many knobs/tuning, model evaluation, cross validation, model selection (hyperparameter search)
- R, Python etc. interfaces, easy to use API - open source - advisors: Hastie, Tibshirani - Java, but C-style memalloc, by Java gurus - distributed, “big data” - many knobs/tuning, model evaluation, cross validation, model selection (hyperparameter search)