large sparse data: LR with SGD (+L1/L2) images/videos, speech: DL it depends / try them all / hyperparam tuning / ensembles feature engineering / other goals e.g. interpretability the title of this talk was misguided but so is recently almost every use of the term AI