All of the models presented so far are for two class problems. We need a way to extend these classification methods to multiple classes • Mathematically, this means that the class label, yi, for each datum is now yi ∈ {1, 2, ..., c}, where c is the number of possible classifications. Split the data into c two-class problems. • For example, if we have c classes then we will have c different classifiers: class 1 against classes {2, 3, ..., c}, class 2 against classes {1, 3, ..., c}, etc. Multiclass Prediction with Logistic Regression P(Y = k|x; w1, w2, · · · , wc) = exp(wT k x) c j=1 exp(wj Tx) 41 / 44