i=1 is available at the time of training. For the moment, we assume that y_i ∈ {±1}. We can generalize to multi-class tasks; however, the analysis is easier to start with a binary prediction task.
• A hypothesis class, H, is defined prior to training classifiers. Think of H as the type of model you want to generate (e.g., a decision tree).
• The classifiers (hypotheses), h_t, are combined with a weighted majority vote:
  H(x) = ∑_{t=1}^{T} α_t h_t(x),
  where we make a prediction with ŷ = sign(H(x)) and h_t(x) ∈ {±1}.
• Each classifier is a little better than a random guess (i.e., its weighted error ϵ_t < 0.5).
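The weighted majority vote above can be sketched in a few lines of Python. This is a minimal illustration, not a full boosting implementation: the decision stumps and the weights α_t below are hypothetical values chosen for the example (in AdaBoost, the α_t would be derived from each learner's error ϵ_t).

```python
def weighted_majority_vote(hypotheses, alphas, x):
    """Combine weak classifiers h_t(x) ∈ {+1, -1} via H(x) = sum_t alpha_t * h_t(x),
    then predict y_hat = sign(H(x))."""
    score = sum(a * h(x) for a, h in zip(alphas, hypotheses))
    return 1 if score >= 0 else -1

# Three toy decision stumps on a 1-D input (thresholds chosen for illustration):
stumps = [
    lambda x: 1 if x > 0.0 else -1,
    lambda x: 1 if x > 0.5 else -1,
    lambda x: 1 if x < 2.0 else -1,
]
alphas = [0.9, 0.4, 0.3]  # larger alpha = that weak learner is trusted more

print(weighted_majority_vote(stumps, alphas, 1.0))   # stumps vote +1, +1, +1 → prediction +1
print(weighted_majority_vote(stumps, alphas, -1.0))  # stumps vote -1, -1, +1 → score -1.0 → prediction -1
```

Note that the vote is weighted, not a simple majority: a single stump with a large enough α_t can outvote the others.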