Multiclass Classification: One-vs-all Method


Multi-class classification

Now we will approach the classification of data when we have more than two categories. Instead of $y \in \{0, 1\}$ with two classes, we expand our definition so that $y \in \{0, 1, ..., k-1\}$ with $k$ classes.


Examples of multi-class classification:

  • Email tagging: Work, Friends, Family, Hobby
  • Medical condition: Not ill, Cold, Flu
  • Weather: Sunny, Cloudy, Rain, Snow

One-vs-all (One-vs-rest) method


We divide our problem into $k$ binary classification problems. Then we train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.

$$
\begin{aligned}
& y \in \lbrace 0, 1, ..., k-1 \rbrace \\
& h_\theta^{(0)}(x) = P(y = 0 \mid x ; \theta) \\
& h_\theta^{(1)}(x) = P(y = 1 \mid x ; \theta) \\
& \quad \vdots \\
& h_\theta^{(k-1)}(x) = P(y = k-1 \mid x ; \theta)
\end{aligned}
$$

We are basically choosing one class and then lumping all the others into a single second class. We do this repeatedly, applying binary logistic regression to each case.
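As a sketch of this relabel-and-train loop, here is a minimal one-vs-all trainer in plain NumPy. The function name, learning rate, and iteration count are illustrative assumptions, not from the original notes: on each pass the labels are recoded so class $i$ becomes 1 and every other class becomes 0, and an ordinary binary logistic regression is fit by gradient descent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, k, lr=0.1, iters=5000):
    """Train k binary logistic regression classifiers.

    Returns a (k, n+1) matrix Theta whose row i holds the parameters
    of the classifier separating class i from all the others.
    lr and iters are illustrative defaults, not tuned values.
    """
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend the intercept term
    Theta = np.zeros((k, n + 1))
    for i in range(k):
        yi = (y == i).astype(float)        # class i -> 1, everything else -> 0
        theta = np.zeros(n + 1)
        for _ in range(iters):
            # gradient of the average logistic loss for classifier i
            grad = Xb.T @ (sigmoid(Xb @ theta) - yi) / m
            theta -= lr * grad
        Theta[i] = theta
    return Theta

# Toy demo: three well-separated 2-D clusters, one per class.
X = np.array([[0, 0], [1, 0], [0, 1],
              [5, 0], [6, 0], [5, 1],
              [0, 5], [0, 6], [1, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
Theta = train_one_vs_all(X, y, k=3)
```

Each row of `Theta` is an independent binary classifier; nothing ties them together until prediction time, when their probabilities are compared.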

To make a prediction on a new input $x$, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$, i.e., use the hypothesis that returns the highest value as our prediction.

$$
\mathrm{prediction} = \underset{i}{\arg\max}\; h_\theta^{(i)}(x)
$$
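The prediction step can be sketched as follows, assuming the $k$ parameter vectors are stacked as rows of a matrix `Theta` (a hypothetical layout; the hand-set parameter values below are purely for illustration): evaluate all $k$ hypotheses on each input and take the index of the largest probability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(Theta, X):
    """For each row of X, evaluate every hypothesis h_theta^(i)
    and return the index i of the one with the highest probability."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # intercept term
    probs = sigmoid(Xb @ Theta.T)                  # column i holds h^(i)(x)
    return np.argmax(probs, axis=1)

# Hand-set parameters for illustration (k = 3 classes, one feature):
# classifier 0 fires for small x, classifier 2 for large x, and
# classifier 1 is a flat "middle" hypothesis.
Theta = np.array([[ 2.0, -1.0],   # h^(0)(x) = sigmoid(2 - x)
                  [ 1.0,  0.0],   # h^(1)(x) = sigmoid(1), constant
                  [-6.0,  1.0]])  # h^(2)(x) = sigmoid(x - 6)
X = np.array([[0.0], [4.0], [8.0]])
print(predict_one_vs_all(Theta, X))  # → [0 1 2]
```

Note that `argmax` returns the winning class index, not the winning probability; the probabilities themselves are only compared, never reported.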