Regularized Logistic Regression

 

Cost function

The cost function for regularized logistic regression is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)}\log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big)\log\big(1 - h_\theta(x^{(i)})\big) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$
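Below is a minimal NumPy sketch of this cost, assuming a design matrix `X` of shape (m, n+1) whose first column is all ones, a vector `y` of 0/1 labels, and the usual sigmoid hypothesis; note that $\theta_0$ is left out of the regularization sum.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, lam):
    """Regularized logistic-regression cost J(theta).

    Assumes X has shape (m, n+1) with a leading column of ones,
    y contains 0/1 labels, and theta has n+1 entries.
    """
    m = y.size
    h = sigmoid(X @ theta)                          # h_theta(x^(i)) for every example
    unreg = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # theta_0 is not penalized
    return unreg + reg
```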

Gradient descent

The gradient descent algorithm for regularized logistic regression, written in terms of the hypothesis function $h_\theta(x)$, looks exactly the same as the one for regularized linear regression:

$$\begin{aligned}
& \text{Repeat}\ \lbrace \\
& \quad \theta_0 := \theta_0 - \alpha\, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}_{0} \\
& \quad \theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}_{j} + \frac{\lambda}{m}\theta_j \right] && j \in \lbrace 1, 2, \dots, n \rbrace \\
& \rbrace
\end{aligned}$$
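A matching gradient computation might look like the sketch below (reusing `sigmoid` and the conventions from the cost sketch above); the only difference from the unregularized case is the extra $\frac{\lambda}{m}\theta_j$ term, which is added for every $j$ except $j = 0$.

```python
def gradient(theta, X, y, lam):
    """Gradient of the regularized cost; theta_0 carries no regularization term."""
    m = y.size
    h = sigmoid(X @ theta)
    grad = X.T @ (h - y) / m           # (1/m) * sum_i (h - y) * x_j, for all j
    grad[1:] += (lam / m) * theta[1:]  # add (lambda/m) * theta_j only for j >= 1
    return grad
```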

Pulling the $\frac{\lambda}{m}\theta_j$ term out of the bracket and grouping the two $\theta_j$ terms, the update for $j \in \lbrace 1, 2, \dots, n \rbrace$ can equivalently be written as:

$$\begin{aligned}
& \text{Repeat}\ \lbrace \\
& \quad \theta_0 := \theta_0 - \alpha\, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}_{0} \\
& \quad \theta_j := \theta_j \left( 1 - \alpha\, \frac{\lambda}{m} \right) - \alpha\, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}_{j} && j \in \lbrace 1, 2, \dots, n \rbrace \\
& \rbrace
\end{aligned}$$
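This second form makes the "weight decay" reading of regularization explicit: on every iteration, $\theta_j$ (for $j \geq 1$) is first shrunk by the factor $\big(1 - \alpha\frac{\lambda}{m}\big)$ and then updated exactly as in the unregularized case. A sketch of the full loop in that form, building on the helpers above (the step size `alpha` and iteration count are illustrative):

```python
def gradient_descent(theta, X, y, lam, alpha=0.1, num_iters=1000):
    """Batch gradient descent using the shrink-then-update ('weight decay') form."""
    m = y.size
    decay = np.ones_like(theta)
    decay[1:] = 1.0 - alpha * lam / m          # theta_0 is never shrunk
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / m               # unregularized gradient, all j
        theta = theta * decay - alpha * grad
    return theta
```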

Example: Varying $\lambda$ for regularized logistic regression

Figure 1: Training data with decision boundary ($\lambda = 1$)

Figure 2: No regularization (overfitting) ($\lambda = 0$)

Figure 3: Too much regularization (underfitting) ($\lambda = 100$)
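To reproduce the qualitative behaviour in Figures 1-3 without the original training set, one could fit the model on a small synthetic dataset with quadratic features and compare the fit for $\lambda \in \{0, 1, 100\}$; the data, feature map, and hyperparameters below are illustrative assumptions, not the data used in the figures.

```python
# Synthetic two-feature data with a roughly circular class boundary.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1, 1, size=(2, 200))
y = (x1**2 + x2**2 + 0.1 * rng.standard_normal(200) < 0.5).astype(float)

# Quadratic feature map with a leading column of ones.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

for lam in (0.0, 1.0, 100.0):
    theta = gradient_descent(np.zeros(X.shape[1]), X, y, lam,
                             alpha=0.3, num_iters=5000)
    print(f"lambda = {lam:5.0f}: unregularized training cost = "
          f"{cost(theta, X, y, 0.0):.3f}")
```

With $\lambda = 0$ the training cost is lowest (the boundary can bend to fit noise), while $\lambda = 100$ shrinks the weights so aggressively that the model underfits and the training cost rises, mirroring Figures 2 and 3.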