Logistic Regression: Cost Function


We cannot use the same cost function that we use for linear regression, because plugging the logistic (sigmoid) function into the squared-error cost makes the output wavy, with many local optima. In other words, it would not be a convex function.

*Figure: a convex cost function vs. a non-convex one with many local optima*
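
To see this concretely, here is a minimal sketch in NumPy (the one-parameter toy example and all names are our own illustration, not course code) that evaluates the squared-error cost of a sigmoid hypothesis over a grid of $\theta$ values. A convex function never has a negative second finite difference, but this cost does:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Squared-error cost for a single hypothetical example (x, y) = (1, 0),
# viewed as a function of one parameter theta.
theta = np.linspace(-10.0, 10.0, 2001)
J = 0.5 * (sigmoid(theta) - 0.0) ** 2

# Convexity check: second finite differences of a convex function are >= 0.
# Here the minimum is negative, so this cost is not convex in theta.
print("min second difference:", np.diff(J, n=2).min())
```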

Instead, our cost function for logistic regression looks like:

$$
\begin{aligned}
&J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) \\
&\mathrm{where} \\
&\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
-\log(h_\theta(x)) & \text{if } y = 1 \\
-\log(1 - h_\theta(x)) & \text{if } y = 0
\end{cases}
\end{aligned}
$$
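
A minimal NumPy sketch of this piecewise definition, assuming a design matrix `X` of shape `(m, n)` and a vector `y` of 0/1 labels (the names `sigmoid` and `cost` and the toy data are our own illustration, not course code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta): average the piecewise per-example cost over all m examples."""
    h = sigmoid(X @ theta)  # h_theta(x^(i)) for every example at once
    # -log(h) for positive examples, -log(1 - h) for negative examples
    per_example = np.where(y == 1, -np.log(h), -np.log(1.0 - h))
    return per_example.mean()  # equals (1/m) * sum of the per-example costs

# Toy data: two examples with an intercept column.
X = np.array([[1.0, 2.0],
              [1.0, -1.0]])
y = np.array([1, 0])
print(cost(np.zeros(2), X, y))  # log(2) ≈ 0.693: h = 0.5 everywhere at theta = 0
```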

Note that writing the cost function in this way guarantees that $J(\theta)$ is convex for logistic regression, so gradient descent cannot get stuck in a local optimum.

$\mathrm{Cost}(h_\theta(x),y)$ gives the cost for a single training example $(x,y)$ at a particular $\theta$, while $J(\theta)$ averages this cost over all $m$ examples. Plotting $\mathrm{Cost}(h_\theta(x),y)$ against $h_\theta(x)$ gives:

*Figure: example cost function, $\mathrm{Cost}(h_\theta(x),y)$ plotted against $h_\theta(x)$*

$$
\begin{aligned}
& \mathrm{Cost}(h_\theta(x), y) = 0 \quad \text{if } h_\theta(x) = y \\
& \mathrm{Cost}(h_\theta(x), y) \rightarrow \infty
\begin{cases}
\text{ if } y = 0 \; \mathrm{and} \; h_\theta(x) \rightarrow 1 \\
\text{ if } y = 1 \; \mathrm{and} \; h_\theta(x) \rightarrow 0
\end{cases}
\end{aligned}
$$
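
A few hypothetical single-example evaluations (the helper `example_cost` is our own illustration) make this limiting behavior concrete; if the hypothesis is confident and wrong, the learning algorithm pays a very large cost:

```python
import numpy as np

def example_cost(h, y):
    """Cost of one prediction h = h_theta(x) against a 0/1 label y."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

print(example_cost(0.999999, 1))    # ~1e-6: h_theta(x) ≈ y, so cost ≈ 0
print(example_cost(0.5, 1))         # ~0.693: maximally uncertain prediction
print(example_cost(1e-9, 1))        # ~20.7: y = 1 but h_theta(x) -> 0
print(example_cost(1.0 - 1e-9, 0))  # ~20.7: y = 0 but h_theta(x) -> 1
```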