Logistic Regression: Hypothesis Representation


Logistic regression

Recall that in linear regression, our hypothesis function looked like: $h_\theta(x) = \theta^T x$

Intuitively, it doesn't make sense for $h_\theta(x)$ to take values larger than 1 or smaller than 0 when we know that $y \in \{0, 1\}$. To fix this, let's change the form of our hypothesis $h_\theta(x)$ so that it satisfies $0 \le h_\theta(x) \le 1$. This is accomplished by plugging $\theta^T x$ into the Logistic Function.

The "Sigmoid Function" or "Logistic Function" is given as:
$g(z) = \dfrac{1}{1 + e^{-z}}$

The Sigmoid Function $g(z)$ maps any real number to the $(0, 1)$ interval, making it useful for transforming an arbitrary-valued function into a function better suited for classification.


Note that $g(z) \ge 0.5$ when $z \ge 0$.

Plugging $\theta^T x$ into the Logistic Function, we get our hypothesis function for logistic regression:
$h_\theta(x) = g(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}}$

Interpretation of hypothesis output:
$h_\theta(x)$ is the estimated probability that $y = 1$ on input $x$. Formally, $h_\theta(x) = P(y = 1 \mid x; \theta)$

Also, $P(y = 0 \mid x; \theta) + P(y = 1 \mid x; \theta) = 1$; therefore, $P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$
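A sketch of this probabilistic interpretation, with illustrative values for `theta` and `x` (these numbers are our assumption, not from the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis for logistic regression: estimated P(y=1 | x; theta)."""
    return sigmoid(theta @ x)

# Illustrative parameter and feature vectors (x_0 = 1 is the bias term)
theta = np.array([-1.0, 2.0])
x = np.array([1.0, 0.8])

p_y1 = h(theta, x)    # estimated P(y=1 | x; theta)
p_y0 = 1.0 - p_y1     # P(y=0 | x; theta) = 1 - h_theta(x)
print(p_y1 + p_y0)    # the two probabilities sum to 1
```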

Logistic regression is actually a classification algorithm and the word 'regression' appearing in its name is only for historical reasons.

Vectorized implementation: $h_\theta(X) = g(X\theta) = \dfrac{1}{1 + e^{-X\theta}}$


$X = \begin{bmatrix} \dots \ (x^{(1)})^T \ \dots \\ \dots \ (x^{(2)})^T \ \dots \\ \vdots \\ \dots \ (x^{(m)})^T \ \dots \end{bmatrix}$ (an $m \times (n+1)$ matrix)

$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$

Note that in this vectorized implementation, we calculate the hypotheses for all $m$ training examples at once; the sigmoid function is applied element-wise to the vector $X\theta$.
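The vectorized form can be sketched in NumPy as follows; the design matrix values here are illustrative (m = 3 examples, n = 1 feature, with a leading column of ones for the intercept):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative m x (n+1) design matrix: each row is one training example (x^(i))^T
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, -2.0]])
theta = np.array([-1.0, 2.0])   # theta in R^{n+1}

# One matrix-vector product computes theta^T x^(i) for every example;
# sigmoid is then applied element-wise to the resulting vector
h_all = sigmoid(X @ theta)
print(h_all.shape)              # (3,) -- one hypothesis per example
```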