## Logistic Regression: Hypothesis Representation

#### Logistic regression

Recall that in linear regression, our hypothesis function looked like: $h_\theta(x) = \theta^Tx$

For classification this form is a poor fit: $\theta^Tx$ can be any real number, and intuitively it doesn’t make sense for $h_\theta(x)$ to take values larger than 1 or smaller than 0 when we know that $y \in \{0, 1\}$. To fix this, let’s change the form of our hypothesis $h_\theta(x)$ so that it satisfies $0 \le h_\theta(x) \le 1$. This is accomplished by plugging $\theta^Tx$ into the Logistic Function.

The "Sigmoid Function" or "Logistic Function" is given as:
$g(z) = \dfrac{1}{1 + e^{-z}}$

The Sigmoid Function $g(z)$ maps any real number to the open interval $(0, 1)$, making it useful for transforming an arbitrary-valued function into a function better suited for classification. Because $g(0) = 0.5$ and $g$ is strictly increasing, $g(z) \ge 0.5$ when $z \ge 0$ and $g(z) < 0.5$ when $z < 0$.
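To make the behavior of $g(z)$ concrete, here is a minimal Python sketch of the sigmoid function. The function name `sigmoid` and the sample inputs are illustrative choices, not part of the original notes. The branch on the sign of `z` is an implementation detail added here to avoid floating-point overflow for large negative inputs; mathematically both branches equal $\dfrac{1}{1 + e^{-z}}$.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    # Branch on the sign of z so that exp() is only ever called on a
    # non-positive argument, which cannot overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # e^z is at most 1 here because z < 0
    return ez / (1.0 + ez)


# Illustrative inputs: outputs grow with z and cross 0.5 at z = 0.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"g({z:+.1f}) = {sigmoid(z):.6f}")
```

Running this shows values near 0 for large negative `z`, exactly 0.5 at `z = 0`, and values near 1 for large positive `z`.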

Plugging $\theta^Tx$ into the Logistic Function, we get our hypothesis function for Logistic regression:
$h_\theta(x) = g(\theta^Tx) = \dfrac{1}{1 + e^{-\theta^Tx}}$

Interpretation of hypothesis output:
$h_\theta(x)$ is the estimated probability that $y = 1$ on input $x$. Formally, $h_\theta(x) = P(y=1|x;\theta)$

Also, since $P(y=0|x;\theta) + P(y=1|x;\theta) = 1$, it follows that $P(y=0|x;\theta) = 1 - h_\theta(x)$.
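The probabilistic reading of the hypothesis can be sketched for a single example as follows. The parameter vector `theta` and the feature vector `x` below are made-up values chosen only for illustration, and `x[0] = 1` is assumed to be the intercept term so that `theta` and `x` both have $n+1$ entries.

```python
import math


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for a single example x."""
    z = sum(t * xi for t, xi in zip(theta, x))  # inner product theta^T x
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values, not taken from the notes.
theta = [-3.0, 1.0, 1.0]
x = [1.0, 2.0, 2.0]  # x[0] = 1 is the assumed intercept feature

p_y1 = hypothesis(theta, x)  # estimated P(y = 1 | x; theta)
p_y0 = 1.0 - p_y1            # estimated P(y = 0 | x; theta)

print(f"P(y=1|x) = {p_y1:.3f}, P(y=0|x) = {p_y0:.3f}")
```

Here $\theta^Tx = -3 + 2 + 2 = 1$, so $h_\theta(x) = g(1) \approx 0.731$: the model estimates roughly a 73% chance that $y = 1$ and a 27% chance that $y = 0$.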

Despite its name, logistic regression is a classification algorithm; the word 'regression' appears in its name only for historical reasons.

Vectorized Implementation: $h_\theta(X) = g(X\theta) = \dfrac{1}{1 + e^{-X\theta}}$ (here $g$ is applied elementwise to the vector $X\theta \in \mathbb{R}^m$)

where

$X = \begin{bmatrix} \cdots \space (x^{(1)})^T \space \cdots \\ \cdots \space (x^{(2)})^T \space \cdots \\ \vdots \\ \cdots \space (x^{(m)})^T \space \cdots \end{bmatrix}$ (an $m \times (n+1)$ matrix)

$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$

Note that in this vectorized implementation, we calculate the hypotheses for all $m$ training examples at once: $h_\theta(X)$ is an $m$-dimensional vector whose $i$-th entry is $h_\theta(x^{(i)})$.
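A minimal NumPy sketch of the vectorized form is shown below. It assumes NumPy is available; the helper name `predict_proba`, the small design matrix `X`, and the values in `theta` are hypothetical and serve only to illustrate the shapes involved. Each row of `X` is one example, with a leading 1 assumed for the intercept.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_proba(X, theta):
    """Return h_theta(X) = g(X theta), one probability per row of X."""
    return sigmoid(X @ theta)  # (m, n+1) @ (n+1,) -> (m,)


# Hypothetical data: m = 3 examples, n = 2 features plus an intercept column.
X = np.array([
    [1.0, 2.0, 2.0],
    [1.0, 0.5, 1.0],
    [1.0, 3.0, 0.0],
])
theta = np.array([-3.0, 1.0, 1.0])

h = predict_proba(X, theta)
print(h.shape)  # one entry per training example
print(h)
```

The single matrix-vector product `X @ theta` replaces a loop over the $m$ examples. For very large negative entries of $X\theta$, `np.exp` may emit an overflow warning, so a numerically hardened sigmoid may be preferable in practice.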