## Linear Regression: Multiple Features

Linear regression with multiple variables (features) is also known as "multivariate linear regression".

#### Hypothesis function for multiple features

The multivariable form of the hypothesis function:

$\hat{y} = h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + ... + \theta_n x_n$

Setting $x^{(i)}_0 = 1$ for $i \in \{1, 2, \dots, m\}$, we can rewrite the above as:

$h_\theta(x) = \sum_{j=0}^n \theta_j x_j = \begin{bmatrix} \theta_0 & \theta_1 & \dots & \theta_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} = \theta^T x$

where $\theta, x \in \mathbb{R}^{n+1}$

NOTE:
The transformation of $h_\theta(x)$ from $\sum_{j=0}^n \theta_j x_j$ to $\theta^T x$ is an example of *vectorization*, a technique used to speed up computations by leveraging optimized numerical linear algebra libraries.
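As a minimal sketch of this idea (assuming NumPy, with made-up values for $\theta$ and $x$), the hypothesis for a single example becomes one dot product instead of an explicit loop:

```python
import numpy as np

# Hypothetical example with n = 2 features, plus the x_0 = 1 bias entry.
theta = np.array([1.0, 2.0, 3.0])  # [theta_0, theta_1, theta_2]
x = np.array([1.0, 4.0, 5.0])      # [x_0 = 1, x_1, x_2]

# Vectorized hypothesis theta^T x: the library handles the sum internally.
h = theta @ x                      # 1*1 + 2*4 + 3*5 = 24.0
print(h)
```

The `@` operator delegates the sum to optimized BLAS routines, which is where the speed-up over a hand-written Python loop comes from.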

Notations:

• $m$ : Number of training examples
• $n$ : Number of features
• $x^{(i)}_j$: Value of feature $j$ in the $i$th training example
• $x^{(i)}$: Input (features) of the $i$th training example; this is a vector

E.g.,

$x^{(i)} = \begin{bmatrix} x^{(i)}_0 \\ x^{(i)}_1 \\ \vdots \\ x^{(i)}_n \end{bmatrix} \in \mathbb{R}^{n+1}$

Vectorized Implementation: $h_\theta(X) = X\theta$

where

$X = \begin{bmatrix} \dots \space (x^{(1)})^T \space \dots \\ \dots \space (x^{(2)})^T \space \dots \\ \vdots \\ \dots \space (x^{(m)})^T \space \dots \end{bmatrix}$ (an $m \times (n+1)$ matrix)

$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$

Note that this vectorized implementation computes the hypotheses for all $m$ training examples at once.
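A sketch of this batch computation (assuming NumPy, with a hypothetical design matrix of $m = 3$ examples and $n = 2$ features, including the leading column of ones for $x_0$):

```python
import numpy as np

# Hypothetical design matrix X: each row is (x^(i))^T, first column is x_0 = 1.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 4.0, 5.0],
    [1.0, 6.0, 7.0],
])
theta = np.array([0.5, 1.0, 2.0])  # (n+1)-vector of parameters

# X @ theta yields an m-vector: one hypothesis per training example.
h = X @ theta
print(h)  # [ 8.5 14.5 20.5]
```

A single matrix-vector product replaces a loop over the $m$ examples, which is exactly what "calculating all hypotheses at once" means in practice.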

#### Cost function for multiple features

Recall that the cost function $J(\theta)$ is defined as:

$J(\theta) = \dfrac {1}{2m} \sum _{i=1}^m \left (h_\theta (x^{(i)}) - y^{(i)} \right)^2$

Vectorized implementation:

$J(\theta) = \dfrac {1}{2m} (X\theta - y)^T(X\theta - y)$
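This vectorized cost can be sketched directly from the formula (assuming NumPy; the data here is hypothetical, chosen so every residual is exactly $-1$):

```python
import numpy as np

# Hypothetical data: m = 3 examples, one feature plus the bias column.
X = np.array([
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
])
y = np.array([2.0, 3.0, 4.0])
theta = np.array([0.0, 1.0])  # underestimates every y^(i) by exactly 1

m = len(y)
residual = X @ theta - y            # vector of h(x^(i)) - y^(i)
J = (residual @ residual) / (2 * m) # (1/2m) * (X*theta - y)^T (X*theta - y)
print(J)                            # 3 squared errors of 1 each -> 3/(2*3) = 0.5
```

`residual @ residual` computes the inner product $(X\theta - y)^T(X\theta - y)$, i.e. the sum of squared errors, in one call.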