## The Cost Function

We can measure the accuracy of our hypothesis function by using a *cost function*.

The cost function $J(\theta_0, \theta_1)$ is defined as:

$J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left ( \hat{y}^{(i)}- y^{(i)} \right)^2 = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x^{(i)}) - y^{(i)} \right)^2$

This cost function is $\frac{1}{2}\bar{x}$, where $\bar{x}$ is the *mean squared error (MSE)*; it is also called the *squared error function*. The mean is halved as a convenience for the computation of gradient descent, since the derivative of the square term cancels out the $\frac{1}{2}$ factor.
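As a minimal sketch of the definition above (the function names `hypothesis` and `cost` and the toy data are illustrative, not from the original notes), the cost can be computed directly with NumPy:

```python
import numpy as np

# Hypothesis for univariate linear regression: h_theta(x) = theta0 + theta1 * x
def hypothesis(theta0, theta1, x):
    return theta0 + theta1 * x

# Cost function J(theta0, theta1) = (1 / 2m) * sum_i (h(x_i) - y_i)^2
def cost(theta0, theta1, x, y):
    m = len(y)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Toy training set lying exactly on the line y = 1 + 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

print(cost(1.0, 2.0, x, y))  # 0.0 -- a perfect fit has zero cost
print(cost(0.0, 0.0, x, y))  # a poor fit gives a larger cost
```

A perfect fit drives every squared error to zero, so $J = 0$; any other choice of parameters yields a strictly positive cost.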

If we try to think of it in visual terms, our training data set is scattered on the x-y plane. We are trying to fit a straight line (defined by $h_\theta(x)$) through this scattered set of data points. Our objective is the best possible line: the one for which the cost function $J(\theta_0, \theta_1)$ is minimized with respect to $\theta_0$ and $\theta_1$.

The *cost function* is sometimes also called the *optimization objective* or *objective function*.

A contour plot is a graph that contains many contour lines. A contour line of a two-variable function has a constant value at all points on the same line. Following any one contour line (the 'circles' in the plot), one gets the same value of the cost function everywhere along it.
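To see where those contours come from, one can evaluate $J$ over a grid of $(\theta_0, \theta_1)$ values; the contour lines are the level sets of this grid. The sketch below (the grid ranges and toy data are illustrative assumptions) locates the grid minimum, which sits at the true parameters of the generating line:

```python
import numpy as np

# Cost J(theta0, theta1) = (1 / 2m) * sum_i (theta0 + theta1 * x_i - y_i)^2
def cost(theta0, theta1, x, y):
    m = len(y)
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])  # generated by y = 1 + 2x

# Evaluate J over a grid of parameter values (a contour plot
# would draw the level sets of this 2-D array)
theta0_vals = np.linspace(-1.0, 3.0, 81)
theta1_vals = np.linspace(0.0, 4.0, 81)
J = np.array([[cost(t0, t1, x, y) for t1 in theta1_vals]
              for t0 in theta0_vals])

# The innermost contour encircles the minimum of J
i, j = np.unravel_index(np.argmin(J), J.shape)
print(theta0_vals[i], theta1_vals[j])  # 1.0 2.0 -- the true parameters
```

Passing `theta0_vals`, `theta1_vals`, and `J` to a plotting routine such as matplotlib's `contour` would reproduce the concentric 'circles' described above, centered on this minimum.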

#### Notations

- $m$ : Number of training examples
- $x$ : Input variable
- $y$ : Output variable / Target variable
- $(x, y)$: Training examples
- $(x^{(i)}, y^{(i)})$: $i$th training example
- $h$: Hypothesis function
- $\theta_i$'s: Parameters of the model