The Cost Function


We can measure the accuracy of our hypothesis function by using a cost function.

The cost function J(θ0,θ1)J(\theta_0, \theta_1) is defined as:

J(θ0,θ1)=12mi=1m(y^(i)y(i))2=12mi=1m(hθ(x(i))y(i))2J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left ( \hat{y}^{(i)}- y^{(i)} \right)^2 = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x^{(i)}) - y^{(i)} \right)^2

The cost function is actually 12xˉ\frac{1}{2}\bar{x} where xˉ\bar{x} is "Mean squared error (MSE)" or "Squared error function". The mean is halved as a convenience for the computation of the gradient descent, as the derivative term of the square function will cancel out the 12\frac{1}{2} term.

If we try to think of it in visual terms, our training data set is scattered on the x-y plane. We are trying to make straight line (defined by hθ(x)h_\theta(x) which passes through this scattered set of data. Our objective is to get the best possible line. The best possible line will be such so that the cost function J(θ0,θ1)J(\theta_0, \theta_1) is minimized with respect to θ0\theta_0 and θ1\theta_1.

The cost function is sometimes also called optimization function or objective function.


A contour plot is a graph that contains many contour lines. A contour line of a two variable function has a constant value at all points of the same line. Taking any color and going along the 'circle', one would expect to get the same value of the cost function.



  • mm : Number of training examples
  • xx : Input variable
  • yy : Output variable / Target variable
  • (x,y)(x, y): Training examples
  • (x(i),y(i))(x^{(i)}, y^{(i)}): iith training example
  • hh: Hypothesis function
  • θi\theta_i's: parameters of the model