Regularization: Cost Function
Suppose we have an overfitting problem caused by a large number of features. Consider the hypothesis function $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$, and suppose we want to reduce the influence of the $\theta_3 x^3$ and $\theta_4 x^4$ terms, which might be causing overfitting. Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function.
In order for the $\theta_3$ and $\theta_4$ terms of the cost function to get close to zero, we will have to reduce the values of $\theta_3$ and $\theta_4$ to near zero. We can rewrite the cost function as:

$$\min_\theta\ \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)^2 + 1000\cdot\theta_3^2 + 1000\cdot\theta_4^2$$
We could also regularize all of our theta parameters in a single summation as:

$$\min_\theta\ \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$
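As a concrete sketch of this formula (the function name and array layout are illustrative assumptions, not from the source), the regularized cost can be computed with NumPy; note that the regularization sum starts at index 1, skipping $\theta_0$:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost (illustrative sketch).

    X is an m x (n+1) design matrix whose first column is all ones,
    theta is the (n+1)-vector of parameters, lam is lambda.
    """
    m = len(y)
    residuals = X @ theta - y                     # h_theta(x^(i)) - y^(i)
    fit_term = (residuals @ residuals) / (2 * m)  # (1/2m) * sum of squared errors
    reg_term = (lam / (2 * m)) * (theta[1:] @ theta[1:])  # skips theta_0
    return fit_term + reg_term
```

With `lam = 0` this reduces to the ordinary unregularized cost; increasing `lam` adds a growing penalty on every parameter except the intercept.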
The $\lambda$, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated. The extra summation term added on the right, $\lambda\sum_{j=1}^{n}\theta_j^2$, is called the regularization term. Note that the regularization term starts its summation at $j = 1$ and so does not include $\theta_0$, since we do not want to penalize $\theta_0$.
If the regularization parameter $\lambda$ is chosen to be too large, it will deflate all the theta values (except $\theta_0$) to nearly zero and thus smooth out the hypothesis function too much. In this case, the hypothesis function essentially becomes the constant $h_\theta(x) = \theta_0$, causing underfitting (high bias).
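To see this effect numerically, here is a small sketch (the closed-form solver and all names are illustrative assumptions, not from the source) that minimizes the regularized cost in closed form; with a huge $\lambda$ the slope collapses toward zero and the fit flattens to roughly $h_\theta(x) = \theta_0$:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form minimizer of the regularized cost (illustrative sketch).

    Solves (X^T X + lam * D) theta = X^T y, where D is the identity
    with its (0, 0) entry zeroed so that theta_0 is not penalized.
    """
    n = X.shape[1]
    D = np.eye(n)
    D[0, 0] = 0.0  # do not penalize the intercept theta_0
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)
```

Fitting a line to data generated from $y = 1 + 2x$ with $\lambda = 0$ recovers the true parameters, while an enormous $\lambda$ drives the slope to nearly zero, leaving an almost flat hypothesis at $\theta_0 \approx \bar{y}$.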