## The Problem of Overfitting

When choosing a hypothesis function, it might seem that the more features we add, the better. However, there is also a danger in adding too many features.

The terminology below applies to both linear and logistic regression.

Underfitting, or high bias, occurs when the form of our hypothesis function $h$ maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features.

Overfitting, or high variance, at the other extreme, is caused by a hypothesis function that fits the available data well but does not generalize to predict new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the underlying trend.
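As a rough sketch of both failure modes (the dataset and degrees here are illustrative, not from the course), we can fit polynomials of increasing degree to noisy samples from a quadratic trend: a degree-1 fit underfits, while a degree-9 fit drives the training error down but does worse on held-out points.

```python
import numpy as np

# Hypothetical dataset: noisy samples from a quadratic trend.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# Held-out points from the same trend, used to check generalization.
x_val = np.linspace(0.025, 0.975, 19)
y_val = 1.0 + 2.0 * x_val - 3.0 * x_val**2

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial fit on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

for degree in (1, 2, 9):
    coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
    print(f"degree {degree}: train MSE {mse(coeffs, x, y):.4f}, "
          f"validation MSE {mse(coeffs, x_val, y_val):.4f}")
```

Because a higher-degree polynomial can always match the training data at least as well as a lower-degree one, training error alone cannot reveal overfitting; only the held-out error can.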

There are two main options to address the issue of overfitting:

1. Reduce the number of features:
    - Manually select which features to keep.
    - Use a model selection algorithm (studied later in the course).
2. Regularization:
    - Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
    - Regularization works well when we have a lot of slightly useful features.
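The second option can be sketched with ridge-style regularization (the dataset, feature degree, and $\lambda$ value below are assumptions for illustration): adding $\lambda I$ to the normal equation shrinks the magnitude of the parameters, with the bias term $\theta_0$ left unpenalized by convention.

```python
import numpy as np

# Hypothetical dataset: noisy samples of a sine wave.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Degree-5 polynomial features; column 0 is the bias term.
X = np.vander(x, 6, increasing=True)

def ridge_theta(X, y, lam):
    """Regularized normal equation: theta = (X^T X + lam*I)^(-1) X^T y.
    The identity's [0, 0] entry is zeroed so theta_0 is not penalized."""
    I = np.eye(X.shape[1])
    I[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)

theta_unreg = ridge_theta(X, y, 0.0)   # plain least squares
theta_reg = ridge_theta(X, y, 10.0)    # regularized fit
print("unregularized |theta_1..n|:", np.linalg.norm(theta_unreg[1:]))
print("regularized   |theta_1..n|:", np.linalg.norm(theta_reg[1:]))
```

All features stay in the model; the penalty merely discourages large parameter values, which is why this option suits many slightly useful features better than discarding some of them outright.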