Gradient Descent: Learning Rate
Testing convergence of gradient descent
Make a plot with the number of iterations on the x-axis and the value of the cost function J(θ) on the y-axis, i.e., plot J(θ) as computed after each iteration of gradient descent.
Ideally, J(θ) should decrease after every iteration.
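A minimal sketch of how such a plot can be produced, assuming a linear-regression-style squared-error cost and NumPy/Matplotlib (the toy data, `alpha=0.1`, and 200 iterations are illustrative choices, not prescriptions):

```python
import numpy as np
import matplotlib.pyplot as plt

def cost(X, y, theta):
    """Squared-error cost J(theta) = (1/2m) * sum((X @ theta - y)^2)."""
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * len(y))

def gradient_descent(X, y, theta, alpha, num_iters):
    """Run gradient descent, recording J(theta) after every iteration."""
    history = []
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / len(y)  # gradient of the squared-error cost
        theta = theta - alpha * gradient
        history.append(cost(X, y, theta))
    return theta, history

# Toy data for illustration; in practice X and y come from your problem.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=100)

theta, history = gradient_descent(X, y, np.zeros(2), alpha=0.1, num_iters=200)
plt.plot(history)
plt.xlabel("Number of iterations")
plt.ylabel(r"Cost $J(\theta)$")
plt.show()
```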
The number of iterations gradient descent takes to converge can vary a lot from one problem to another.
Example automatic convergence test:
Declare convergence if J(θ) decreases by less than some small threshold ε (e.g., 10⁻³) in one iteration.
However, in practice choosing a particular threshold ε can be quite difficult, so plots like the one above are often a more reliable way to test convergence of gradient descent.
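Continuing the sketch above (reusing `cost` and the same gradient), an automatic convergence test might look like this; the threshold `epsilon=1e-3` is only an illustrative assumption:

```python
def gradient_descent_auto(X, y, theta, alpha, epsilon=1e-3, max_iters=10_000):
    """Run gradient descent until J(theta) decreases by less than epsilon
    in one iteration (or max_iters is reached)."""
    prev_cost = cost(X, y, theta)
    for i in range(max_iters):
        gradient = X.T @ (X @ theta - y) / len(y)
        theta = theta - alpha * gradient
        new_cost = cost(X, y, theta)
        # Note: an *increase* in cost also trips this test; per the debugging
        # note below, a rising J(theta) signals that alpha is too large.
        if prev_cost - new_cost < epsilon:
            print(f"Converged after {i + 1} iterations.")
            break
        prev_cost = new_cost
    return theta
```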
Debugging: How to make sure gradient descent is working correctly
If J(θ) ever increases, then you probably need to decrease the learning rate α.
It has been proven that if the learning rate α is sufficiently small, J(θ) should decrease on every iteration.
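A one-line sketch of why this holds: for the standard update θ := θ − α∇J(θ), a first-order Taylor expansion gives

```latex
J\bigl(\theta - \alpha \nabla J(\theta)\bigr)
  \approx J(\theta) - \alpha \,\lVert \nabla J(\theta) \rVert^2
  < J(\theta)
  \qquad \text{for sufficiently small } \alpha > 0 \text{ and } \nabla J(\theta) \neq 0 .
```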
How to choose learning rate
- If α is too small:
  - slow convergence
- If α is too large:
  - J(θ) may not decrease on every iteration, and thus gradient descent may not converge
  - it may even diverge
  - sometimes slow convergence is also possible
To choose a suitable α, try a range of values and, for each value of α, plot J(θ) as a function of the number of iterations. Then choose the α that makes J(θ) decrease most rapidly.
When trying a range of values of α, scaling by factors of roughly 3 or 10 works well (a sketch of this sweep follows the example values below).
E.g., ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
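Reusing `gradient_descent`, `X`, and `y` from the first sketch, the sweep might look like this (the α grid and iteration count are illustrative):

```python
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]
for alpha in alphas:
    _, history = gradient_descent(X, y, np.zeros(2), alpha, num_iters=100)
    # Curves that rise or blow up indicate that this alpha is too large.
    plt.plot(history, label=f"alpha = {alpha}")
plt.xlabel("Number of iterations")
plt.ylabel(r"Cost $J(\theta)$")
plt.legend()
plt.show()
```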