Logistic Regression: Advanced Optimization

 

Instead of gradient descent, we can use more sophisticated and faster algorithms to optimize (minimize) our cost function J(θ). These include:

  • Conjugate gradient
  • BFGS
  • L-BFGS

We suggest that you not write these more sophisticated algorithms yourself; instead, use existing library implementations, which are already tested and highly optimized.

We can use Octave's fminunc() function, which finds the minimum of an unconstrained multivariable function.

We first need to provide a function costFunction that, for a given input value θ, evaluates both the cost function J(θ) and its gradient, the vector of partial derivatives ∂J(θ)/∂θ_j.

function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end
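
For logistic regression specifically, the placeholders above might be filled in roughly as follows. This is a minimal sketch, assuming a design matrix X (m × (n+1), with a leading column of ones) and a label vector y of 0/1 values are passed in as extra arguments; these names are not part of the template above.

function [jVal, gradient] = costFunction(theta, X, y)
  % Unregularized logistic regression cost and gradient (vectorized)
  m = length(y);                                % number of training examples
  h = 1 ./ (1 + exp(-X * theta));               % sigmoid hypothesis h_theta(x)
  jVal = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));
  gradient = (1/m) * (X' * (h - y));            % (n+1) x 1 vector of partial derivatives
end

Because fminunc() calls the objective with θ alone, a function with extra arguments like this one is usually wrapped in an anonymous function, e.g. fminunc(@(t) costFunction(t, X, y), initialTheta, options).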

Next, we use Octave's fminunc() function to find the optimal θ, i.e., the θ that achieves min_θ J(θ).

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(n+1, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);

Here, n is the number of variables (features), and @costFunction is a function handle to the costFunction defined above. The optimset() function creates an object containing the options we want to pass to fminunc(); setting 'GradObj' to 'on' specifies that costFunction also returns the gradient. functionVal returns min_θ J(θ), i.e., the value of J(θ) at the optimal θ (which is returned in optTheta).
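
Before handing @costFunction to fminunc(), it can help to call the function once directly and confirm that both outputs look sensible. A quick sanity check, assuming costFunction follows the single-argument template above:

[j0, g0] = costFunction(initialTheta);           % evaluate cost and gradient once
fprintf('initial cost: %f\n', j0);
fprintf('gradient size: %d x %d\n', size(g0));   % expect (n+1) x 1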

The exit flag (exitFlag) can take these values in Octave (a simple check is sketched after the list):

  • 1 : The algorithm converged to a solution.
  • 0 : Maximum number of iterations or function evaluations has been exhausted.
  • -1 : The algorithm was terminated by the user-supplied output function.
  • Other values, such as 2 or 3, are also possible.
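
In practice it is worth inspecting exitFlag after the call returns. A minimal sketch, using the variables from the call above:

if exitFlag == 1
  fprintf('Converged: J(theta) = %f\n', functionVal);
elseif exitFlag == 0
  fprintf('Stopped early: iteration or evaluation limit reached.\n');
else
  fprintf('fminunc returned exit flag %d\n', exitFlag);
end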

Example

\begin{aligned}
&\theta = \begin{bmatrix}\theta_0 \\ \theta_1\end{bmatrix} \\
&J(\theta) = (\theta_0 - 5)^2 + (\theta_1 - 5)^2 \\
&\frac{\partial}{\partial\theta_0}J(\theta) = 2(\theta_0 - 5) \\
&\frac{\partial}{\partial\theta_1}J(\theta) = 2(\theta_1 - 5)
\end{aligned}

Octave/MATLAB implementation:

function [jVal, gradient] = costFunction(theta)
    % theta(1) corresponds to theta_0 and theta(2) to theta_1 (Octave indexes from 1)
    jVal = (theta(1)-5)^2 + (theta(2)-5)^2;
    gradient = zeros(2, 1);
    gradient(1) = 2*(theta(1)-5);
    gradient(2) = 2*(theta(2)-5);
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
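
Since J(θ) is minimized exactly at θ0 = θ1 = 5, running this should return optTheta close to [5; 5], functionVal close to 0, and exitFlag = 1:

disp(optTheta)      % approximately [5; 5]
disp(functionVal)   % approximately 0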

In MATLAB, we can also use optimoptions() to fine-tune our options.

options = optimoptions('fminunc', 'Algorithm', 'quasi-newton', 'OptimalityTolerance', 1e-10, 'SpecifyObjectiveGradient', true);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
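
fminunc() can also return a fourth output containing details about the run (iteration counts, etc.), which is handy when fine-tuning these options; for example:

[optTheta, functionVal, exitFlag, output] = fminunc(@costFunction, initialTheta, options);
disp(output)   % structure with solver details such as the number of iterations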