
Instead of gradient descent, we can use more sophisticated and faster algorithms to optimize (minimize) our cost function $J(\theta)$. These include:

• BFGS
• L-BFGS

We suggest not writing these more sophisticated algorithms yourself; use existing libraries instead, as they are already tested and highly optimized.

We can use Octave's fminunc() optimization function, which finds the minimum of an unconstrained multivariable function.

We first need to provide a function costFunction that, for a given input $\theta$, evaluates both the cost function $J(\theta)$ and its gradient, the vector of partial first derivatives $\begin{bmatrix}\dfrac{\partial}{\partial \theta_j}J(\theta)\end{bmatrix}$.

```matlab
function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end
```

Next, we use Octave's fminunc() function to find the optimal $\theta$ corresponding to $\displaystyle \min_\theta\big(J(\theta)\big)$.

```matlab
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(n+1, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```

Here, $n$ is the number of variables (features). @costFunction is a pointer to the costFunction function defined above. The optimset() function creates an object containing the options we want to pass to fminunc(). Setting 'GradObj' to 'on' specifies that costFunction also returns the gradient. functionVal holds $\displaystyle \min_\theta\big(J(\theta)\big)$, i.e., the value of $J(\theta)$ at the optimal $\theta$ (which is returned in optTheta).

The exit flag (exitFlag) can have these values in Octave:

• 1 : The algorithm converged to a solution.
• 0 : The maximum number of iterations or function evaluations was exhausted.
• -1 : The algorithm was terminated by the user output function.
• Other values, such as 2 and 3, are also possible.
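In practice, it is worth inspecting the exit flag before trusting the result. A minimal sketch, using the variables returned by the fminunc() call above:

```matlab
% Check how the optimization terminated (assumes exitFlag, functionVal
% come from the fminunc() call shown earlier)
if exitFlag == 1
  printf('Converged: J(theta) = %f\n', functionVal);
elseif exitFlag == 0
  warning('Iteration/evaluation limit reached; consider increasing MaxIter.');
else
  warning('fminunc stopped early (exitFlag = %d).', exitFlag);
end
```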

#### Example

$$\begin{aligned} &\theta = \begin{bmatrix}\theta_0 \\ \theta_1\end{bmatrix} \\ &J(\theta) = (\theta_0 - 5)^2 + (\theta_1 - 5)^2 \\ &\frac{\partial}{\partial\theta_0}J(\theta) = 2(\theta_0 - 5) \\ &\frac{\partial}{\partial\theta_1}J(\theta) = 2(\theta_1 - 5) \end{aligned}$$

Octave/MATLAB implementation:

```matlab
function [jVal, gradient] = costFunction(theta)
  jVal = (theta(1)-5)^2 + (theta(2)-5)^2;
  gradient = [2*(theta(1)-5); 2*(theta(2)-5)];
end

initialTheta = zeros(2, 1);
options = optimset('GradObj', 'on', 'MaxIter', 100);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```
In MATLAB, we can also use optimoptions() to fine-tune our options:

```matlab
options = optimoptions('fminunc', 'Algorithm', 'quasi-newton', 'OptimalityTolerance', 1e-10, 'SpecifyObjectiveGradient', true);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```
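Since $J(\theta) = (\theta_0 - 5)^2 + (\theta_1 - 5)^2$ is minimized analytically at $\theta_0 = \theta_1 = 5$, we can sanity-check the returned values. A minimal sketch, assuming the fminunc() call above converged:

```matlab
% Sanity check: the analytic minimum of this J is at theta = [5; 5]
disp(optTheta)      % expect both components near 5
disp(functionVal)   % expect a value near 0
```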