Gradient Descent for Multiple Features


The gradient descent equation for $n$ features (variables) can be written as:

repeat until convergence: {

$$\quad \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \qquad \text{(simultaneously update } \theta_j \text{ for every } j \in \{0, \dots, n\})$$

}

Substituting the value of the partial derivative $\frac{\partial}{\partial \theta_j} J(\theta)$, the algorithm becomes:

repeat until convergence: {

$$\quad \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}_j \qquad \text{(simultaneously update } \theta_j \text{ for every } j \in \{0, \dots, n\})$$
}
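As a concrete illustration of the update rule above, here is a minimal NumPy sketch that loops over the features explicitly. The function name, learning rate, and fixed iteration count (standing in for a real convergence test) are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def gradient_descent_loop(X, y, theta, alpha=0.01, num_iters=1000):
    """Per-feature gradient descent update (illustrative sketch).

    X     : (m, n+1) design matrix whose first column is all ones (x_0 = 1)
    y     : (m,) vector of targets
    theta : (n+1,) initial parameter vector
    """
    m, n_plus_1 = X.shape
    for _ in range(num_iters):            # fixed count stands in for "repeat until convergence"
        errors = X @ theta - y            # h_theta(x^(i)) - y^(i) for every example i
        new_theta = theta.copy()          # buffer so every theta_j is updated simultaneously
        for j in range(n_plus_1):
            new_theta[j] = theta[j] - alpha * (1.0 / m) * np.sum(errors * X[:, j])
        theta = new_theta
    return theta
```

Note that the copy into `new_theta` is what makes the update of all $\theta_j$ simultaneous, as required by the algorithm.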


Vectorized implementation:

$$\theta := \theta - \alpha\,\delta$$

where $\theta, \delta \in \mathbb{R}^{n+1}$ are vectors as shown below:

$$\delta = \frac{1}{m} \sum\limits_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} = \frac{1}{m} X^T (X\theta - y)$$

$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$$

$$x^{(i)} = \begin{bmatrix} x^{(i)}_0 \\ x^{(i)}_1 \\ \vdots \\ x^{(i)}_n \end{bmatrix} \in \mathbb{R}^{n+1}$$

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} \qquad \text{(an } m \times (n+1) \text{ matrix)}$$

$$y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix} \in \mathbb{R}^{m}$$
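For comparison, here is a minimal NumPy sketch of the vectorized update $\theta := \theta - \alpha\,\delta$ with $\delta = \frac{1}{m} X^T(X\theta - y)$. The same caveats apply: the function name, learning rate, iteration count, and toy data are illustrative assumptions.

```python
import numpy as np

def gradient_descent_vectorized(X, y, theta, alpha=0.01, num_iters=1000):
    """Vectorized gradient descent: theta := theta - alpha * delta,
    where delta = (1/m) * X^T (X theta - y)."""
    m = len(y)
    for _ in range(num_iters):
        delta = (1.0 / m) * (X.T @ (X @ theta - y))  # gradient of the cost J(theta)
        theta = theta - alpha * delta                # simultaneous update of all theta_j
    return theta

# Toy usage: m = 4 examples, one feature plus the bias column of ones (x_0 = 1).
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
theta = gradient_descent_vectorized(X, y, theta=np.zeros(2), alpha=0.1, num_iters=2000)
# theta approaches [0, 2], i.e. h_theta(x) ≈ 2x, as expected for this data.
```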


Derivation: Vectorized implementation

Let $c^{(i)} = (h_\theta(x^{(i)}) - y^{(i)})$.

$$m\,\delta = m \begin{bmatrix} \delta_0 \\ \delta_1 \\ \vdots \\ \delta_n \end{bmatrix} = \begin{bmatrix} \sum\limits_{i=1}^m c^{(i)} x^{(i)}_0 \\ \sum\limits_{i=1}^m c^{(i)} x^{(i)}_1 \\ \vdots \\ \sum\limits_{i=1}^m c^{(i)} x^{(i)}_n \end{bmatrix} = c^{(1)} x^{(1)} + c^{(2)} x^{(2)} + \dots + c^{(m)} x^{(m)} = \sum\limits_{i=1}^m c^{(i)} x^{(i)}$$
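Stacking the errors into a vector $c = X\theta - y \in \mathbb{R}^m$, the final sum is exactly $X^T c$, since the columns of $X^T$ are the vectors $x^{(i)}$. This completes the derivation of the vectorized form:

$$m\,\delta = \sum\limits_{i=1}^m c^{(i)} x^{(i)} = X^T c = X^T (X\theta - y) \quad\Longrightarrow\quad \delta = \frac{1}{m} X^T (X\theta - y)$$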