Answer:
Step-by-step explanation:
Let X be the design matrix, with one row for each of our 50 examples and one column for each feature. The normal equation gives the value of [tex]\theta[/tex] in closed form:
[tex] \theta = (X^{T}X)^{-1}X^{T}y[/tex]
The matrix [tex]X^{T}X[/tex], however, might not be invertible when [tex]m\leq n[/tex], that is, when there are no more examples (m) than features (n). In that case we must use the pseudoinverse in place of the inverse to solve the problem. For a large number of features, computing the pseudoinverse can be computationally expensive, so gradient descent should be preferred.
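
To make the argument concrete, here is a minimal NumPy sketch. The sizes (50 examples, 200 features), the random data, the step size and the number of iterations are all assumptions chosen for illustration, not part of the original question. It checks that [tex]X^{T}X[/tex] is rank deficient when m is smaller than n, solves with the pseudoinverse, and then runs plain batch gradient descent on the same data.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 200                      # assumed sizes: 50 examples, 200 features (m <= n)
X = rng.normal(size=(m, n))         # made-up design matrix, one row per example
y = rng.normal(size=m)              # made-up targets

# X^T X is n x n but its rank is at most m, so it cannot be inverted here.
rank = np.linalg.matrix_rank(X.T @ X)

# The pseudoinverse still gives a least-squares solution (the minimum-norm one).
theta_pinv = np.linalg.pinv(X) @ y


def gradient_descent(X, y, lr, steps):
    """Batch gradient descent on the mean squared error, starting from zero."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta


# Step size 1/L, where L is the largest eigenvalue of X^T X / m.
L = np.linalg.norm(X, 2) ** 2 / m
theta_gd = gradient_descent(X, y, lr=1.0 / L, steps=5000)

residual_pinv = np.linalg.norm(X @ theta_pinv - y)
residual_gd = np.linalg.norm(X @ theta_gd - y)
print(rank, residual_pinv, residual_gd)
```

Each gradient descent step only needs matrix-vector products with X, which is why it tends to scale better than forming and (pseudo)inverting [tex]X^{T}X[/tex] when the number of features is large.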