Suppose you have a dataset with m = 50 examples and n = 200000 features for each example. You want to use multivariate linear regression to fit the parameters θ to your data. Should you prefer gradient descent or the normal equation?

Answer: Gradient descent.

Step-by-step explanation:

Let X be the m × n matrix whose rows are the feature vectors of our 50 examples, and let y be the vector of the 50 target values. The normal equation gives the values of [tex]\theta[/tex] as follows:

[tex] \theta = (X^{T}X)^{-1}X^{T}y[/tex]

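However, the matrix [tex]X^{T}X[/tex] is n × n and has rank at most m, so it is not invertible when [tex]m < n[/tex], which is the case here (50 < 200000). We would have to use the pseudoinverse instead. With this many features, that is computationally expensive: [tex]X^{T}X[/tex] has [tex]n^{2} = 4 \times 10^{10}[/tex] entries, and inverting or pseudo-inverting an n × n matrix takes on the order of [tex]n^{3} = 8 \times 10^{15}[/tex] operations. One iteration of gradient descent, by contrast, costs on the order of [tex]mn = 10^{7}[/tex] operations. So gradient descent should be preferred.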
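
To make the comparison concrete, here is a minimal sketch of batch gradient descent for linear regression. It is only an illustration under assumptions that are not part of the question: the sizes are scaled down (m = 5, n = 40) so it runs instantly, the data is random, and the function names, learning rate and iteration count are hypothetical choices. It uses only the Python standard library; a real implementation would normally use a numerical library.

```python
import random

def predict(X, theta):
    # One dot product per example: costs about m * n multiplications.
    return [sum(x_j * t_j for x_j, t_j in zip(row, theta)) for row in X]

def cost(X, y, theta):
    # Mean squared error cost J(theta) = (1 / 2m) * sum of squared residuals.
    m = len(X)
    return sum((p - t) ** 2 for p, t in zip(predict(X, theta), y)) / (2 * m)

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    # alpha and iterations are illustrative values chosen for this toy data.
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        residuals = [p - t for p, t in zip(predict(X, theta), y)]
        # Gradient of J: (1 / m) * X^T (X theta - y), again about m * n work.
        grad = [sum(residuals[i] * X[i][j] for i in range(m)) / m
                for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

random.seed(0)
m, n = 5, 40  # scaled-down stand-ins for m = 50, n = 200000
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
y = [random.uniform(-1, 1) for _ in range(m)]

theta = gradient_descent(X, y)
initial_cost = cost(X, y, [0.0] * n)
final_cost = cost(X, y, theta)
print(initial_cost, final_cost)
```

Note that the loop never builds the n × n matrix [tex]X^{T}X[/tex]. It only multiplies by X and by its transpose, which is why each iteration stays at roughly mn operations even when n is very large.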