Newton's method uses a descent with the following step:
x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k).
It is computationally heavy, as you need to compute (and invert) the Hessian at every iteration.
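A minimal sketch of this iteration in Python/NumPy, on a hypothetical strongly convex test function (the function, tolerances, and the name `newton` are illustrative assumptions, not part of the notes):

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method: x_{k+1} = x_k - [hess(x_k)]^{-1} grad(x_k)."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Solve H p = g for the Newton step instead of forming H^{-1} explicitly.
        p = np.linalg.solve(hess(x), g)
        x = x - p
    return x

# Illustrative example: f(x) = sum_i (exp(x_i) + x_i^2 / 2), strongly convex.
grad = lambda x: np.exp(x) + x              # gradient of f
hess = lambda x: np.diag(np.exp(x) + 1.0)   # Hessian of f

x_star = newton(grad, hess, np.zeros(3))
```

Note the use of `np.linalg.solve`: forming the explicit inverse of the Hessian is both slower and less numerically stable than solving the linear system directly.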
Why that step:
Around the current iterate x_k, approximate f by its second-order Taylor expansion:
f(x) \approx f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{1}{2} (x - x_k)^T \nabla^2 f(x_k) (x - x_k).
This is a quadratic in x - we find the minimum of it.
Take the gradient of the quadratic and set it to zero (first-order condition for a minimum):
\nabla f(x_k) + \nabla^2 f(x_k)(x - x_k) = 0.
Setting this to zero gives:
x = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k),
which is exactly the Newton step.
In 1-dimension, Newton’s method is a way of solving g(x) = 0 via the iteration:
x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)}.
Let g = f'. Then this iteration seeks a stationary point of f, and it coincides with the step above, since [f''(x_k)]^{-1} f'(x_k) = f'(x_k) / f''(x_k).
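The 1-dimensional iteration can be sketched as follows; the target equation g(x) = x^2 - 2 = 0 (whose root is \sqrt{2}) and the name `newton_1d` are illustrative choices, not from the notes:

```python
def newton_1d(g, dg, x0, tol=1e-12, max_iter=50):
    """Solve g(x) = 0 via x_{k+1} = x_k - g(x_k) / g'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:   # stop once the update is negligible
            break
    return x

# Example: the positive root of g(x) = x^2 - 2 is sqrt(2).
root = newton_1d(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```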
Theorem:
Let f be twice differentiable and \mu-strongly convex, i.e. \nabla^2 f(x) \succeq \mu I for all x.
Let the Hessian be L-Lipschitz: \|\nabla^2 f(x) - \nabla^2 f(y)\| \le L \|x - y\| for all x, y.
Then Newton’s method satisfies the bound:
\|x_{k+1} - x^*\| \le \frac{L}{2\mu} \|x_k - x^*\|^2.
Remark:
Once the iterates are close enough to x^*, the error is squared at every step, so the number of correct digits roughly doubles per iteration. This is exponentially faster than the linear rate of gradient descent.
But each step needs O(d^2) memory to store the Hessian (and O(d^3) time to solve the linear system), so Newton’s method is rarely used in high dimensions.
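The speed difference can be seen on a small experiment; the test function f(x) = e^x + e^{-x} (strongly convex, minimized at x^* = 0) and the fixed gradient-descent step size are illustrative assumptions:

```python
import math

# f(x) = e^x + e^{-x}: strongly convex, minimum at x* = 0.
df  = lambda x: math.exp(x) - math.exp(-x)   # f'
d2f = lambda x: math.exp(x) + math.exp(-x)   # f''

xn = xg = 1.0   # same starting point for both methods
eta = 0.1       # hypothetical fixed step size for gradient descent
for k in range(6):
    xn -= df(xn) / d2f(xn)   # Newton step
    xg -= eta * df(xg)       # gradient-descent step
# After 6 iterations, Newton's error is near machine precision, while
# gradient descent has only shrunk the error by a constant factor per step.
```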
Proof:
NON EXAMINABLE!!!