It uses a descent with the following step:

$$x_{t+1} = x_t - \left[\nabla^2 f(x_t)\right]^{-1} \nabla f(x_t)$$

This is computationally heavy, as you need to compute (and invert) the Hessian at every step.
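As a concrete illustration (not from the notes), one Newton step can be sketched in Python; the quadratic objective, `grad`, and `hess` below are illustrative choices:

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton update: x - [Hessian f(x)]^{-1} grad f(x)."""
    # Solve H d = grad f(x) rather than forming the inverse explicitly.
    d = np.linalg.solve(hess(x), grad(x))
    return x - d

# Illustrative quadratic: f(x) = 0.5 x^T A x - b^T x, minimised where A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
hess = lambda x: A  # constant Hessian for a quadratic

x = newton_step(np.zeros(2), grad, hess)
# For a quadratic, a single Newton step lands exactly on the minimiser.
print(x)
```

For a quadratic objective the second-order Taylor model is exact, which is why one step suffices here; on a general function each step only solves the local quadratic model.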

Why that step:

Around the current point $x_t$, approximate $f$ by its second-order Taylor expansion:

$$f(x) \approx f(x_t) + \nabla f(x_t)^\top (x - x_t) + \tfrac{1}{2}(x - x_t)^\top \nabla^2 f(x_t)(x - x_t)$$

This is some quadratic in $x$ - we find the minimum of it.
Take the gradient and set it to zero to find the minimum:

$$\nabla f(x_t) + \nabla^2 f(x_t)(x - x_t) = 0$$

Setting this to zero and solving for $x$ gives the Newton step $x_{t+1} = x_t - \left[\nabla^2 f(x_t)\right]^{-1} \nabla f(x_t)$.
In 1 dimension, Newton's method is a way of solving $g(x) = 0$:

$$x_{t+1} = x_t - \frac{g(x_t)}{g'(x_t)}$$

Let $g = f'$. Then the iteration finds a stationary point of $f$, which for convex $f$ is the minimiser.
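The 1-dimensional root-finding iteration above can be sketched as follows (the example function $g(x) = x^2 - 2$ is an illustrative choice, with positive root $\sqrt{2}$):

```python
def newton_1d(g, g_prime, x, steps=10):
    """Iterate x <- x - g(x)/g'(x) to solve g(x) = 0."""
    for _ in range(steps):
        x = x - g(x) / g_prime(x)
    return x

# Solve x^2 - 2 = 0 starting from x = 1.
root = newton_1d(lambda x: x * x - 2.0, lambda x: 2.0 * x, x=1.0)
print(root)  # converges to sqrt(2) ~ 1.41421356...
```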

Theorem:

Let $f$ be an $\alpha$-strongly convex function.
Let $\nabla^2 f$ be $\beta$-Lipschitz.
Then Newton's method satisfies the bound:

$$\|x_{t+1} - x^*\| \le \frac{\beta}{2\alpha} \|x_t - x^*\|^2$$

Remark:

This is exponentially faster than gradient descent: the error is squared at every step, so the number of correct digits roughly doubles per iteration, whereas gradient descent only shrinks the error by a constant factor per iteration.
But each step needs $O(d^2)$ memory to store the Hessian (and $O(d^3)$ time to invert it), so it is rarely used in high dimensions.
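The gap in convergence speed is easy to see numerically. This is a small sketch (not from the notes) comparing both methods on $f(x) = \cosh(x)$, which is strongly convex on bounded sets with minimum at $0$; the gradient-descent step size $0.5$ is an illustrative choice:

```python
import math

x_newton, x_gd = 1.0, 1.0
for t in range(5):
    # Newton: x - f'(x)/f''(x) = x - sinh(x)/cosh(x) = x - tanh(x)
    x_newton = x_newton - math.tanh(x_newton)
    # Gradient descent: x - eta * f'(x) with eta = 0.5
    x_gd = x_gd - 0.5 * math.sinh(x_gd)
    print(t, abs(x_newton), abs(x_gd))
```

After five iterations the Newton iterate is at machine precision, while gradient descent has only shrunk the error by a constant factor each step.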

Proof:

NON EXAMINABLE!!!