Update rule for gradient descent with momentum

by Anant Agarwal   Last Updated August 14, 2019 20:19 PM

I am really confused about applying gradient descent with momentum. The trusted resources I use for learning about AI give different information. CS231n says to use momentum like this:

    v = mu * v - learning_rate * dW
    W = W + v

Michael Nielsen suggests the same implementation in his deep learning book. But Andrew Ng's deep learning course says this:

    v = beta * v + (1 - beta) * dW
    W = W - alpha * v

What's happening? Are these two the same? I tried to check, and I'm fairly sure they are not. But enlighten me.
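For what it's worth, here is a minimal numerical sketch comparing the two update rules on a toy quadratic objective (all names here are my own, hypothetical choices, not from either course). The two forms differ only by a rescaling: if the CS231n learning rate is set to `lr = alpha * (1 - beta)`, the weight trajectories coincide, because the CS231n velocity is just `-alpha` times Ng's velocity at every step.

```python
# Toy objective f(w) = 0.5 * w^2, so the gradient is simply w.
def grad(w):
    return w

beta = 0.9    # momentum coefficient (mu in CS231n, beta in Ng's course)
alpha = 0.1   # learning rate in Ng's formulation

# With this rescaled learning rate the two updates produce
# identical weight trajectories (v_cs231n == -alpha * v_ng).
lr = alpha * (1 - beta)

w1, v1 = 1.0, 0.0   # CS231n / Nielsen style
w2, v2 = 1.0, 0.0   # Andrew Ng style

for _ in range(50):
    # CS231n: accumulate a scaled negative-gradient velocity, then step by it.
    v1 = beta * v1 - lr * grad(w1)
    w1 = w1 + v1

    # Ng: keep an exponential moving average of the gradient,
    # then take a gradient-descent step along that average.
    v2 = beta * v2 + (1 - beta) * grad(w2)
    w2 = w2 - alpha * v2

print(w1, w2)  # equal up to floating-point error
```

So the two rules are not literally the same formula, but they describe the same family of optimizers under a reparameterization of the learning rate.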
