I am really confused about how to apply gradient descent with momentum. The resources I trust for learning about AI give conflicting information.
CS231n says to use momentum like this,
The same implementation is suggested by Michael Nielsen in his deep learning book. But Andrew Ng's deep learning course says this,
What's happening? Are these two the same? I tried to check whether they are equivalent, and I am fairly sure they are not. Please enlighten me.
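To make the comparison concrete, here is a minimal sketch of the two update rules as I understand them. The formulas themselves are not reproduced in this post, so the forms below are an assumption based on how these sources usually state them: the CS231n/Nielsen style as `v = mu * v - lr * grad; x += v`, and the Andrew Ng style as `v = beta * v + (1 - beta) * grad; x -= lr * v`. The function names, the `grad_fn` argument, and the default hyperparameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def momentum_cs231n(grad_fn, x0, lr=0.1, mu=0.9, steps=50):
    """Assumed CS231n/Nielsen-style momentum: the velocity already includes lr."""
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        v = mu * v - lr * grad_fn(x)  # velocity accumulates scaled gradients
        x = x + v                     # parameters move along the velocity
    return x


def momentum_ng(grad_fn, x0, lr=0.1, beta=0.9, steps=50):
    """Assumed Andrew Ng-style momentum: v is a weighted average of gradients."""
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v + (1 - beta) * grad_fn(x)  # exponentially weighted average
        x = x - lr * v                          # lr is applied at the update step
    return x


if __name__ == "__main__":
    # Toy objective f(x) = 0.5 * ||x||^2, whose gradient is simply x.
    grad = lambda x: x
    x0 = [1.0, -2.0]
    print("cs231n form:", momentum_cs231n(grad, x0, steps=5))
    print("ng form:    ", momentum_ng(grad, x0, steps=5))
    # Same comparison with the first form's learning rate rescaled by (1 - beta).
    print("rescaled:   ", momentum_cs231n(grad, x0, lr=0.1 * (1 - 0.9), steps=5))
```

Run with the same learning rate, the two forms give different iterates, which matches what I observed. With a constant learning rate they appear to coincide only if the learning rate of the first form is rescaled by `(1 - beta)`, so I am unsure whether that counts as "the same".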