Stochastic Gradient Descent (SGD)
W += -learning_rate * dx
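A minimal sketch of one SGD step in NumPy. `W` is the parameter array, `dx` the gradient of the loss at `W` (a made-up value here), and the learning rate 0.01 is just an illustrative default:

```python
import numpy as np

def sgd_step(W, dx, lr=0.01):
    """Vanilla SGD: move W against the gradient, scaled by lr."""
    W -= lr * dx
    return W

W = np.array([1.0, -2.0])
dx = np.array([0.5, -0.5])   # pretend gradient at W
W = sgd_step(W, dx)
```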
Momentum
m = b1 * m - learning_rate * dx
W += m
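A sketch of the same rule in code; `m` is the velocity carried between steps, and the decay `b1 = 0.9` is a common but here assumed default:

```python
import numpy as np

def momentum_step(W, dx, m, lr=0.01, b1=0.9):
    """Momentum: m accumulates a decaying sum of past gradients."""
    m = b1 * m - lr * dx   # velocity update, as in the rule above
    W += m                 # step along the accumulated velocity
    return W, m

W = np.array([1.0, -2.0])
m = np.zeros_like(W)       # velocity starts at zero
dx = np.array([0.5, -0.5])
W, m = momentum_step(W, dx, m)
```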
NAG (Nesterov Accelerated Gradient)
m = b1 * m - learning_rate * dx_ahead    (dx_ahead = gradient evaluated at W + b1 * m)
W += m
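Because NAG needs the gradient at the look-ahead point, this sketch uses a toy quadratic loss so `grad` is computable; the loss and all constants are illustrative assumptions:

```python
import numpy as np

def grad(W):
    """Toy gradient: loss = 0.5 * ||W||^2, so dL/dW = W."""
    return W

def nag_step(W, m, lr=0.01, b1=0.9):
    """NAG: evaluate the gradient at the look-ahead point W + b1 * m."""
    dx_ahead = grad(W + b1 * m)
    m = b1 * m - lr * dx_ahead
    W += m
    return W, m

W = np.array([1.0, -2.0])
m = np.zeros_like(W)
W, m = nag_step(W, m)
```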
AdaGrad
v += dx^2
W += -learning_rate * dx / sqrt(v)
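In code, a small epsilon is commonly added to the denominator (an addition to the rule above) so the first step does not divide by zero:

```python
import numpy as np

def adagrad_step(W, dx, v, lr=0.01, eps=1e-8):
    """AdaGrad: v accumulates all squared gradients, so per-weight
    step sizes shrink over time; eps guards against division by zero."""
    v += dx ** 2
    W -= lr * dx / (np.sqrt(v) + eps)
    return W, v

W = np.array([1.0, -2.0])
v = np.zeros_like(W)
dx = np.array([0.5, -0.5])
W, v = adagrad_step(W, dx, v)
```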
Adadelta
v = b1 * v + (1 - b1) * dx^2
delta = -sqrt(u + eps) / sqrt(v + eps) * dx
u = b1 * u + (1 - b1) * delta^2
W += delta
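A sketch of Adadelta following Zeiler's formulation, where the running RMS of past updates (`u`) replaces the learning rate; `rho = 0.95` and `eps = 1e-6` are assumed defaults:

```python
import numpy as np

def adadelta_step(W, dx, v, u, rho=0.95, eps=1e-6):
    """Adadelta: no learning rate; the RMS of past updates sets the scale."""
    v = rho * v + (1 - rho) * dx ** 2                # decaying avg of squared grads
    step = -np.sqrt(u + eps) / np.sqrt(v + eps) * dx
    u = rho * u + (1 - rho) * step ** 2              # decaying avg of squared updates
    W += step
    return W, v, u

W = np.array([1.0, -2.0])
v = np.zeros_like(W)
u = np.zeros_like(W)
dx = np.array([0.5, -0.5])
W, v, u = adadelta_step(W, dx, v, u)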
RMSProp
v = b1 * v + (1 - b1) * dx^2
W += -learning_rate * dx / sqrt(v)
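The same rule as code; unlike AdaGrad, `v` is a decaying average, so the effective step size does not shrink toward zero. The eps term and constants are assumed defaults:

```python
import numpy as np

def rmsprop_step(W, dx, v, lr=0.001, b1=0.9, eps=1e-8):
    """RMSProp: scale each weight's step by a decaying RMS of its gradients."""
    v = b1 * v + (1 - b1) * dx ** 2
    W -= lr * dx / (np.sqrt(v) + eps)
    return W, v

W = np.array([1.0, -2.0])
v = np.zeros_like(W)
dx = np.array([0.5, -0.5])
W, v = rmsprop_step(W, dx, v)
```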
Adam
m = b1 * m + (1 - b1) * dx
v = b2 * v + (1 - b2) * dx^2
W += -learning_rate * m / sqrt(v)
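Adam combines Momentum's first moment `m` with RMSProp's second moment `v` and steps along m, not the raw gradient. The sketch below also includes the bias correction from the published algorithm, which the short rule above omits; the hyperparameter values are the paper's suggested defaults:

```python
import numpy as np

def adam_step(W, dx, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: momentum (m) plus RMSProp-style scaling (v), with
    bias correction for the zero-initialized moment estimates."""
    m = b1 * m + (1 - b1) * dx
    v = b2 * v + (1 - b2) * dx ** 2
    mhat = m / (1 - b1 ** t)   # corrects m's bias toward zero early on
    vhat = v / (1 - b2 ** t)   # same for v
    W -= lr * mhat / (np.sqrt(vhat) + eps)
    return W, m, v

W = np.array([1.0, -2.0])
m = np.zeros_like(W)
v = np.zeros_like(W)
dx = np.array([0.5, -0.5])
W, m, v = adam_step(W, dx, m, v, t=1)
```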