DL optimizers
Last updated
http://ruder.io/optimizing-gradient-descent/
Gradient descent: use the gradient of the loss to update the parameters.
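A minimal sketch of the plain gradient-descent update described above; the `sgd_step` name and the toy loss f(w) = w² are illustrative assumptions, not from the notes:

```python
import numpy as np

def sgd_step(params, grad, lr=0.01):
    """Plain gradient descent: step against the gradient of the loss."""
    return params - lr * grad

# Example: one step on f(w) = w^2, whose gradient is 2w.
w = np.array([1.0])
w = sgd_step(w, 2 * w, lr=0.1)  # w moves toward the minimum at 0
```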
Momentum: simulates particle motion.
Speeds up when successive gradients point in the same direction, slows down when the direction changes.
Each update is related to the previous update direction.
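A sketch of the momentum update matching the description above: a velocity term carries the previous direction, so consistent gradients accumulate and direction changes are damped. The decay factor `beta = 0.9` is an assumed typical value:

```python
import numpy as np

def momentum_step(params, grad, v, lr=0.01, beta=0.9):
    """Momentum: blend the new gradient with the previous update direction."""
    v = beta * v + lr * grad       # velocity grows along consistent directions
    return params - v, v

# Example: a few steps on f(w) = w^2 (gradient 2w).
w, v = np.array([1.0]), np.zeros(1)
for _ in range(3):
    w, v = momentum_step(w, 2 * w, v)
```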
AdaGrad: adjusts the learning rate η per parameter based on the previously accumulated (squared) gradients.
Early stage: the accumulated sum n is small, so the effective learning rate is high.
Later stage: n is large, so the effective learning rate is low.
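A sketch of the AdaGrad update implied above: the accumulated squared gradient (the "n" in these notes) only grows, so the effective step shrinks over time; `eps` is an assumed small constant to avoid division by zero:

```python
import numpy as np

def adagrad_step(params, grad, accum, lr=0.1, eps=1e-8):
    """AdaGrad: scale the step by the accumulated squared gradients."""
    accum = accum + grad ** 2                              # small early, large later
    return params - lr * grad / (np.sqrt(accum) + eps), accum

# Example: the effective learning rate decreases as the accumulator n grows.
w, n = np.array([1.0]), np.zeros(1)
for _ in range(3):
    w, n = adagrad_step(w, 2 * w, n)
```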
Adam: momentum + AdaGrad (a momentum-style update combined with an adaptive per-parameter learning rate).
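A sketch of Adam as "momentum + AdaGrad": a first-moment estimate plays the momentum role and a second-moment estimate adapts the learning rate. The defaults `beta1=0.9`, `beta2=0.999`, `eps=1e-8` are the commonly used values, assumed here rather than stated in the notes:

```python
import numpy as np

def adam_step(params, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum-style first moment + AdaGrad-style adaptive scaling."""
    m = beta1 * m + (1 - beta1) * grad          # momentum part (first moment)
    v = beta2 * v + (1 - beta2) * grad ** 2     # adaptive learning-rate part (second moment)
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    return params - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Example: a few steps on f(w) = w^2 (gradient 2w); t counts steps from 1.
w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 4):
    w, m, v = adam_step(w, 2 * w, m, v, t)
```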