DL optimizer

https://medium.com/%E9%9B%9E%E9%9B%9E%E8%88%87%E5%85%94%E5%85%94%E7%9A%84%E5%B7%A5%E7%A8%8B%E4%B8%96%E7%95%8C/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92ml-note-sgd-momentum-adagrad-adam-optimizer-f20568c968db

http://ruder.io/optimizing-gradient-descent/

Stochastic gradient descent (SGD)

  • update the weights a small step against the gradient of the loss: W ← W - η·(∂L/∂W); see the sketch below

W = weight
L = loss
η = learning rate
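
A minimal NumPy sketch of one SGD step, assuming the gradient has already been computed on a mini-batch (the function name and toy loss are just for illustration):

```python
import numpy as np

def sgd_step(W, grad, lr=0.01):
    """One SGD update: move W a small step against the gradient of L."""
    return W - lr * grad

# Toy usage: minimize L(W) = ||W||^2, whose gradient is 2W.
W = np.array([1.0, -2.0])
for _ in range(100):
    W = sgd_step(W, 2 * W, lr=0.1)
print(W)  # close to [0, 0]
```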

Momentum (speed up or slow down based on previous gradients)

  • simulates the motion of a particle with inertia

  • speeds up when successive gradients point in the same direction, slows down when the direction changes

V_t (the update at step t) carries the direction of the previous update; see the sketch below.
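
A sketch of a momentum update under the same toy setup (β is the momentum coefficient, an illustrative default):

```python
import numpy as np

def momentum_step(W, grad, V, lr=0.01, beta=0.9):
    """V_t = beta * V_{t-1} - lr * grad: steps in a consistent direction
    add up, steps that flip direction partly cancel."""
    V = beta * V - lr * grad
    W = W + V
    return W, V

W, V = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(100):
    W, V = momentum_step(W, 2 * W, V, lr=0.1)
```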

AdaGrad (adaptive learning rate)

  • adjust the effective learning rate η/√n based on previous gradients

  • early stage = small n, large effective learning rate

  • later stage = big n, small effective learning rate

n = sum(square(all previous gradients))
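A sketch of one AdaGrad step, keeping the note's name n for the accumulated squared gradients:

```python
import numpy as np

def adagrad_step(W, grad, n, lr=0.1, eps=1e-8):
    """n = sum of squared gradients so far; the effective learning rate
    lr / sqrt(n) is large early on and shrinks as n grows."""
    n = n + grad ** 2
    W = W - lr * grad / (np.sqrt(n) + eps)
    return W, n

W, n = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(100):
    W, n = adagrad_step(W, 2 * W, n)
```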

RMSprop

n = exponentially decaying average of the squared previous gradients (the update divides by its root mean square, RMS), so the step size does not keep shrinking the way AdaGrad's does
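
A sketch of one RMSprop step; rho is the decay rate of the running average (an illustrative default):

```python
import numpy as np

def rmsprop_step(W, grad, n, lr=0.001, rho=0.9, eps=1e-8):
    """n = decaying average of squared gradients; dividing by its square
    root (the RMS) keeps the step size from decaying to zero the way
    AdaGrad's growing sum does."""
    n = rho * n + (1 - rho) * grad ** 2
    W = W - lr * grad / (np.sqrt(n) + eps)
    return W, n
```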

Adam

  • momentum + RMSprop-style adaptive learning rate, plus bias correction for the early steps
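
A sketch of one Adam step with the usual default hyperparameters (m and v are the two running averages, t is the step count starting at 1):

```python
import numpy as np

def adam_step(W, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """m = momentum-style average of gradients, v = RMSprop-style average
    of squared gradients; both are bias-corrected for the early steps."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    W = W - lr * m_hat / (np.sqrt(v_hat) + eps)
    return W, m, v

W = np.array([1.0, -2.0])
m, v = np.zeros(2), np.zeros(2)
for t in range(1, 101):
    W, m, v = adam_step(W, 2 * W, m, v, t)
```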
