DL optimizer
http://ruder.io/optimizing-gradient-descent/
Stochastic gradient descent (SGD)
updates the weights using the gradient of the loss: W = W − η * ∂L/∂W

W = weight
L = loss
η = learning rate
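A minimal sketch of the SGD update on a toy 1-D loss L(W) = (W − 3)^2; the toy loss, the names (w, lr, grad) and the step count are illustrative assumptions, not from the source.

```python
# SGD sketch: minimize L(W) = (W - 3)^2 with W <- W - η * dL/dW.
def grad(w):                # dL/dW for the toy loss (W - 3)^2
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1            # weight W and learning rate η
for _ in range(100):
    w -= lr * grad(w)       # plain SGD update step
print(w)                    # ~= 3.0, the minimum of the toy loss
```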
Momentum (speeds up or slows down based on the previous gradients)
simulates particle motion:
speeds up when successive gradients point in the same direction, slows down when the direction changes.

Each update combines the current gradient with the last update direction: v = γ * v − η * ∂L/∂W, then W = W + v (γ = momentum coefficient).
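A momentum sketch on the same toy loss, showing how the velocity v accumulates past gradients; γ = 0.9 and the other names are assumed for illustration.

```python
# Momentum sketch: steps in a consistent direction grow, direction
# changes are damped by the accumulated velocity v.
def grad(w):
    return 2.0 * (w - 3.0)

w, lr, gamma, v = 0.0, 0.1, 0.9, 0.0
for _ in range(100):
    v = gamma * v - lr * grad(w)   # blend previous direction with new gradient
    w += v                         # move along the accumulated velocity
print(w)                           # ~= 3.0
```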
AdaGrad (adaptive learning rate)
adjusts the learning rate η based on the previous gradients:
early stage = small n, large effective learning rate
later stage = big n, small effective learning rate

n = sum(square(all previous gradients))
W = W − η / sqrt(n + ε) * ∂L/∂W (ε = small constant for numerical stability)
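An AdaGrad sketch on the same toy loss; n accumulates squared gradients, so the effective step size η / sqrt(n + ε) shrinks over time. The constants here are assumed for illustration.

```python
# AdaGrad sketch: n grows monotonically, so the effective learning
# rate keeps shrinking as training goes on.
def grad(w):
    return 2.0 * (w - 3.0)

w, lr, n, eps = 0.0, 1.0, 0.0, 1e-8
for _ in range(200):
    g = grad(w)
    n += g * g                          # n = sum of squared gradients so far
    w -= lr / ((n + eps) ** 0.5) * g    # per-step learning rate shrinks
print(w)                                # ~= 3.0
```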
RMSprop
n = exponentially decaying average of square(previous gradients), so old gradients are gradually forgotten and the learning rate does not keep shrinking
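An RMSprop sketch; the only change from AdaGrad is that n is a decaying average instead of a sum. The decay rate ρ = 0.9 and learning rate are assumed, typical values.

```python
# RMSprop sketch: n tracks recent squared gradients, so the effective
# learning rate adapts without vanishing over time.
def grad(w):
    return 2.0 * (w - 3.0)

w, lr, rho, n, eps = 0.0, 0.01, 0.9, 0.0, 1e-8
for _ in range(500):
    g = grad(w)
    n = rho * n + (1 - rho) * g * g     # decaying average of squared gradients
    w -= lr / ((n + eps) ** 0.5) * g
print(w)                                # ~= 3.0 (within about lr of the minimum)
```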
Adam
momentum + RMSprop-style adaptive learning rate, with bias correction of both running estimates
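An Adam sketch combining the two ideas: m is a momentum-style average of gradients, n an RMSprop-style average of squared gradients, both bias-corrected. The β₁, β₂, ε values are the commonly used defaults, assumed here.

```python
# Adam sketch: first moment m (momentum) and second moment n (adaptive
# rate), each bias-corrected before the update.
def grad(w):
    return 2.0 * (w - 3.0)

w, lr, beta1, beta2, eps = 0.0, 0.1, 0.9, 0.999, 1e-8
m, n = 0.0, 0.0
for t in range(1, 501):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g          # decaying average of gradients
    n = beta2 * n + (1 - beta2) * g * g      # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction (early steps)
    n_hat = n / (1 - beta2 ** t)
    w -= lr * m_hat / (n_hat ** 0.5 + eps)
print(w)                                     # ~= 3.0
```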