Statements (45)
Predicate | Object |
---|---|
gptkbp:instanceOf | mathematical optimization |
gptkbp:advantage | computationally efficient, little memory requirements, well suited for large data, well suited for problems with many parameters |
gptkbp:category | gradient-based optimization, stochastic optimization |
gptkbp:citation | high |
gptkbp:commonIn | neural networks |
gptkbp:defaultBeta1 | 0.9 |
gptkbp:defaultBeta2 | 0.999 |
gptkbp:defaultEpsilon | 1e-8 |
gptkbp:defaultLearningRate | 0.001 |
gptkbp:fullName | gptkb:Adaptive_Moment_Estimation |
https://www.w3.org/2000/01/rdf-schema#label | Adam optimizer |
gptkbp:implementedIn | gptkb:TensorFlow, gptkb:Keras, gptkb:JAX, gptkb:PyTorch |
gptkbp:introduced | gptkb:Diederik_P._Kingma, gptkb:Jimmy_Ba |
gptkbp:introducedIn | 2014 |
gptkbp:limitation | may converge to suboptimal solutions, sensitive to hyperparameters, sometimes generalizes worse than SGD |
gptkbp:openSource | yes |
gptkbp:parameter | learning rate, beta1, beta2, epsilon (see the usage sketch after this table) |
gptkbp:publishedIn | gptkb:arXiv:1412.6980 |
gptkbp:relatedTo | gptkb:AdaGrad, gptkb:Momentum_optimizer, gptkb:RMSProp, SGD |
gptkbp:updateRule | parameter update based on moving averages of the gradient and the squared gradient; uses bias-corrected first and second moment estimates (see the equations and sketch after this table) |
gptkbp:usedIn | gptkb:machine_learning, deep learning |
gptkbp:uses | momentum, adaptive learning rates, exponentially decaying averages of past gradients, exponentially decaying averages of past squared gradients |
gptkbp:bfsParent | gptkb:machine_learning |
gptkbp:bfsLayer | 4 |
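
For reference, the update rule summarized in the table, as given in the cited paper (gptkb:arXiv:1412.6980), can be written out as follows, where $g_t$ is the gradient at step $t$, $\alpha$ the learning rate, and $\beta_1$, $\beta_2$, $\epsilon$ the hyperparameters listed above:

```latex
\begin{aligned}
m_t      &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t      && \text{moving average of gradients}\\
v_t      &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2    && \text{moving average of squared gradients}\\
\hat m_t &= m_t / (1-\beta_1^t), \qquad \hat v_t = v_t / (1-\beta_2^t) && \text{bias correction}\\
\theta_t &= \theta_{t-1} - \alpha\, \hat m_t / (\sqrt{\hat v_t} + \epsilon) && \text{parameter update}
\end{aligned}
```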
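
A minimal NumPy sketch of one such update step is shown below. It is an illustration of the rule above, not code from any of the libraries listed under gptkbp:implementedIn; the function name `adam_step` and the toy objective are invented for this example.

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using the default hyperparameters from the table."""
    # Moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected first and second moment estimates.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update with adaptive per-coordinate step sizes.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy example: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # close to [0, 0]
```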
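
As a usage sketch for the parameters and defaults listed in the table, here is how they map onto PyTorch, one of the implementations listed under gptkbp:implementedIn. The model and data are placeholders; the keyword arguments simply spell out the defaults (learning rate 0.001, beta1 0.9, beta2 0.999, epsilon 1e-8), which `torch.optim.Adam` already uses when they are omitted.

```python
import torch

# Placeholder model and data, for illustration only.
model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

# Defaults from the table: lr=0.001, betas=(0.9, 0.999), eps=1e-8.
optimizer = torch.optim.Adam(model.parameters(),
                             lr=0.001, betas=(0.9, 0.999), eps=1e-8)

loss_fn = torch.nn.MSELoss()
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```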