Statements (45)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:mathematical_optimization |
| gptkbp:advantage | computationally efficient; low memory requirements; well suited for large data; well suited for many parameters |
| gptkbp:category | gradient-based optimization; stochastic optimization |
| gptkbp:citation | high |
| gptkbp:commonIn | neural networks |
| gptkbp:defaultBeta1 | 0.9 |
| gptkbp:defaultBeta2 | 0.999 |
| gptkbp:defaultEpsilon | 1e-8 |
| gptkbp:defaultLearningRate | 0.001 |
| gptkbp:fullName | gptkb:Adaptive_Moment_Estimation |
| gptkbp:implementedIn | gptkb:TensorFlow; gptkb:Keras; gptkb:JAX; gptkb:PyTorch (see the PyTorch usage sketch below) |
| gptkbp:introduced | gptkb:Diederik_P._Kingma; gptkb:Jimmy_Ba |
| gptkbp:introducedIn | 2014 |
| gptkbp:limitation | may converge to suboptimal solutions; sensitive to hyperparameters; sometimes generalizes worse than SGD |
| gptkbp:openSource | yes |
| gptkbp:parameter | learning rate; beta1; beta2; epsilon |
| gptkbp:publishedIn | gptkb:arXiv:1412.6980 |
| gptkbp:relatedTo | gptkb:AdaGrad; gptkb:Momentum_optimizer; gptkb:RMSProp; SGD |
| gptkbp:updateRule | parameter update based on moving averages of the gradient and squared gradient; uses bias-corrected first and second moment estimates (spelled out in the equations and sketch after this table) |
| gptkbp:usedIn | gptkb:machine_learning; deep learning |
| gptkbp:uses | momentum; adaptive learning rates; exponentially decaying averages of past gradients; exponentially decaying averages of past squared gradients |
| gptkbp:bfsParent | gptkb:machine_learning |
| gptkbp:bfsLayer | 4 |
| https://www.w3.org/2000/01/rdf-schema#label | Adam optimizer |
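The `gptkbp:updateRule` statements compress the algorithm given in Kingma & Ba (arXiv:1412.6980). Spelled out, with $g_t$ the gradient at step $t$, $\alpha$ the learning rate (default 0.001), $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$:

```latex
\begin{align}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\        % first moment: EMA of past gradients
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\      % second moment: EMA of past squared gradients
\hat{m}_t &= m_t / (1 - \beta_1^t), \quad
\hat{v}_t = v_t / (1 - \beta_2^t) \\                   % bias-corrected estimates
\theta_t &= \theta_{t-1} - \alpha\, \hat{m}_t / \bigl(\sqrt{\hat{v}_t} + \epsilon\bigr)
\end{align}
```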
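A minimal NumPy sketch of a single Adam step, using the default hyperparameters listed in the table. The function name `adam_step`, the NumPy formulation, and the toy quadratic objective are illustrative choices, not part of the source:

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponentially decaying averages of the gradient
    and squared gradient, bias-corrected, then an adaptive step."""
    m = beta1 * m + (1 - beta1) * grad       # first moment estimate
    v = beta2 * v + (1 - beta2) * grad**2    # second moment estimate
    m_hat = m / (1 - beta1**t)               # bias correction (t starts at 1)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # approaches 0
```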
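As a usage illustration for one of the listed implementations, `torch.optim.Adam` in PyTorch exposes the same defaults shown in the table. The `nn.Linear` model and random data here are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder model
opt = torch.optim.Adam(model.parameters(),
                       lr=0.001, betas=(0.9, 0.999), eps=1e-8)

x, y = torch.randn(32, 10), torch.randn(32, 1)  # placeholder data
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```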