gptkbp:instanceOf
|
mathematical optimization
|
gptkbp:advantage
|
computationally efficient
low memory requirements
well suited for problems with large data
well suited for problems with many parameters
|
gptkbp:disadvantage
|
may converge to suboptimal solutions
sometimes generalizes worse than SGD
|
gptkbp:category
|
gradient-based optimization
stochastic optimization
|
gptkbp:citation
|
gptkb:arXiv:1412.6980
|
gptkbp:defaultBeta1
|
0.9
|
gptkbp:defaultBeta2
|
0.999
|
gptkbp:defaultEpsilon
|
1e-8
|
gptkbp:defaultLearningRate
|
0.001
|
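For illustration, the defaults above map directly onto the constructor arguments of one of the listed implementations; a minimal sketch, assuming PyTorch's torch.optim.Adam API (the model is hypothetical):

```python
import torch

model = torch.nn.Linear(10, 1)  # hypothetical model, for illustration only
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,             # defaultLearningRate
    betas=(0.9, 0.999),  # defaultBeta1, defaultBeta2
    eps=1e-8,            # defaultEpsilon
)
```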
gptkbp:fullName
|
gptkb:Adaptive_Moment_Estimation
|
https://www.w3.org/2000/01/rdf-schema#label
|
Adam Optimizer
|
gptkbp:introduced
|
gptkb:Diederik_P._Kingma
gptkb:Jimmy_Ba
|
gptkbp:introducedIn
|
2014
|
gptkbp:openSource
|
gptkb:TensorFlow
gptkb:Chainer
gptkb:MindSpore
gptkb:PaddlePaddle
gptkb:Keras
gptkb:MXNet
gptkb:CNTK
gptkb:Caffe
gptkb:Theano
gptkb:FastAI
gptkb:DL4J
gptkb:Flux.jl
gptkb:JAX
gptkb:PyTorch
gptkb:Scikit-learn
gptkb:ONNX
|
gptkbp:parameter
|
learning rate
beta1
beta2
epsilon
|
gptkbp:popularFor
|
gptkb:TensorFlow
gptkb:Keras
gptkb:PyTorch
|
gptkbp:relatedTo
|
gptkb:AdaGrad
gptkb:RMSProp
SGD
|
gptkbp:updateRule
|
bias-corrected estimates
element-wise parameter updates
uses first and second moment estimates
|
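The update rule listed above corresponds to the algorithm in the cited paper (arXiv:1412.6980): first and second moment estimates of the gradient g_t are maintained, bias-corrected, and applied element-wise:

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= m_t / (1-\beta_1^t), \qquad \hat{v}_t = v_t / (1-\beta_2^t) \\
\theta_t &= \theta_{t-1} - \alpha\, \hat{m}_t / \bigl(\sqrt{\hat{v}_t} + \epsilon\bigr)
\end{aligned}
```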
gptkbp:usedIn
|
gptkb:machine_learning
deep learning
|
gptkbp:uses
|
momentum
adaptive learning rates
|
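A minimal NumPy sketch of a single Adam step (hypothetical function and variable names), showing how the first moment acts as momentum and the second moment provides per-parameter adaptive learning rates:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # momentum: exponential moving average of gradients (first moment)
    m = beta1 * m + (1 - beta1) * grad
    # adaptive learning rates: moving average of squared gradients (second moment)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # bias correction (t is the step count, starting at 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # element-wise parameter update
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```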
gptkbp:bfsParent
|
gptkb:Training_Recurrent_Neural_Networks
|
gptkbp:bfsLayer
|
6
|