Adam Optimizer

GPTKB entity

Statements (54)
Predicate Object
gptkbp:instanceOf mathematical optimization algorithm
gptkbp:advantage computationally efficient
low memory requirements
well suited for large datasets
well suited for models with many parameters
gptkbp:limitation may converge to suboptimal solutions
sometimes generalizes worse than SGD
gptkbp:category gradient-based optimization
stochastic optimization
gptkbp:citation gptkb:arXiv:1412.6980
gptkbp:defaultBeta1 0.9
gptkbp:defaultBeta2 0.999
gptkbp:defaultEpsilon 1e-8
gptkbp:defaultLearningRate 0.001
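
(As a quick illustration: a minimal usage sketch, assuming PyTorch, one of the open-source implementations listed below. The keyword defaults of torch.optim.Adam match the four default values above.)

    import torch

    # Toy model, purely for illustration.
    model = torch.nn.Linear(10, 1)

    # Keyword values mirror the defaults stated above.
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=0.001,            # defaultLearningRate
        betas=(0.9, 0.999),  # defaultBeta1, defaultBeta2
        eps=1e-8,            # defaultEpsilon
    )

    # One optimization step on a dummy loss.
    loss = model(torch.randn(4, 10)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
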
gptkbp:fullName gptkb:Adaptive_Moment_Estimation
https://www.w3.org/2000/01/rdf-schema#label Adam Optimizer
gptkbp:introducedBy gptkb:Diederik_P._Kingma
gptkb:Jimmy_Ba
gptkbp:introducedIn 2014
gptkbp:openSource gptkb:TensorFlow
gptkb:Chainer
gptkb:MindSpore
gptkb:PaddlePaddle
gptkb:Keras
gptkb:MXNet
gptkb:CNTK
gptkb:Caffe
gptkb:Theano
gptkb:FastAI
gptkb:DL4J
gptkb:Flux.jl
gptkb:JAX
gptkb:PyTorch
gptkb:Scikit-learn
gptkb:ONNX
gptkbp:parameter learning rate
beta1
beta2
epsilon
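
(For reference, the update rule as given in the cited paper, gptkb:arXiv:1412.6980, written in terms of the four parameters above: learning rate \alpha, \beta_1, \beta_2, \epsilon, with g_t the gradient at step t; all operations are element-wise.)

    \begin{aligned}
    m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
    v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
    \hat{m}_t &= m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t) \\
    \theta_t &= \theta_{t-1} - \alpha\, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)
    \end{aligned}
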
gptkbp:popularFor gptkb:TensorFlow
gptkb:Keras
gptkb:PyTorch
gptkbp:relatedTo gptkb:AdaGrad
gptkb:RMSProp
SGD
gptkbp:updateRule first and second moment estimates
bias-corrected estimates
element-wise parameter updates
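
(A from-scratch sketch of one such step in NumPy; the function and variable names are illustrative, not any particular library's API.)

    import numpy as np

    def adam_step(theta, grad, m, v, t,
                  lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update; t is the 1-based step count."""
        m = beta1 * m + (1 - beta1) * grad                   # first moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2              # second moment estimate
        m_hat = m / (1 - beta1 ** t)                         # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)                         # bias-corrected second moment
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # element-wise update
        return theta, m, v
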
gptkbp:usedIn gptkb:machine_learning
deep learning
gptkbp:uses momentum
adaptive learning rates
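
(In the sketch above, the first-moment estimate m supplies the momentum, while division by the square root of the second-moment estimate gives each parameter its own adaptive effective learning rate.)
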
gptkbp:bfsParent gptkb:Training_Recurrent_Neural_Networks
gptkbp:bfsLayer 6