Statements (37)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:model, mathematical optimization |
| gptkbp:advantage | computationally efficient, requires little memory, well suited for problems with large data, well suited for problems with many parameters |
| gptkbp:disadvantage | may converge to suboptimal solutions, can have poor generalization in some cases |
| gptkbp:category | gradient-based optimization, stochastic optimization |
| gptkbp:commonIn | computer vision, deep learning, natural language processing |
| gptkbp:defaultBeta1 | 0.9 |
| gptkbp:defaultBeta2 | 0.999 |
| gptkbp:defaultEpsilon | 1e-8 |
| gptkbp:defaultLearningRate | 0.001 |
| gptkbp:fullName | gptkb:Adaptive_Moment_Estimation |
| https://www.w3.org/2000/01/rdf-schema#label | Adam optimization algorithm |
| gptkbp:introduced | gptkb:Diederik_P._Kingma, gptkb:Jimmy_Ba |
| gptkbp:introducedIn | 2014 |
| gptkbp:parameter | learning rate, beta1, beta2, epsilon |
| gptkbp:publishedIn | gptkb:arXiv:1412.6980 |
| gptkbp:relatedTo | gptkb:AdaGrad, gptkb:RMSProp, SGD |
| gptkbp:usedFor | training neural networks |
| gptkbp:uses | momentum, adaptive learning rates, exponentially decaying averages of past gradients, exponentially decaying averages of past squared gradients (see the update-rule sketch below the table) |
| gptkbp:bfsParent | gptkb:Adam:_A_Method_for_Stochastic_Optimization |
| gptkbp:bfsLayer | 7 |
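
The gptkbp:uses row describes Adam's mechanism in words; for reference, the corresponding update rule from the cited paper (gptkb:arXiv:1412.6980) is sketched below. Here $g_t$ is the gradient at step $t$, $\theta_t$ the parameters, and the defaults from the table are $\alpha = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$.

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t && \text{(decaying average of past gradients, i.e. momentum)} \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 && \text{(decaying average of past squared gradients)} \\
\hat{m}_t &= m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t) && \text{(bias correction)} \\
\theta_t &= \theta_{t-1} - \alpha\, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) && \text{(adaptive per-parameter step)}
\end{aligned}
$$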
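
As a minimal, self-contained NumPy sketch of one Adam step with the default hyperparameters from the table; the function name `adam_update` and its array-based interface are illustrative choices, not something the source specifies.

```python
import numpy as np

def adam_update(params, grads, m, v, t,
                lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step (Kingma & Ba, arXiv:1412.6980); t is the 1-based step count."""
    # Exponentially decaying average of past gradients (momentum / first moment).
    m = beta1 * m + (1 - beta1) * grads
    # Exponentially decaying average of past squared gradients (second moment).
    v = beta2 * v + (1 - beta2) * grads ** 2
    # Bias correction compensates for m and v being initialized at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Adaptive per-parameter step: coordinates with large recent gradients take smaller steps.
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 5.
x, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    grad = 2 * x  # gradient of x^2
    x, m, v = adam_update(x, grad, m, v, t)
print(x)  # close to 0
```

The two running averages `m` and `v` correspond directly to the "momentum" and "exponentially decaying averages of past squared gradients" objects in the gptkbp:uses row, and the division by `np.sqrt(v_hat)` is what the table calls adaptive learning rates.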