Statements (35)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | mathematical optimization |
| gptkbp:advantage | faster convergence on large datasets, may not converge to exact minimum, noisy updates |
| gptkbp:alsoKnownAs | SGD |
| gptkbp:alternativeTo | mini-batch gradient descent, batch gradient descent |
| gptkbp:category | first-order optimization method |
| gptkbp:commonIn | linear regression, logistic regression, neural network training |
| gptkbp:dependsOn | data shuffling, learning rate schedule, loss surface |
| https://www.w3.org/2000/01/rdf-schema#label | Stochastic gradient descent |
| gptkbp:hyperparameter | learning rate, batch size |
| gptkbp:improves | gptkb:Nesterov_accelerated_gradient, momentum, weight decay, learning rate schedules |
| gptkbp:introducedIn | 1951 |
| gptkbp:isFoundationFor | gptkb:Adam_optimizer, gptkb:RMSprop, Adagrad |
| gptkbp:proposedBy | gptkb:Herbert_Robbins, gptkb:Sutton_Monro |
| gptkbp:requires | differentiable objective function |
| gptkbp:updateRule | parameter update using a single sample or a few samples (see the sketch below) |
| gptkbp:usedFor | minimizing loss functions |
| gptkbp:usedIn | gptkb:machine_learning, deep learning |
| gptkbp:variant | gradient descent |
| gptkbp:bfsParent | gptkb:Stochastic_Gradient_Langevin_Dynamics |
| gptkbp:bfsLayer | 7 |