Statements (35)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:mathematical_optimization |
| gptkbp:advantage | faster convergence on large datasets; may not converge to exact minimum; noisy updates |
| gptkbp:alsoKnownAs | SGD |
| gptkbp:alternativeTo | mini-batch gradient descent; batch gradient descent |
| gptkbp:category | first-order optimization method |
| gptkbp:commonIn | linear regression; logistic regression; neural network training |
| gptkbp:dependsOn | data shuffling; learning rate schedule; loss surface |
| gptkbp:hyperparameter | learning rate; batch size |
| gptkbp:improves | gptkb:Nesterov_accelerated_gradient; momentum; weight decay; learning rate schedules |
| gptkbp:introducedIn | 1951 |
| gptkbp:isFoundationFor | gptkb:Adam_optimizer; gptkb:RMSprop; Adagrad |
| gptkbp:proposedBy | gptkb:Herbert_Robbins; gptkb:Sutton_Monro |
| gptkbp:requires | differentiable objective function |
| gptkbp:updateRule | parameter update using a single sample or a few samples (see the sketch below the table) |
| gptkbp:usedFor | minimizing loss functions |
| gptkbp:usedIn | gptkb:machine_learning; deep learning |
| gptkbp:variant | gradient descent |
| gptkbp:bfsParent | gptkb:Stochastic_Gradient_Langevin_Dynamics |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | Stochastic gradient descent |
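
The gptkbp:updateRule statement above describes the rule only informally. The following is a minimal Python sketch of that per-sample update, theta <- theta - lr * grad L(theta; x_i, y_i), applied to least-squares linear regression; the dataset, learning rate, and epoch count are illustrative assumptions, not part of the statements listed here.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=50, seed=0):
    """Minimal SGD sketch: update parameters from one sample at a time.

    Implements theta <- theta - lr * grad L(theta; x_i, y_i) for a
    squared-error loss on a linear model (illustrative assumptions).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    for _ in range(epochs):
        # Reshuffle each epoch (cf. the gptkbp:dependsOn "data shuffling" statement).
        for i in rng.permutation(n_samples):
            error = X[i] @ theta - y[i]   # residual on a single sample
            grad = error * X[i]           # gradient of 0.5 * error**2 w.r.t. theta
            theta -= lr * grad            # stochastic parameter update
    return theta

# Illustrative usage on synthetic data (assumed, not from the source).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    true_theta = np.array([2.0, -1.0, 0.5])
    y = X @ true_theta + 0.1 * rng.normal(size=200)
    print(sgd_linear_regression(X, y))
```

The learning rate and batch size listed under gptkbp:hyperparameter correspond here to `lr` and to the implicit batch size of one; a mini-batch variant would average the gradient over a small slice of shuffled indices instead of a single sample.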