Statements (26)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:reinforcement_learning_algorithm |
| gptkbp:appliesTo | robotics, game playing, control tasks |
| gptkbp:category | gptkb:Monte_Carlo_method |
| gptkbp:citation | Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256. |
| gptkbp:improves | baseline subtraction |
| gptkbp:introduced | gptkb:Ronald_J._Williams |
| gptkbp:introducedIn | 1992 |
| gptkbp:limitation | high variance in gradient estimates |
| gptkbp:objective | maximizing expected cumulative reward |
| gptkbp:optimizedFor | expected return |
| gptkbp:relatedTo | policy gradient methods, actor-critic methods |
| gptkbp:requires | differentiable policy, sampling trajectories |
| gptkbp:type | on-policy, model-free |
| gptkbp:updated | policy parameters |
| gptkbp:updateRule | gradient ascent |
| gptkbp:usedFor | policy gradient estimation |
| gptkbp:usedIn | gptkb:artificial_intelligence, gptkb:machine_learning |
| gptkbp:bfsParent | gptkb:Policy_Gradient |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | REINFORCE Algorithm |
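The statements above list the objective (maximizing expected cumulative reward), the update rule (gradient ascent on the policy parameters), the high-variance limitation, and baseline subtraction as the standard remedy. As a minimal sketch of how these pieces fit together, the snippet below applies the classic REINFORCE update θ ← θ + α (G − b) ∇θ log πθ(a); the 3-armed bandit environment, softmax policy, learning rate, and running-average baseline are illustrative assumptions, not part of this entry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy problem: a 3-armed bandit with fixed mean rewards.
true_means = np.array([0.2, 0.5, 0.8])

theta = np.zeros(3)   # policy parameters (softmax logits) to be updated
baseline = 0.0        # running-average baseline used to reduce variance
lr = 0.1              # step size for gradient ascent

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

for episode in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)            # sample an action (one-step trajectory)
    G = rng.normal(true_means[a], 0.1)    # observed return for this episode

    # For a softmax policy, grad log pi(a | theta) = one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0

    # REINFORCE update: gradient ascent on the expected return,
    # with a baseline subtracted from the return to reduce variance.
    theta += lr * (G - baseline) * grad_log_pi
    baseline += 0.05 * (G - baseline)     # slowly track the average return

print("learned action probabilities:", softmax(theta))
```

With these settings the policy concentrates probability on the highest-reward arm; removing the baseline leaves the expected update unchanged but typically makes the gradient estimates noisier, which is the limitation noted in the table.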