Statements (26)
Predicate | Object |
---|---|
gptkbp:instanceOf | reinforcement learning algorithm |
gptkbp:appliesTo | robotics, game playing, control tasks |
gptkbp:category | gptkb:Monte_Carlo_method |
gptkbp:citation | Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256. |
https://www.w3.org/2000/01/rdf-schema#label | REINFORCE Algorithm |
gptkbp:improves | baseline subtraction |
gptkbp:introduced | gptkb:Ronald_J._Williams |
gptkbp:introducedIn | 1992 |
gptkbp:limitation | high variance in gradient estimates |
gptkbp:objective | maximizing expected cumulative reward |
gptkbp:optimizedFor | expected return |
gptkbp:relatedTo | policy gradient methods, actor-critic methods |
gptkbp:requires | differentiable policy, sampling trajectories |
gptkbp:type | on-policy, model-free |
gptkbp:updated | policy parameters |
gptkbp:updateRule | gradient ascent |
gptkbp:usedFor | policy gradient estimation |
gptkbp:usedIn | gptkb:artificial_intelligence, gptkb:machine_learning |
gptkbp:bfsParent | gptkb:Policy_Gradient |
gptkbp:bfsLayer | 8 |
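For reference, the objective, update-rule, and baseline-subtraction statements above correspond to the standard REINFORCE gradient estimator (Williams, 1992); a compact rendering, with G_t the return from step t and b(s_t) an optional baseline:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,
      \bigl(G_t - b(s_t)\bigr)
    \right],
\qquad
\theta \leftarrow \theta + \alpha\, \nabla_\theta J(\theta)
```

Subtracting the baseline b(s_t) leaves the estimator unbiased while reducing the high variance noted in the limitation row.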
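The requires and updateRule rows amount to a concrete training loop: sample trajectories from a differentiable policy, then take a gradient-ascent step on the policy parameters. A minimal sketch, assuming a hypothetical two-armed bandit with one-step episodes and a softmax policy (the environment and all names here are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)   # policy parameters: one logit per action
alpha = 0.1           # step size for gradient ascent
baseline = 0.0        # running-mean baseline (variance reduction)

def policy(theta):
    """Softmax policy: differentiable in theta."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reward(action):
    """Toy stochastic reward: arm 1 is better on average."""
    return rng.normal(loc=[0.0, 1.0][action], scale=1.0)

for episode in range(2000):
    probs = policy(theta)
    a = rng.choice(2, p=probs)   # sample a trajectory (one step here)
    G = reward(a)                # return of the sampled trajectory
    # Gradient of log softmax: one-hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # REINFORCE update: gradient ascent with baseline subtraction
    theta += alpha * (G - baseline) * grad_log_pi
    baseline += 0.01 * (G - baseline)  # update running-mean baseline

print("learned action probabilities:", policy(theta))
```

Because the update uses returns sampled from the current policy, the loop is on-policy and model-free, matching the type row above; with more episodes the probability mass concentrates on the higher-reward arm.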