REINFORCE Algorithm

GPTKB entity

Statements (26)
Predicate Object
gptkbp:instanceOf reinforcement learning algorithm
gptkbp:appliesTo robotics
game playing
control tasks
gptkbp:category gptkb:Monte_Carlo_method
gptkbp:citation Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256.
https://www.w3.org/2000/01/rdf-schema#label REINFORCE Algorithm
gptkbp:improvedBy baseline subtraction
gptkbp:introducedBy gptkb:Ronald_J._Williams
gptkbp:introducedIn 1992
gptkbp:limitation high variance in gradient estimates
gptkbp:objective maximizing expected cumulative reward
gptkbp:optimizedFor expected return
gptkbp:relatedTo policy gradient methods
actor-critic methods
gptkbp:requires differentiable policy
sampling trajectories
gptkbp:type on-policy
model-free
gptkbp:updates policy parameters
gptkbp:updateRule gradient ascent
gptkbp:usedFor policy gradient estimation
gptkbp:usedIn gptkb:artificial_intelligence
gptkb:machine_learning
gptkbp:bfsParent gptkb:Policy_Gradient
gptkbp:bfsLayer 8
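
Illustrative sketch
Several of the statements above (sampling trajectories, a differentiable policy, gradient ascent on the policy parameters, baseline subtraction) can be combined into a small example. The code below is a minimal sketch, not a reference implementation. It assumes a tabular softmax policy stored as theta[state][action], and it uses a hypothetical one-state, two-action task in which action 1 pays reward 1 and action 0 pays nothing. All function names, the learning rate, and the discount factor are illustrative choices that do not come from the entry itself. The update it applies is theta <- theta + lr * sum_t (G_t - baseline) * grad log pi(a_t | s_t), where G_t is the discounted return from step t onward.

```python
import math
import random


def softmax(prefs):
    """Turn a list of action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_update(theta, states, actions, rewards,
                     lr=0.1, gamma=0.99, baseline=0.0):
    """Apply one in-place gradient-ascent step from a single sampled episode."""
    returns = discounted_returns(rewards, gamma)
    # Accumulate the full-episode gradient before changing theta, so every
    # step is scored under the policy that actually generated the episode.
    grad = [[0.0] * len(row) for row in theta]
    for s, a, g in zip(states, actions, returns):
        probs = softmax(theta[s])
        for b, p in enumerate(probs):
            # For a softmax policy: d log pi(a|s) / d theta[s][b] = 1[a == b] - pi(b|s)
            grad_log = (1.0 if b == a else 0.0) - p
            grad[s][b] += (g - baseline) * grad_log
    for s, row in enumerate(grad):
        for b, value in enumerate(row):
            theta[s][b] += lr * value
    return theta


def train_bandit(episodes=500, seed=0):
    """Hypothetical task (an assumption): one state, action 1 pays 1, action 0 pays 0."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0]]
    for _ in range(episodes):
        probs = softmax(theta[0])
        # On-policy sampling: the action is drawn from the current policy.
        action = rng.choices([0, 1], weights=probs)[0]
        reward = 1.0 if action == 1 else 0.0
        reinforce_update(theta, [0], [action], [reward])
    return theta


if __name__ == "__main__":
    theta = train_bandit()
    print("P(action 1) after training:", softmax(theta[0])[1])
```

The baseline argument defaults to 0.0, which corresponds to the plain estimator. Passing a running average of observed returns instead is one common way to reduce the variance listed under gptkbp:limitation without biasing the gradient estimate.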