Statements (26)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:reinforcement_learning_algorithm |
| gptkbp:appliesTo | robotics, game playing, control tasks |
| gptkbp:category | gptkb:Monte_Carlo_method |
| gptkbp:citation | Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256. |
| gptkbp:improves | baseline subtraction |
| gptkbp:introduced | gptkb:Ronald_J._Williams |
| gptkbp:introducedIn | 1992 |
| gptkbp:limitation | high variance in gradient estimates |
| gptkbp:objective | maximizing expected cumulative reward |
| gptkbp:optimizedFor | expected return |
| gptkbp:relatedTo | policy gradient methods, actor-critic methods |
| gptkbp:requires | differentiable policy, sampling trajectories |
| gptkbp:type | on-policy, model-free |
| gptkbp:updated | policy parameters |
| gptkbp:updateRule | gradient ascent |
| gptkbp:usedFor | policy gradient estimation |
| gptkbp:usedIn | gptkb:artificial_intelligence, gptkb:machine_learning |
| gptkbp:bfsParent | gptkb:Policy_Gradient |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | REINFORCE Algorithm |
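The statements above list the objective (maximizing expected cumulative reward), the update rule (gradient ascent on the policy parameters), the high-variance limitation, and baseline subtraction as the standard remedy. As a minimal sketch of how these pieces fit together, the snippet below applies the classic REINFORCE update θ ← θ + α (G − b) ∇θ log πθ(a); the 3-armed bandit environment, softmax policy, learning rate, and running-average baseline are illustrative assumptions, not part of this entry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy problem: a 3-armed bandit with fixed mean rewards.
true_means = np.array([0.2, 0.5, 0.8])

theta = np.zeros(3)   # policy parameters (softmax logits) to be updated
baseline = 0.0        # running-average baseline used to reduce variance
lr = 0.1              # step size for gradient ascent

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

for episode in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)            # sample an action (one-step trajectory)
    G = rng.normal(true_means[a], 0.1)    # observed return for this episode

    # For a softmax policy, grad log pi(a | theta) = one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0

    # REINFORCE update: gradient ascent on the expected return,
    # with a baseline subtracted from the return to reduce variance.
    theta += lr * (G - baseline) * grad_log_pi
    baseline += 0.05 * (G - baseline)     # slowly track the average return

print("learned action probabilities:", softmax(theta))
```

With these settings the policy concentrates probability on the highest-reward arm; removing the baseline leaves the expected update unchanged but typically makes the gradient estimates noisier, which is the limitation noted in the table.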