Statements (22)
Predicate | Object |
---|---|
gptkbp:instanceOf | reinforcement learning algorithm |
gptkbp:category | gptkb:artificial_intelligence, gptkb:machine_learning |
https://www.w3.org/2000/01/rdf-schema#label | REINFORCE algorithm |
gptkbp:improves | baseline subtraction |
gptkbp:input | policy, reward signal |
gptkbp:introduced | gptkb:Ronald_J._Williams |
gptkbp:introducedIn | 1992 |
gptkbp:limitation | high variance in gradient estimates |
gptkbp:objective | maximize expected reward |
gptkbp:output | updated policy |
gptkbp:publishedIn | gptkb:Simple_statistical_gradient-following_algorithms_for_connectionist_reinforcement_learning |
gptkbp:relatedTo | policy gradient methods, stochastic gradient ascent |
gptkbp:type | on-policy, model-free |
gptkbp:updateRule | gradient ascent |
gptkbp:usedFor | gptkb:reinforcement_learning, policy gradient estimation |
gptkbp:bfsParent | gptkb:Ronald_J._Williams |
gptkbp:bfsLayer | 6 |
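
The statements above describe REINFORCE as an on-policy, model-free method that takes a policy and a reward signal, updates the policy by gradient ascent on expected reward, and suffers from high-variance gradient estimates that baseline subtraction can reduce. The following is a minimal sketch of that update, not Williams's original formulation: it assumes a softmax policy on a hypothetical Gaussian multi-armed bandit, and the names `softmax`, `reinforce_bandit`, `arm_means`, and `noise_std` are illustrative choices that do not come from this entry. Each step nudges the parameters along `(reward - baseline) * grad log pi(action)`.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=5000, lr=0.1, noise_std=0.5,
                     use_baseline=True, seed=0):
    """Run a REINFORCE-style update on a hypothetical Gaussian bandit.

    The policy is a softmax over one preference per arm (theta).
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)
    baseline = 0.0  # running mean reward, used only if use_baseline is True

    for t in range(1, steps + 1):
        probs = softmax(theta)
        # On-policy: the action is sampled from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Model-free: only a sampled reward is observed, never arm_means itself.
        reward = rng.gauss(arm_means[action], noise_std)

        # Baseline subtraction lowers variance without biasing the gradient.
        advantage = reward - baseline if use_baseline else reward

        # For a softmax policy:
        # d/d theta_k log pi(action) = 1[k == action] - probs[k]
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi  # gradient ascent step

        if use_baseline:
            baseline += (reward - baseline) / t  # incremental mean of rewards

    return softmax(theta)


if __name__ == "__main__":
    final_probs = reinforce_bandit([0.0, 1.0, 3.0])
    print([round(p, 3) for p in final_probs])
```

Under these assumptions, most of the probability mass should end up on the arm with the highest mean reward. Setting `use_baseline=False` keeps the same expected update but typically gives noisier learning, which is the limitation the entry lists.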