Statements (22)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:reinforcement_learning_algorithm |
| gptkbp:category | gptkb:artificial_intelligence, gptkb:machine_learning |
| gptkbp:improves | baseline subtraction |
| gptkbp:input | gptkb:public_policy, reward signal |
| gptkbp:introduced | gptkb:Ronald_J._Williams |
| gptkbp:introducedIn | 1992 |
| gptkbp:limitation | high variance in gradient estimates |
| gptkbp:objective | maximize expected reward |
| gptkbp:output | updated policy |
| gptkbp:publishedIn | gptkb:Simple_statistical_gradient-following_algorithms_for_connectionist_reinforcement_learning |
| gptkbp:relatedTo | policy gradient methods, stochastic gradient ascent |
| gptkbp:type | on-policy, model-free |
| gptkbp:updateRule | gradient ascent |
| gptkbp:usedFor | gptkb:reinforcement_learning, policy gradient estimation |
| gptkbp:bfsParent | gptkb:Ronald_J._Williams |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | REINFORCE algorithm |
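The statements above (update rule: gradient ascent; objective: maximize expected reward; limitation: high variance, mitigated by baseline subtraction) can be illustrated with a minimal sketch of REINFORCE on a two-armed bandit. The softmax policy, learning rate, and running-average baseline below are illustrative assumptions, not part of the source entry.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(rewards=(0.0, 1.0), steps=2000, lr=0.1, seed=0):
    """Minimal REINFORCE sketch on a 2-armed bandit with a softmax policy.

    Gradient-ascent update on expected reward:
        theta += lr * (r - baseline) * grad log pi(a | theta)
    For a softmax policy, grad log pi(a) = onehot(a) - pi.
    The running-average baseline reduces the variance of the
    gradient estimate (the "baseline subtraction" in the table).
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(rewards))
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choice(len(rewards), p=probs)   # sample on-policy action
        r = rewards[a]                          # observe reward signal
        baseline += 0.01 * (r - baseline)       # running-average baseline
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0                   # onehot(a) - pi
        theta += lr * (r - baseline) * grad_log_pi
    return softmax(theta)

probs = reinforce_bandit()
# after training, the higher-reward arm should dominate the policy
```

Because the update uses only sampled actions and rewards from the current policy, the method is on-policy and model-free, matching the `gptkbp:type` statements.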