Statements (11)
Predicate | Object |
---|---|
gptkbp:instanceOf | algorithm |
gptkbp:appliesTo | policy gradient methods |
gptkbp:developedBy | gptkb:Richard_S._Sutton |
https://www.w3.org/2000/01/rdf-schema#label | REINFORCE |
gptkbp:improves | policy evaluation |
gptkbp:is | stochastic, model-free |
gptkbp:isUsedFor | continuous action spaces |
gptkbp:performance | expected reward |
gptkbp:requires | sample trajectories |
gptkbp:usedIn | reinforcement learning |
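
The statements describe REINFORCE as a stochastic, model-free policy gradient method that needs sampled trajectories and targets expected reward. A minimal sketch of that idea follows, assuming a toy one-step Bernoulli bandit so that each "trajectory" is a single action and reward. The environment, the function names `softmax` and `reinforce_bandit`, the reward probabilities, and the step size `alpha` are all hypothetical choices made for illustration and are not taken from this entry. No baseline is used, so the updates have high variance.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_probs, episodes=5000, alpha=0.1, seed=0):
    """Run a REINFORCE-style update on a one-step Bernoulli bandit (toy task)."""
    rng = random.Random(seed)
    theta = [0.0] * len(reward_probs)  # policy parameters (action preferences)
    for _ in range(episodes):
        probs = softmax(theta)
        # Sample a trajectory: here a single action and its reward.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        ret = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Score-function update: theta += alpha * return * grad log pi(action).
        # For a softmax policy, d/d theta_k log pi(a) = 1[k == a] - pi_k.
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += alpha * ret * grad_log_pi
    return softmax(theta)


# Hypothetical bandit: arm 1 pays off more often than arm 0.
policy = reinforce_bandit([0.2, 0.8])
print(policy)
```

The update only uses sampled actions and rewards, never the reward probabilities themselves, which is what "model-free" refers to here. With these assumed settings the learned policy should shift probability toward the arm with the higher expected reward.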