Statements (27)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:reinforcement_learning_algorithm |
| gptkbp:abbreviation | gptkb:A2C |
| gptkbp:actorRole | updates policy |
| gptkbp:appliesTo | gptkb:Atari_games, robotics continuous control tasks |
| gptkbp:category | on-policy algorithm |
| gptkbp:component | gptkb:actor, critic |
| gptkbp:criticRole | estimates value function |
| gptkbp:improves | actor-critic method |
| gptkbp:introducedIn | 2016 |
| gptkbp:objective | maximize expected return |
| gptkbp:purpose | policy optimization, value estimation |
| gptkbp:relatedTo | gptkb:A3C, policy gradient methods, value-based methods |
| gptkbp:usedBy | gptkb:OpenAI_Baselines, gptkb:Stable_Baselines |
| gptkbp:usedIn | deep reinforcement learning |
| gptkbp:uses | temporal difference learning, advantage function, stochastic policy |
| gptkbp:bfsParent | gptkb:A2C |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Advantage Actor-Critic |
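The statements above describe the two components and their roles (the actor updates the policy, the critic estimates the value function) and list the advantage function and temporal difference learning among the techniques A2C uses. The sketch below shows, under stated assumptions, how these pieces fit together: a toy MDP, a tabular softmax actor, and a linear critic. The environment, parameter names, and hyperparameters are illustrative assumptions, not part of this knowledge-base entry.

```python
# Minimal A2C-style update sketch on a toy MDP (illustrative, not from the source).
# Actor: tabular softmax policy; critic: linear state-value function.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, GAMMA, LR = 4, 2, 0.99, 0.05
theta = np.zeros((N_STATES, N_ACTIONS))   # actor parameters (policy logits)
w = np.zeros(N_STATES)                     # critic parameters (state values)

def features(s):
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

def policy(s):
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def step(s, a):
    # Toy dynamics: action 0 moves "left", action 1 moves "right";
    # reaching the last state yields reward 1 and ends the episode.
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

for episode in range(500):
    s, done = 0, False
    while not done:
        p = policy(s)
        a = rng.choice(N_ACTIONS, p=p)
        s_next, r, done = step(s, a)

        # Critic role: one-step TD target and advantage estimate
        v_s = w @ features(s)
        v_next = 0.0 if done else w @ features(s_next)
        td_target = r + GAMMA * v_next
        advantage = td_target - v_s

        # Critic update: move V(s) toward the TD target
        w += LR * advantage * features(s)

        # Actor role: policy-gradient step weighted by the advantage
        grad_log_pi = -p
        grad_log_pi[a] += 1.0            # d log pi(a|s) / d logits
        theta[s] += LR * advantage * grad_log_pi

        s = s_next

print("Probability of moving right per state:",
      np.round([policy(s)[1] for s in range(N_STATES)], 2))
```

In practice, the libraries listed under gptkbp:usedBy (OpenAI Baselines, Stable Baselines) implement A2C with neural-network function approximators, batched rollouts over parallel environments, and an entropy bonus; the tabular version above is only meant to make the actor/critic split and the advantage term concrete.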