Statements (23)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:reinforcement_learning_algorithm |
| gptkbp:appliesTo | control systems, robotics, game playing |
| gptkbp:canBe | off-policy, on-policy |
| gptkbp:hasComponent | gptkb:actor, gptkb:literary_criticism |
| gptkbp:hasVariant | gptkb:A2C, gptkb:A3C, gptkb:DDPG, gptkb:TD3, gptkb:SAC |
| gptkbp:introducedIn | 1980s |
| gptkbp:learnsPolicy | gptkb:actor |
| gptkbp:learnsValueFunction | gptkb:literary_criticism |
| gptkbp:type | temporal difference learning, policy gradient method |
| gptkbp:usedIn | gptkb:artificial_intelligence, gptkb:machine_learning |
| gptkbp:bfsParent | gptkb:reinforcement_learning |
| gptkbp:bfsLayer | 5 |
| https://www.w3.org/2000/01/rdf-schema#label | Actor-Critic |