Statements (25)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:reinforcement_learning_algorithm |
| gptkbp:address | overestimation bias in actor-critic methods |
| gptkbp:application | robotics, continuous control reinforcement learning benchmarks |
| gptkbp:author | Scott Fujimoto, Herke van Hoof, David Meger |
| gptkbp:basedOn | gptkb:Deep_Deterministic_Policy_Gradient |
| gptkbp:citation | "Addressing Function Approximation Error in Actor-Critic Methods" (2018) |
| gptkbp:fullName | gptkb:Twin_Delayed_Deep_Deterministic_Policy_Gradient |
| gptkbp:introduced | gptkb:Scott_Fujimoto, gptkb:Herke_van_Hoof, gptkb:David_Meger |
| gptkbp:introducedIn | 2018 |
| gptkbp:openSource | gptkb:Stable_Baselines, gptkb:OpenAI_Spinning_Up, gptkb:rlkit |
| gptkbp:publishedIn | gptkb:International_Conference_on_Machine_Learning_(ICML)_2018 |
| gptkbp:technique | twin Q-networks, delayed policy updates, target policy smoothing |
| gptkbp:bfsParent | gptkb:Actor-Critic, gptkb:Stable_Baselines |
| gptkbp:bfsLayer | 6 |
| https://www.w3.org/2000/01/rdf-schema#label | TD3 |
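The techniques listed under `gptkbp:technique` combine in TD3's critic target: a pair of target Q-networks is evaluated on a noise-smoothed target action, and the smaller of the two estimates is used to counteract the overestimation bias named under `gptkbp:address`. A minimal sketch of that target computation, with hypothetical stand-in target networks (`target_policy`, `q1_target`, `q2_target` are illustrative placeholders, not a real library API):

```python
import random

def td3_target(reward, done, next_state, gamma=0.99,
               noise_std=0.2, noise_clip=0.5, act_low=-1.0, act_high=1.0):
    """Clipped double-Q target with target policy smoothing, as in TD3."""

    # Hypothetical target networks, standing in for trained function
    # approximators: a deterministic policy and twin critics.
    def target_policy(s):
        return 0.5 * s

    def q1_target(s, a):
        return s + a

    def q2_target(s, a):
        return s - a

    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise, then clip back into the valid action range.
    eps = max(-noise_clip, min(noise_clip, random.gauss(0.0, noise_std)))
    a_next = max(act_low, min(act_high, target_policy(next_state) + eps))

    # Twin Q-networks: take the minimum of the two target critics
    # (clipped double-Q learning) to damp overestimation bias.
    q_min = min(q1_target(next_state, a_next), q2_target(next_state, a_next))
    return reward + gamma * (1.0 - done) * q_min
```

The third listed technique, delayed policy updates, is not visible in this snippet: in the full algorithm the actor (and the target networks) are updated only once every few critic updates, typically every two.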