Statements (25)
Predicate | Object |
---|---|
gptkbp:instanceOf | reinforcement learning algorithm |
gptkbp:address | overestimation bias in actor-critic methods |
gptkbp:application | robotics; continuous control reinforcement learning benchmarks |
gptkbp:author | Scott Fujimoto; Herke van Hoof; David Meger |
gptkbp:basedOn | gptkb:Deep_Deterministic_Policy_Gradient |
gptkbp:citation | 2018; Addressing Function Approximation Error in Actor-Critic Methods |
gptkbp:fullName | gptkb:Twin_Delayed_Deep_Deterministic_Policy_Gradient |
https://www.w3.org/2000/01/rdf-schema#label | TD3 |
gptkbp:introduced | gptkb:David_Meger; gptkb:Herke_van_Hoof; gptkb:Scott_Fujimoto |
gptkbp:introducedIn | 2018 |
gptkbp:openSource | gptkb:Stable_Baselines; gptkb:OpenAI_Spinning_Up; gptkb:rlkit |
gptkbp:publishedIn | gptkb:International_Conference_on_Machine_Learning_(ICML)_2018 |
gptkbp:technique | delayed policy updates; target policy smoothing; twin Q-networks (sketched below) |
gptkbp:bfsParent | gptkb:Actor-Critic; gptkb:Stable_Baselines |
gptkbp:bfsLayer | 6 |
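
The gptkbp:technique and gptkbp:address statements above summarize how TD3 modifies DDPG. Below is a minimal Python sketch of how the three listed techniques fit together when computing the critic target. The callables `actor_target`, `q1_target`, and `q2_target` are hypothetical placeholders, and the hyperparameter defaults follow the cited paper; this is an illustrative sketch, not the authors' reference implementation nor the API of the open-source implementations listed under gptkbp:openSource.

```python
import numpy as np

# Illustrative sketch of TD3's critic target, combining the techniques listed
# under gptkbp:technique. The function approximators passed in are assumed
# placeholders, not part of any particular library.

def td3_critic_target(reward, next_state, done,
                      actor_target, q1_target, q2_target,
                      gamma=0.99, noise_std=0.2, noise_clip=0.5,
                      max_action=1.0, rng=None):
    """Clipped double-Q target with target policy smoothing."""
    if rng is None:
        rng = np.random.default_rng()

    # Target policy smoothing: perturb the target action with clipped noise so
    # the value target is averaged over a small neighbourhood of actions.
    action = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(action)),
                    -noise_clip, noise_clip)
    smoothed_action = np.clip(action + noise, -max_action, max_action)

    # Twin Q-networks: take the element-wise minimum of the two target critics,
    # which counteracts the overestimation bias named under gptkbp:address.
    target_q = np.minimum(q1_target(next_state, smoothed_action),
                          q2_target(next_state, smoothed_action))

    # Standard one-step bootstrapped target.
    return reward + gamma * (1.0 - done) * target_q


# Toy usage with dummy 1-D function approximators.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    actor = lambda s: np.tanh(s)
    q1 = lambda s, a: -(a - 0.5 * s) ** 2
    q2 = lambda s, a: -(a - 0.4 * s) ** 2
    y = td3_critic_target(reward=1.0, next_state=np.array([0.3]), done=0.0,
                          actor_target=actor, q1_target=q1, q2_target=q2,
                          rng=rng)
    print(y)

    # Delayed policy updates (the third listed technique) belong in the
    # training loop: both critics are updated every step, while the actor and
    # all target networks are updated only every d-th step (d = 2 in the paper).
```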