Statements (26)
Predicate | Object |
---|---|
gptkbp:instanceOf | reinforcement learning algorithm |
gptkbp:appliesTo | deep reinforcement learning |
gptkbp:fullName | Actor Critic using Kronecker-Factored Trust Region |
https://www.w3.org/2000/01/rdf-schema#label | ACKTR |
gptkbp:improves | sample efficiency, training stability |
gptkbp:introduced | Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba |
gptkbp:openSource | gptkb:OpenAI_Baselines, gptkb:Stable_Baselines, rl-baselines3-zoo |
gptkbp:optimizedFor | policy parameters, value function parameters |
gptkbp:publicationYear | 2017 |
gptkbp:relatedTo | gptkb:A2C, gptkb:A3C, gptkb:TRPO, K-FAC |
gptkbp:type | actor-critic method, policy optimization algorithm |
gptkbp:uses | Kronecker-factored approximation |
gptkbp:bfsParent | gptkb:OpenAI_Baselines |
gptkbp:bfsLayer | 6 |