Trust Region Policy Optimization

GPTKB entity

Statements (34)
Predicate Object
gptkbp:instanceOf reinforcement learning algorithm
gptkbp:abbreviation gptkb:TRPO
gptkbp:author John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
gptkbp:category deep reinforcement learning
gptkbp:citation gptkb:Trust_Region_Policy_Optimization
high
gptkbp:contrastsWith gptkb:Proximal_Policy_Optimization
vanilla policy gradient
gptkbp:field gptkb:artificial_intelligence
gptkb:machine_learning
gptkb:reinforcement_learning
gptkbp:goal improve policy optimization stability
https://www.w3.org/2000/01/rdf-schema#label Trust Region Policy Optimization
gptkbp:inspiredBy natural policy gradient
gptkbp:introduced gptkb:John_Schulman
gptkbp:introducedIn 2015
gptkbp:objective maximize expected reward
gptkbp:openSource gptkb:OpenAI_Baselines
gptkb:Stable_Baselines
rllab
gptkbp:optimizationConstraint KL-divergence constraint
gptkbp:publishedIn arXiv preprint arXiv:1502.05477
gptkbp:relatedTo gptkb:Deep_Deterministic_Policy_Gradient
gptkb:Proximal_Policy_Optimization
Actor-Critic methods
gptkbp:type policy gradient method
gptkbp:updateRule constrained optimization
gptkbp:usedFor robotics
games
continuous control tasks
gptkbp:uses trust region methods
gptkbp:bfsParent gptkb:TRPO
gptkb:Trust_Region_Policy_Optimization_(TRPO)
gptkbp:bfsLayer 7