Trust Region Policy Optimization (TRPO)

GPTKB entity

Statements (33)
Predicate Object
gptkbp:instanceOf reinforcement learning algorithm
gptkbp:abbreviation gptkb:TRPO
gptkbp:application robotics
game playing
control tasks
gptkbp:author John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, Pieter Abbeel
gptkbp:citation gptkb:Trust_Region_Policy_Optimization
gptkb:ICML
2015
gptkbp:designedFor policy optimization
https://www.w3.org/2000/01/rdf-schema#label Trust Region Policy Optimization (TRPO)
gptkbp:improves policy gradient methods
gptkbp:influenced gptkb:Soft_Actor-Critic_(SAC)
gptkb:Proximal_Policy_Optimization_(PPO)
modern policy gradient methods
gptkbp:input policy network
gptkbp:introducedBy gptkb:John_Schulman
gptkbp:introducedIn 2015
gptkbp:objective maximize expected reward
gptkbp:openSource gptkb:OpenAI_Baselines
gptkb:Stable_Baselines
gptkbp:optimizedFor constrained optimization
gptkbp:output improved policy network
gptkbp:uses trust region constraint
gptkbp:publishedIn gptkb:ICML_2015
gptkbp:relatedTo gptkb:Natural_Policy_Gradient
gptkb:Proximal_Policy_Optimization_(PPO)
gptkbp:supportsAlgorithm on-policy
gptkbp:type KL-divergence constraint
monotonic policy improvement
gptkbp:usedIn deep reinforcement learning
gptkbp:bfsParent gptkb:John_Schulman
gptkbp:bfsLayer 6
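
Note: as a compact reference for the statements above (objective "maximize expected reward", "constrained optimization", "trust region constraint", "KL-divergence constraint"), the per-update optimization problem TRPO solves, as formulated by Schulman et al. (ICML 2015), can be written as

\max_{\theta} \; \mathbb{E}_{s,a \sim \pi_{\theta_{\mathrm{old}}}} \left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)} \, A^{\pi_{\theta_{\mathrm{old}}}}(s,a) \right]
\quad \text{subject to} \quad \mathbb{E}_{s} \left[ D_{\mathrm{KL}}\!\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)\big) \right] \le \delta

where A is the advantage under the previous policy and delta is the trust-region radius.

The following is a minimal sketch, not part of the GPTKB entry, of the two quantities this problem is built from: the importance-weighted surrogate objective and the mean KL divergence, shown here for a categorical PyTorch policy (the function name and argument layout are illustrative assumptions):

import torch

def surrogate_and_kl(policy, states, actions, advantages, old_log_probs, old_probs):
    """Surrogate objective and mean KL(old || new) used by TRPO's constrained update."""
    new_dist = torch.distributions.Categorical(logits=policy(states))
    new_log_probs = new_dist.log_prob(actions)

    # Importance-weighted surrogate: E[ pi_new(a|s) / pi_old(a|s) * A(s, a) ]
    ratio = torch.exp(new_log_probs - old_log_probs)
    surrogate = (ratio * advantages).mean()

    # Mean KL divergence between old and new action distributions,
    # which the trust region constraint bounds by a small delta.
    old_dist = torch.distributions.Categorical(probs=old_probs)
    kl = torch.distributions.kl_divergence(old_dist, new_dist).mean()
    return surrogate, kl

TRPO maximizes the surrogate subject to the KL constraint, typically via a conjugate-gradient natural-gradient step followed by a backtracking line search; PPO, which this work influenced, later replaced the hard constraint with a clipped objective.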