Trust Region Policy Optimization
GPTKB entity
Statements (49)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | algorithm |
| gptkbp:appliesTo | reinforcement learning |
| gptkbp:developedBy | gptkb:John_Schulman |
| https://www.w3.org/2000/01/rdf-schema#label | Trust Region Policy Optimization |
| gptkbp:improves | policy gradient methods |
| gptkbp:isAvenueFor | continuous action spaces, discrete action spaces |
| gptkbp:isChallengedBy | high-dimensional spaces, local optima, non-stationary environments |
| gptkbp:isEvaluatedBy | real-world applications, stability, simulated environments, convergence rate, sample efficiency, robotic tasks, Atari_games |
| gptkbp:isExploredIn | academic research, industry applications, AI conferences, machine learning workshops, reinforcement learning symposiums |
| gptkbp:isInfluencedBy | trust region methods, natural gradient methods |
| gptkbp:isLocatedIn | gptkb:PyTorch, TensorFlow |
| gptkbp:isPartOf | gptkb:REINFORCE, gptkb:DDPG, policy optimization family, TRPO_variants |
| gptkbp:isRelatedTo | actor-critic methods, policy optimization |
| gptkbp:isSupportedBy | empirical results, theoretical guarantees |
| gptkbp:isUsedFor | improving performance, exploration strategies, training agents |
| gptkbp:isUsedIn | robotics, game playing |
| gptkbp:isVisitedBy | robust learning, sample-efficient learning, scalable learning |
| gptkbp:maxRange | expected reward |
| gptkbp:performance | policy parameters |
| gptkbp:publishedIn | 2015 |
| gptkbp:reduces | KL divergence |
| gptkbp:relatedTo | Proximal Policy Optimization |
| gptkbp:requires | second-order information |
| gptkbp:uses | trust region methods |
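
The statements above tie TRPO to trust region methods, KL divergence, and second-order information. A minimal sketch of how those pieces fit together is given below: the policy gradient is preconditioned by the Fisher information matrix (the second-order quantity) and the step is scaled so the quadratic approximation of the KL divergence stays within a trust region. The toy softmax bandit policy, the `delta` and `damping` values, and the advantage numbers are illustrative assumptions, not part of the original algorithm's implementation, which additionally uses conjugate gradient and a backtracking line search.

```python
# Sketch of a KL-constrained natural-gradient (TRPO-style) update on a toy
# single-state softmax policy. Assumptions: hypothetical bandit setting,
# hand-picked advantages, damping added because the softmax Fisher is singular.
import numpy as np


def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def fisher_matrix(logits):
    # Fisher information of a categorical softmax policy: F = diag(p) - p p^T,
    # which is also the Hessian of KL(pi_old || pi_theta) at theta = theta_old.
    p = softmax(logits)
    return np.diag(p) - np.outer(p, p)


def trpo_step(logits, advantages, delta=0.01, damping=1e-3):
    """One KL-constrained natural-gradient step (toy sketch, not full TRPO)."""
    p = softmax(logits)
    # Gradient of the surrogate objective sum_a pi(a) * A(a) w.r.t. the logits.
    g = p * (advantages - p.dot(advantages))
    # Damped Fisher matrix; full TRPO avoids forming F explicitly and instead
    # solves F x = g with conjugate gradient using Fisher-vector products.
    F = fisher_matrix(logits) + damping * np.eye(len(logits))
    x = np.linalg.solve(F, g)
    # Scale the step so the quadratic KL approximation 0.5 * s^T F s equals delta.
    step = np.sqrt(2.0 * delta / (x.dot(F).dot(x) + 1e-12)) * x
    return logits + step


if __name__ == "__main__":
    logits = np.zeros(3)
    advantages = np.array([1.0, 0.0, -1.0])  # made-up advantage estimates
    for _ in range(50):
        logits = trpo_step(logits, advantages)
    print(softmax(logits))  # mass shifts toward the high-advantage action
```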