gptkbp:instanceOf
|
gptkb:algorithm
|
gptkbp:approach
|
trust region method
|
gptkbp:arXivID
|
1502.05477
|
gptkbp:author
|
gptkb:Michael_Jordan
gptkb:John_Schulman
gptkb:Pieter_Abbeel
gptkb:Sergey_Levine
gptkb:Philipp_Moritz
|
gptkbp:citation
|
high
|
gptkbp:field
|
gptkb:reinforcement_learning
|
gptkbp:fullName
|
gptkb:Trust_Region_Policy_Optimization
|
https://www.w3.org/2000/01/rdf-schema#label
|
TRPO
|
gptkbp:influenced
|
PPO
modern RL algorithms
|
gptkbp:input
|
gptkb:state
|
gptkbp:introducedBy
|
gptkb:John_Schulman
|
gptkbp:introducedIn
|
2015
|
gptkbp:notablePublication
|
gptkb:Trust_Region_Policy_Optimization
|
gptkbp:objective
|
maximize expected reward
|
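For reference, the practical constrained problem that TRPO solves (as stated in the arXiv paper linked above) maximizes a surrogate advantage subject to a KL trust region; δ denotes the trust-region radius:

```latex
\max_{\theta} \;
\mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!
\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}
       \, A^{\pi_{\theta_{\text{old}}}}(s,a) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{s}\!\left[ D_{\mathrm{KL}}\!\big(
  \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s)
\big) \right] \le \delta
```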
gptkbp:openSource
|
gptkb:OpenAI_Baselines
gptkb:Stable_Baselines
rllab
|
gptkbp:optimizationMethod
|
conjugate gradient
|
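A minimal sketch of the conjugate gradient solver referenced above, which TRPO uses to approximately solve F x = g (F the Fisher information matrix, g the policy gradient) without forming F explicitly; the `fvp` callable, iteration count, and toy matrix below are illustrative assumptions, not taken from any particular implementation:

```python
import numpy as np

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g, where F is accessed only through
    fvp(v) = F @ v (e.g. a Fisher-vector product)."""
    x = np.zeros_like(g)
    r = g.copy()          # residual g - F @ x, with x = 0 initially
    p = r.copy()          # current search direction
    rs_old = r.dot(r)
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rs_old / p.dot(Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r.dot(r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy usage with an explicit symmetric positive-definite matrix.
F = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: F @ v, g)
print(x, F @ x)  # F @ x should be close to g
```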
gptkbp:output
|
gptkb:action
|
gptkbp:constraint
|
KL-divergence constraint
|
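A small illustration of the constraint above for discrete action spaces: the mean KL divergence between the old and new policies is kept below a bound δ. The function name, batch values, and δ are illustrative assumptions:

```python
import numpy as np

def categorical_kl(p_old, p_new, eps=1e-12):
    """Mean KL(pi_old || pi_new) over a batch of categorical
    action distributions."""
    p_old = np.clip(p_old, eps, 1.0)
    p_new = np.clip(p_new, eps, 1.0)
    return np.mean(np.sum(p_old * (np.log(p_old) - np.log(p_new)), axis=-1))

# Action probabilities for 2 states and 3 actions, before and after an update.
old = np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]])
new = np.array([[0.4, 0.4, 0.2], [0.2, 0.5, 0.3]])
delta = 0.01  # trust-region radius
print(categorical_kl(old, new) <= delta)
```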
gptkbp:publishedIn
|
gptkb:ICML_2015
|
gptkbp:purpose
|
policy optimization
|
gptkbp:relatedTo
|
gptkb:Actor-Critic
PPO
REINFORCE
|
gptkbp:guarantees
|
monotonic policy improvement
|
gptkbp:url
|
https://arxiv.org/abs/1502.05477
|
gptkbp:usedFor
|
robotics
games
continuous control
|
gptkbp:bfsParent
|
gptkb:OpenAI_Baselines
|
gptkbp:bfsLayer
|
6
|