Policy Gradient

GPTKB entity

Statements (45)
Predicate Object
gptkbp:instanceOf Reinforcement Learning Algorithm
gptkbp:application gptkb:robot
Game Playing
Autonomous Control
gptkbp:canBe Deterministic
Stochastic
gptkbp:category Model-Free Methods
On-Policy Methods (vanilla form; the Deterministic Policy Gradient and Soft Actor-Critic variants are off-policy)
gptkbp:compatibleWith Value Function
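gptkbp:firstPublished 1999 (policy gradient theorem, Sutton et al.; the earlier REINFORCE algorithm was published by Williams in 1992)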
gptkbp:form Expected Return Gradient
https://www.w3.org/2000/01/rdf-schema#label Policy Gradient
gptkbp:implementedIn gptkb:TensorFlow
gptkb:OpenAI_Baselines
gptkb:PyTorch
gptkbp:introduced gptkb:Richard_S._Sutton
gptkbp:learns Parameterized Policy
gptkbp:limitation High Variance
gptkbp:objective Maximize Expected Return (cumulative reward)
gptkbp:optimizedFor Policy Function
gptkbp:referencePaper gptkb:Proximal_Policy_Optimization_Algorithms
gptkb:Trust_Region_Policy_Optimization
gptkb:Deterministic_Policy_Gradient_Algorithms
gptkb:Policy_Gradient_Methods_for_Reinforcement_Learning_with_Function_Approximation
gptkb:Soft_Actor-Critic:_Off-Policy_Maximum_Entropy_Deep_Reinforcement_Learning_with_a_Stochastic_Actor
gptkbp:relatedConcept gptkb:Temporal_Difference_Learning
gptkb:Q-Learning
Monte Carlo Methods
Value-Based Methods
gptkbp:relatedTo gptkb:REINFORCE_Algorithm
Actor-Critic Methods
Deep Reinforcement Learning
gptkbp:requires Stochastic Policy (except Deterministic Policy Gradient variants)
gptkbp:solutionToLimitation Baseline Subtraction
gptkbp:updated Policy Parameters
gptkbp:updateRule Gradient Ascent
gptkbp:usedIn gptkb:Machine_Learning
gptkb:artificial_intelligence
gptkbp:variant gptkb:Natural_Policy_Gradient
gptkb:Trust_Region_Policy_Optimization
gptkb:Soft_Actor-Critic
gptkb:Proximal_Policy_Optimization
Deterministic Policy Gradient
gptkbp:bfsParent gptkb:Reinforcement_Learning
gptkbp:bfsLayer 7
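The `gptkbp:form` (Expected Return Gradient), `gptkbp:updateRule` (Gradient Ascent) and `gptkbp:solutionToLimitation` (Baseline Subtraction) statements correspond to the standard likelihood-ratio (score-function) form of the policy gradient. The following is a sketch, not taken from this entry, assuming an undiscounted episodic setting with horizon $T$, trajectories $\tau$ sampled from the policy $\pi_\theta$, reward-to-go $G_t$, step size $\alpha$, and a baseline $b(s_t)$ that may depend on the state but not on the action:

```latex
\begin{aligned}
J(\theta) &= \mathbb{E}_{\tau \sim \pi_\theta}\Big[\textstyle\sum_{t=0}^{T-1} r_{t+1}\Big] \\
G_t &= \textstyle\sum_{k=t}^{T-1} r_{k+1} \\
\nabla_\theta J(\theta) &= \mathbb{E}_{\tau \sim \pi_\theta}\Big[\textstyle\sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\big(G_t - b(s_t)\big)\Big] \\
\theta &\leftarrow \theta + \alpha\, \widehat{\nabla_\theta J}(\theta)
\end{aligned}
```

Because $\mathbb{E}_{a \sim \pi_\theta}[\nabla_\theta \log \pi_\theta(a \mid s)] = 0$, subtracting $b(s_t)$ leaves the expected gradient unchanged while it can reduce the variance of the sampled estimate $\widehat{\nabla_\theta J}$.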
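The `gptkbp:relatedTo` (REINFORCE Algorithm), `gptkbp:updateRule` (Gradient Ascent) and `gptkbp:requires` (Stochastic Policy) statements can be illustrated with a minimal REINFORCE-with-baseline loop. This is a dependency-free sketch under stated assumptions: a hypothetical 3-armed bandit (one-step episodes, so the return equals the immediate reward), illustrative arm means and noise level, a softmax policy over per-arm preferences, and a running-mean reward baseline. The names `reinforce_bandit`, `ARM_MEANS` and `NOISE_STD` are made up for this example.

```python
import math
import random

# Hypothetical 3-armed bandit (an assumption for illustration): pulling arm a
# pays ARM_MEANS[a] plus Gaussian noise with standard deviation NOISE_STD.
ARM_MEANS = [0.0, 0.5, 1.0]
NOISE_STD = 0.5


def softmax(prefs):
    """Stochastic policy: pi(a) = exp(theta_a) / sum_b exp(theta_b)."""
    m = max(prefs)  # shift by the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(steps=2000, alpha=0.1, seed=0):
    """REINFORCE with a running-mean reward baseline on a one-step task.

    Returns the final action probabilities of the learned softmax policy.
    """
    rng = random.Random(seed)
    n = len(ARM_MEANS)
    theta = [0.0] * n  # policy parameters (one preference per arm)
    baseline = 0.0     # mean of the rewards observed before this step
    for t in range(1, steps + 1):
        probs = softmax(theta)
        a = rng.choices(range(n), weights=probs)[0]  # sample a ~ pi_theta
        reward = ARM_MEANS[a] + rng.gauss(0.0, NOISE_STD)
        advantage = reward - baseline  # baseline subtraction
        for k in range(n):
            # For a softmax policy: d log pi(a) / d theta_k = 1[k == a] - pi(k).
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += alpha * advantage * grad_log_pi  # gradient ascent step
        baseline += (reward - baseline) / t  # incremental mean of rewards
    return softmax(theta)


if __name__ == "__main__":
    print([round(p, 3) for p in reinforce_bandit()])
```

The baseline is updated only after the parameter step, so at each step it depends on past rewards alone and does not bias the gradient estimate. After training, most of the probability mass should sit on the arm with the highest mean.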
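The `gptkbp:limitation` (High Variance) and `gptkbp:solutionToLimitation` (Baseline Subtraction) statements can be checked numerically on a toy problem. This sketch assumes a hypothetical one-step bandit with deterministic rewards and a softmax policy. It computes the exact mean and total variance of the one-sample score-function estimator by enumerating every action (no sampling), once with no baseline and once with the policy's expected reward as the baseline. The function `score_function_stats` and the reward values are illustrative assumptions.

```python
import math


def softmax(prefs):
    """Softmax policy over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def score_function_stats(rewards, theta, baseline):
    """Exact per-component mean and total variance of the one-sample estimator
    g(a) = (r_a - baseline) * grad_theta log pi_theta(a) for a softmax policy
    on a bandit with deterministic rewards r_a."""
    probs = softmax(theta)
    n = len(theta)
    means, total_variance = [], 0.0
    for k in range(n):
        # Component k of the estimator for each possible action a.
        g = [(rewards[a] - baseline) * ((1.0 if a == k else 0.0) - probs[k])
             for a in range(n)]
        mean = sum(p * x for p, x in zip(probs, g))
        second_moment = sum(p * x * x for p, x in zip(probs, g))
        means.append(mean)
        total_variance += second_moment - mean * mean
    return means, total_variance


if __name__ == "__main__":
    rewards = [10.0, 10.5, 11.0]  # hypothetical rewards sharing a large offset
    theta = [0.0, 0.0, 0.0]       # uniform policy
    expected_reward = sum(p * r for p, r in zip(softmax(theta), rewards))
    print(score_function_stats(rewards, theta, 0.0))
    print(score_function_stats(rewards, theta, expected_reward))
```

With rewards that share a large constant offset, the baseline removes most of the variance while the mean gradient stays the same. The expected reward is a simple baseline choice, not the variance-minimizing one in general.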
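Among the `gptkbp:variant` entries, Proximal Policy Optimization (also listed under `gptkbp:referencePaper`) replaces the plain gradient-ascent objective with a clipped surrogate. A minimal, dependency-free sketch of that per-sample objective follows. The function names are hypothetical, and epsilon = 0.2 is assumed here as a commonly used setting. A real implementation would compute the ratio from log-probabilities on tensors and differentiate the batch average with an autodiff library.

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate (to be maximized).

    ratio     = pi_theta(a|s) / pi_theta_old(a|s)
    advantage = an advantage estimate for (s, a)
    """
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    # The minimum removes any incentive to push the ratio above 1 + eps when
    # the advantage is positive, or below 1 - eps when it is negative, while
    # still penalizing changes that make the objective worse.
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_clip_batch(ratios, advantages, epsilon=0.2):
    """Average the clipped surrogate over a batch of samples."""
    terms = [ppo_clip_objective(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    print(ppo_clip_objective(1.5, 2.0))   # ratio clipped to 1.2
    print(ppo_clip_objective(1.5, -1.0))  # unclipped term is the minimum
    print(ppo_clip_batch([0.9, 1.3], [1.0, 1.0]))
```

Clipping keeps each update close to the previous policy, which is the role the trust region plays in Trust Region Policy Optimization.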