gptkbp:instanceOf
|
Reinforcement Learning Algorithm
|
gptkbp:application
|
gptkb:robot
Game Playing
Autonomous Control
|
gptkbp:canBe
|
Deterministic
Stochastic
|
gptkbp:category
|
Model-Free Methods
On-Policy Methods
|
gptkbp:compatibleWith
|
Value Function
|
gptkbp:firstPublished
|
1999
|
gptkbp:form
|
Expected Return Gradient
|
https://www.w3.org/2000/01/rdf-schema#label
|
Policy Gradient
|
gptkbp:implementedIn
|
gptkb:TensorFlow
gptkb:OpenAI_Baselines
gptkb:PyTorch
|
gptkbp:introduced
|
gptkb:Richard_S._Sutton
|
gptkbp:learns
|
Parameterization of Policy
|
gptkbp:limitation
|
High Variance
|
gptkbp:objective
|
Maximize Expected Reward
|
gptkbp:optimizedFor
|
Policy Function
|
gptkbp:referencePaper
|
gptkb:Proximal_Policy_Optimization_Algorithms
gptkb:Trust_Region_Policy_Optimization
gptkb:Deterministic_Policy_Gradient_Algorithms
gptkb:Policy_Gradient_Methods_for_Reinforcement_Learning_with_Function_Approximation
gptkb:Soft_Actor-Critic:_Off-Policy_Maximum_Entropy_Deep_Reinforcement_Learning_with_a_Stochastic_Actor
|
gptkbp:relatedConcept
|
gptkb:Temporal_Difference_Learning
gptkb:Q-Learning
Monte Carlo Methods
Value-Based Methods
|
gptkbp:relatedTo
|
gptkb:REINFORCE_Algorithm
Actor-Critic Methods
Deep Reinforcement Learning
|
gptkbp:requires
|
Stochastic Policy (except Deterministic Policy Gradient)
|
gptkbp:solutionToLimitation
|
Baseline Subtraction
|
gptkbp:updated
|
Policy Parameters
|
gptkbp:updateRule
|
Gradient Ascent
|
gptkbp:usedIn
|
gptkb:Machine_Learning
gptkb:artificial_intelligence
|
gptkbp:variant
|
gptkb:Natural_Policy_Gradient
gptkb:Trust_Region_Policy_Optimization
gptkb:Soft_Actor-Critic
gptkb:Proximal_Policy_Optimization
Deterministic Policy Gradient
|
gptkbp:bfsParent
|
gptkb:Reinforcement_Learning
|
gptkbp:bfsLayer
|
7
|
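
The record above lists the update rule (Gradient Ascent), the objective (Maximize Expected Reward), the main limitation (High Variance), and its remedy (Baseline Subtraction). The following is a minimal sketch of how those pieces might fit together, written in the REINFORCE style on a toy multi-armed bandit. Everything in it is an illustrative assumption rather than part of the record: the function names `softmax` and `train_bandit`, the Gaussian reward noise, the running-average baseline, and the step count and learning rate.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(arm_means, steps=3000, lr=0.1, seed=0):
    """Gradient ascent on expected reward for a softmax policy (hypothetical helper)."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters: one preference per arm
    baseline = 0.0                  # running average reward, subtracted to cut variance
    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Noisy reward; the unit-variance Gaussian is an assumption of this sketch.
        reward = rng.gauss(arm_means[action], 1.0)
        advantage = reward - baseline
        # For a softmax policy, d/d theta[k] of log pi(action) = 1{k == action} - probs[k].
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi  # ascent step on policy parameters
        # Incremental mean of all rewards seen so far.
        baseline += (reward - baseline) / t
    return softmax(theta)


probs = train_bandit([0.0, 1.0])
print(probs)
```

With arm means of 0.0 and 1.0, the learned policy should put most of its probability on the second arm. Setting `baseline` to a constant zero leaves the expected update unchanged but makes individual updates noisier, which is the variance problem the record refers to.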