Statements (18)
Predicate | Object |
---|---|
gptkbp:instanceOf | Reinforcement Learning Concept |
gptkbp:appliesTo | Deterministic Environments; Stochastic Environments |
gptkbp:describes | Process of improving a policy based on value function |
gptkbp:form | Bellman Expectation Equation |
gptkbp:goal | Find optimal policy |
https://www.w3.org/2000/01/rdf-schema#label | Policy Improvement |
gptkbp:improves | Policy Performance |
gptkbp:introduced | gptkb:Richard_Bellman |
gptkbp:relatedTo | Policy Iteration; Value Iteration |
gptkbp:requires | gptkb:Policy_Evaluation |
gptkbp:step | Policy Iteration Algorithm |
gptkbp:usedIn | gptkb:Dynamic_Programming; gptkb:Markov_Decision_Process; Reinforcement Learning Algorithms |
gptkbp:bfsParent | gptkb:Temporal_Difference_Learning |
gptkbp:bfsLayer | 7 |
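
The `gptkbp:describes`, `gptkbp:requires`, and `gptkbp:step` statements above can be illustrated with a minimal sketch of policy improvement as one step of policy iteration. Everything named here is an assumption made for illustration and does not come from the knowledge base: the MDP encoding (`P[s][a]` as a list of `(probability, next_state, reward)` triples), the function names, the discount factor `gamma=0.9`, and the two-state `toy_mdp`. Because transitions carry probabilities, the same code covers deterministic environments (probability 1.0) and stochastic ones.

```python
from typing import List, Tuple

# Hypothetical encoding: P[s][a] is a list of (probability, next_state, reward).
Transition = Tuple[float, int, float]
MDP = List[List[List[Transition]]]


def evaluate_policy(P: MDP, policy: List[int], gamma: float = 0.9,
                    tol: float = 1e-10) -> List[float]:
    """Iterative policy evaluation for a deterministic policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            # Expected one-step return under the action the policy picks in s.
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def improve_policy(P: MDP, V: List[float], gamma: float = 0.9) -> List[int]:
    """Policy improvement: act greedily with respect to the value function V."""
    new_policy = []
    for s in range(len(P)):
        # One-step lookahead value of each action available in state s.
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
             for a in range(len(P[s]))]
        # Ties are broken in favour of the lowest action index.
        new_policy.append(max(range(len(q)), key=q.__getitem__))
    return new_policy


def policy_iteration(P: MDP, gamma: float = 0.9) -> Tuple[List[int], List[float]]:
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(P)
    while True:
        V = evaluate_policy(P, policy, gamma)
        new_policy = improve_policy(P, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Toy two-state MDP (illustrative only).
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
toy_mdp: MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],
]
```

Calling `policy_iteration(toy_mdp)` in this sketch settles on moving from state 0 to state 1 and then staying there, since staying in state 1 collects the larger recurring reward. Each call to `improve_policy` needs a value function from `evaluate_policy` first, which mirrors the `gptkbp:requires` statement.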