Statements (18)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:Reinforcement_Learning_Concept |
| gptkbp:appliesTo | Deterministic Environments, Stochastic Environments |
| gptkbp:describes | Process of improving a policy based on value function |
| gptkbp:form | Bellman Expectation Equation |
| gptkbp:goal | Find optimal policy |
| gptkbp:improves | Policy Performance |
| gptkbp:introduced | gptkb:Richard_Bellman |
| gptkbp:relatedTo | Policy Iteration, Value Iteration |
| gptkbp:requires | gptkb:Policy_Evaluation |
| gptkbp:step | Policy Iteration Algorithm |
| gptkbp:usedIn | gptkb:Dynamic_Programming, gptkb:Markov_Decision_Process, Reinforcement Learning Algorithms |
| gptkbp:bfsParent | gptkb:Temporal_Difference_Learning |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Policy Improvement |
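
The statements above describe policy improvement as a step of policy iteration that requires policy evaluation and works from a value function. A minimal sketch of that step follows, assuming a small tabular MDP. The function names (`policy_evaluation`, `policy_improvement`, `policy_iteration`), the array layout (`P[s, a, s']` for transition probabilities, `R[s, a]` for expected rewards), the discount `gamma`, and the two-state toy MDP are all illustrative assumptions, not part of the knowledge-base entry.

```python
import numpy as np


def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-8):
    """Iterate the Bellman expectation equation for a deterministic policy."""
    n_states = P.shape[0]
    idx = np.arange(n_states)
    V = np.zeros(n_states)
    while True:
        # V(s) = R[s, pi(s)] + gamma * sum_s' P[s, pi(s), s'] * V(s')
        V_new = R[idx, policy] + gamma * P[idx, policy] @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def policy_improvement(P, R, V, gamma=0.9):
    """Return the policy that is greedy with respect to V."""
    # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V(s')
    Q = R + gamma * P @ V
    return np.argmax(Q, axis=1)


def policy_iteration(P, R, gamma=0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = np.zeros(P.shape[0], dtype=int)
    while True:
        V = policy_evaluation(P, R, policy, gamma)
        new_policy = policy_improvement(P, R, V, gamma)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy


# Hypothetical toy MDP: 2 states, 2 actions (0 = stay, 1 = try to move).
# Moving succeeds with probability 0.8, so the environment is stochastic.
P = np.array([
    [[1.0, 0.0], [0.2, 0.8]],  # transitions from state 0
    [[0.0, 1.0], [0.8, 0.2]],  # transitions from state 1
])
# Only staying in state 1 pays a reward.
R = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
])

policy, V = policy_iteration(P, R)
print(policy, V)
```

In this toy setup the initial policy stays put in both states. One improvement step switches state 0 to the "move" action, because moving toward the rewarding state has a higher action value than staying, and the next step leaves the policy unchanged, so the loop stops.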