Statements (22)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:algorithm |
| gptkbp:canBe | approximate, exact |
| gptkbp:category | dynamic programming |
| gptkbp:complexity | polynomial time (for finite MDPs) |
| gptkbp:convergesTo | optimal policy |
| gptkbp:goal | find optimal policy |
| gptkbp:hasVariant | generalized policy iteration, modified policy iteration |
| gptkbp:input | gptkb:Markov_chain |
| gptkbp:introduced | gptkb:Richard_Bellman |
| gptkbp:output | optimal policy |
| gptkbp:relatedTo | value iteration |
| gptkbp:requires | reward function, transition probabilities |
| gptkbp:step | policy evaluation, policy improvement |
| gptkbp:usedIn | gptkb:Markov_chain, gptkb:reinforcement_learning |
| gptkbp:bfsParent | gptkb:Partially_Observable_Markov_Decision_Process |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Policy iteration |
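The `gptkbp:step` statements above name the two alternating phases of the algorithm: policy evaluation and policy improvement, repeated until the policy converges to the optimal one. The following is a minimal illustrative sketch, not a reference implementation; it assumes a finite MDP supplied as nested dicts `P[s][a] = [(prob, next_state, reward), ...]`, and the names `policy_iteration`, `gamma`, and `theta` are placeholders chosen for this example.

```python
def policy_iteration(P, states, actions, gamma=0.9, theta=1e-8):
    # Start from an arbitrary deterministic policy and zero value estimates.
    policy = {s: actions[0] for s in states}
    V = {s: 0.0 for s in states}

    while True:
        # Step 1: policy evaluation -- sweep the Bellman expectation update
        # until the value function of the current policy stabilizes.
        while True:
            delta = 0.0
            for s in states:
                v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break

        # Step 2: policy improvement -- make the policy greedy w.r.t. V.
        stable = True
        for s in states:
            q = {a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in actions}
            best = max(q, key=q.get)
            if best != policy[s]:
                policy[s] = best
                stable = False

        # When no state changes its action, the policy is optimal (for a
        # finite MDP this happens after finitely many iterations).
        if stable:
            return policy, V
```

Calling `policy_iteration(P, list(P), actions)` on such a transition table returns the greedy-stable policy and its value function, matching the `gptkbp:output` and `gptkbp:convergesTo` statements.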