Statements (22)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:algorithm |
gptkbp:canBe | approximate; exact |
gptkbp:category | dynamic programming |
gptkbp:complexity | polynomial time (for finite MDPs) |
gptkbp:convergesTo | optimal policy |
gptkbp:goal | find optimal policy |
gptkbp:hasVariant | generalized policy iteration; modified policy iteration |
https://www.w3.org/2000/01/rdf-schema#label | Policy iteration |
gptkbp:input | Markov decision process |
gptkbp:introduced | Ronald A. Howard (1960) |
gptkbp:output | optimal policy |
gptkbp:relatedTo | value iteration |
gptkbp:requires | reward function; transition probabilities |
gptkbp:step | policy evaluation; policy improvement |
gptkbp:usedIn | gptkb:reinforcement_learning; Markov decision processes |
gptkbp:bfsParent | gptkb:Partially_Observable_Markov_Decision_Process |
gptkbp:bfsLayer | 7 |
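The two steps listed in the table (policy evaluation, then policy improvement) can be sketched in Python for a small finite MDP. This is a minimal illustration under stated assumptions, not a reference implementation: the dictionary encoding `P[s][a] = [(probability, next_state, reward), ...]`, the function names, and the tolerances are all hypothetical choices. Evaluation is done iteratively to a tolerance rather than by solving the linear system exactly, and a discount factor `gamma < 1` is assumed so that evaluation converges.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a finite MDP (not a library format):
# P[s][a] is a list of (probability, next_state, reward) transitions.
# Every state is assumed to have at least one action; gamma < 1 is assumed.
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(P: MDP, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P: MDP, policy: Dict[int, int], V: Dict[int, float],
                    gamma: float, tol: float = 1e-10) -> Dict[int, float]:
    """Policy evaluation: sweep the Bellman expectation backup for the fixed
    policy (updating V in place) until the largest change falls below tol."""
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration(P: MDP, gamma: float = 0.9,
                     improve_tol: float = 1e-8) -> Tuple[Dict[int, int], Dict[int, float]]:
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: next(iter(P[s])) for s in P}  # arbitrary start: first listed action
    V = {s: 0.0 for s in P}
    while True:
        V = evaluate_policy(P, policy, V, gamma)
        stable = True
        for s in P:
            # Policy improvement: pick the action that is greedy with respect to V.
            best = max(P[s], key=lambda a, s=s: q_value(P, V, s, a, gamma))
            # Switch only on a clear improvement so near-ties cannot cause cycling.
            if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + improve_tol:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


if __name__ == "__main__":
    # Toy 2-state MDP (hypothetical): in state 0, action 1 moves to state 1
    # with reward 1; in state 1, action 0 stays put with reward 2 each step.
    P: MDP = {
        0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
        1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
    }
    policy, V = policy_iteration(P, gamma=0.9)
    print(policy)  # {0: 1, 1: 0}
    print(V)       # approximately {0: 19.0, 1: 20.0}
```

The greedy step switches actions only on an improvement larger than `improve_tol`, a common way to ensure termination when several actions tie. Capping the number of evaluation sweeps per round, instead of iterating to tolerance, is the idea behind modified policy iteration; in that case the stopping test should also confirm that `V` has converged.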