Statements (28)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:reinforcement_learning_algorithm |
| gptkbp:application | control systems, robotics, game playing |
| gptkbp:category | model-free methods |
| gptkbp:compatibleWith | model of environment |
| gptkbp:convergesTo | true value function under certain conditions |
| gptkbp:fullName | Temporal Difference Learning with one-step lookahead (λ = 0) |
| gptkbp:specialCaseOf | gptkb:TD(λ), temporal difference learning |
| gptkbp:input | state transitions, reward signal |
| gptkbp:introduced | gptkb:Richard_S._Sutton |
| gptkbp:introducedIn | 1988 |
| gptkbp:learns | state value function |
| gptkbp:output | updated value function |
| gptkbp:parameter | discount factor (γ), learning rate (α) |
| gptkbp:relatedTo | gptkb:Monte_Carlo_methods, gptkb:SARSA, gptkb:Q-learning |
| gptkbp:updateRule | V(s) ← V(s) + α [r + γ V(s') − V(s)] (see the sketch after the table) |
| gptkbp:usedIn | gptkb:reinforcement_learning |
| gptkbp:uses | bootstrapping, temporal difference error |
| gptkbp:bfsParent | gptkb:Temporal_Difference_Learning |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | TD(0) |
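
The update rule in the table maps directly to a few lines of code. Below is a minimal Python sketch of tabular TD(0) policy evaluation using the parameters listed (learning rate α, discount factor γ). The `env_step` interface and the random-walk environment are hypothetical illustrations, not part of the statements above.

```python
"""Minimal sketch of tabular TD(0) policy evaluation.

Implements the update rule from the table:
    V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
"""
import random
from collections import defaultdict


def td0_evaluate(env_step, start_state, episodes=1000, alpha=0.1, gamma=0.99):
    """Estimate the state value function V under a fixed policy.

    env_step(state) -> (next_state, reward, done) is an assumed interface
    that samples one transition under the policy being evaluated.
    """
    V = defaultdict(float)  # tabular value function, initialized to zero
    for _ in range(episodes):
        s = start_state
        done = False
        while not done:
            s_next, r, done = env_step(s)
            # One-step bootstrapped target; terminal states contribute no future value.
            target = r + (0.0 if done else gamma * V[s_next])
            # Temporal-difference error scaled by the learning rate.
            V[s] += alpha * (target - V[s])
            s = s_next
    return V


if __name__ == "__main__":
    # Hypothetical 5-state random walk: states 0..4, start at 2,
    # terminate at either end, reward +1 only on reaching state 4.
    def random_walk_step(s):
        s_next = s + random.choice((-1, 1))
        done = s_next in (0, 4)
        reward = 1.0 if s_next == 4 else 0.0
        return s_next, reward, done

    values = td0_evaluate(random_walk_step, start_state=2, episodes=5000)
    for state in range(5):
        print(f"V({state}) = {values[state]:.3f}")
```

Because TD(0) bootstraps from V(s') rather than waiting for the full episode return, it can update online after every transition; this is the combination of bootstrapping and the temporal difference error listed under gptkbp:uses.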