Statements (28)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | reinforcement learning algorithm |
| gptkbp:application | control systems |
| gptkbp:application | robotics |
| gptkbp:application | game playing |
| gptkbp:category | model-free methods |
| gptkbp:compatibleWith | model of environment |
| gptkbp:convergesTo | true value function under certain conditions |
| gptkbp:fullName | Temporal Difference Learning with one-step lookahead |
| gptkbp:hasSpecialCase | gptkb:TD(λ) |
| gptkbp:hasSpecialCase | temporal difference learning |
| https://www.w3.org/2000/01/rdf-schema#label | TD(0) |
| gptkbp:input | state transitions |
| gptkbp:input | reward signal |
| gptkbp:introduced | gptkb:Richard_S._Sutton |
| gptkbp:introducedIn | 1988 |
| gptkbp:learns | state value function |
| gptkbp:output | updated value function |
| gptkbp:parameter | discount factor (γ) |
| gptkbp:parameter | learning rate (α) |
| gptkbp:relatedTo | gptkb:Monte_Carlo_methods |
| gptkbp:relatedTo | gptkb:SARSA |
| gptkbp:relatedTo | gptkb:Q-learning |
| gptkbp:updateRule | V(s) ← V(s) + α [r + γ V(s') − V(s)] |
| gptkbp:usedIn | gptkb:reinforcement_learning |
| gptkbp:uses | bootstrapping |
| gptkbp:uses | temporal difference error |
| gptkbp:bfsParent | gptkb:Temporal_Difference_Learning |
| gptkbp:bfsLayer | 7 |
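The `gptkbp:updateRule` statement, V(s) ← V(s) + α [r + γ V(s') − V(s)], can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `td0_update`, the dictionary representation of V, and the toy three-state chain are assumptions made here for the example.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    td_error = r + gamma * V[s_next] - V[s]  # temporal difference error
    V[s] += alpha * td_error                 # bootstrapped value update
    return td_error

# Hypothetical episode on a 3-state chain: s0 -> s1 -> s2 (terminal).
V = {0: 0.0, 1: 0.0, 2: 0.0}  # terminal state's value stays 0
td0_update(V, 1, 1.0, 2)      # reward 1.0 on reaching the terminal state
td0_update(V, 0, 0.0, 1)      # V(0) bootstraps from the updated V(1)
```

The second call shows the bootstrapping listed under `gptkbp:uses`: V(0) is updated toward γ·V(1) using the value estimate just revised in the first call, rather than waiting for a full Monte Carlo return.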