Statements (23)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:reinforcement_learning_algorithm |
| gptkbp:appliesTo | control systems, robotics, game playing |
| gptkbp:combines | gptkb:Monte_Carlo_methods, dynamic programming |
| gptkbp:compatibleWith | model of environment |
| gptkbp:example | gptkb:TD(λ), gptkb:TD(0) |
| gptkbp:fullName | gptkb:Temporal_Difference_learning |
| gptkbp:hasConcept | bootstrapping, temporal difference error |
| gptkbp:introduced | gptkb:Richard_S._Sutton |
| gptkbp:introducedIn | 1988 |
| gptkbp:learnsFrom | raw experience |
| gptkbp:relatedTo | gptkb:SARSA, gptkb:Q-learning |
| gptkbp:updated | after every step, value estimates |
| gptkbp:usedIn | gptkb:reinforcement_learning |
| gptkbp:bfsParent | gptkb:Temporal_Difference_learning |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | TD learning |
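
The statements above mention TD(0), bootstrapping, the temporal difference error, and value estimates that are updated after every step. A minimal sketch of how these pieces fit together is given below. The five-state random walk environment, the function name `td0_random_walk`, and the parameters `alpha`, `gamma`, `episodes`, and `seed` are illustrative assumptions, not part of this entry.

```python
import random


def td0_random_walk(num_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values with tabular TD(0) on a hypothetical random walk.

    Assumed setup (not from the entry): non-terminal states 1..num_states,
    terminal states 0 and num_states + 1, a uniformly random left/right
    policy, and a reward of 1 only when the right terminal state is reached.
    """
    rng = random.Random(seed)
    # Terminal states keep a value of 0 because they are never updated.
    values = [0.0] * (num_states + 2)
    for _ in range(episodes):
        state = (num_states + 1) // 2  # start each episode in the middle
        while 0 < state < num_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == num_states + 1 else 0.0
            # Temporal difference error: bootstrapped target minus current estimate.
            td_error = reward + gamma * values[next_state] - values[state]
            # Update after every step, learning directly from raw experience.
            values[state] += alpha * td_error
            state = next_state
    return values[1:-1]  # estimates for the non-terminal states only


estimates = td0_random_walk()
print([round(v, 2) for v in estimates])
```

For this assumed environment the true values are 1/6, 2/6, ..., 5/6, so the printed estimates should land near those fractions, with some noise because the step size is held constant.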