Statements (16)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:algorithm |
| gptkbp:alsoKnownAs | gptkb:R-N_algorithm |
| gptkbp:field | gptkb:reinforcement_learning |
| gptkbp:proposedBy | gptkb:G.A._Rummery, gptkb:M._Niranjan |
| gptkbp:publishedIn | On-line Q-learning using connectionist systems |
| gptkbp:relatedTo | gptkb:SARSA, gptkb:Q-learning |
| gptkbp:type | gptkb:reinforcement_learning_algorithm, on-policy algorithm |
| gptkbp:usedFor | policy evaluation, temporal difference learning |
| gptkbp:yearProposed | 1994 |
| gptkbp:bfsParent | gptkb:SARSA |
| gptkbp:bfsLayer | 6 |
| https://www.w3.org/2000/01/rdf-schema#label | Rummery and Niranjan |