Q-learning algorithm

GPTKB entity

Statements (27)
Predicate Object
gptkbp:instanceOf reinforcement learning algorithm
gptkbp:application control systems
resource management
game playing
autonomous navigation
gptkbp:citation gptkb:Watkins,_C.J.C.H._(1989)._Learning_from_Delayed_Rewards._PhD_thesis,_University_of_Cambridge.
gptkbp:combines deep learning (in the Deep Q-Network variant)
gptkbp:developedBy gptkb:Christopher_Watkins
gptkbp:goal find optimal action-selection policy
https://www.w3.org/2000/01/rdf-schema#label Q-learning algorithm
gptkbp:introducedIn 1989
gptkbp:learns Q-values
gptkbp:parameter discount factor
learning rate
exploration rate
gptkbp:Q-valuesRepresent expected utility (cumulative discounted reward) of taking an action in a given state
gptkbp:relatedTo gptkb:SARSA
temporal difference learning
gptkbp:type off-policy
model-free
gptkbp:updateRule Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)]
gptkbp:usedIn gptkb:artificial_intelligence
gptkb:machine_learning
robotics
gptkbp:variant gptkb:Deep_Q-Network_(DQN)
gptkbp:bfsParent gptkb:Christopher_J.C.H._Watkins
gptkbp:bfsLayer 7
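
Example (illustrative sketch)

The update rule and the three parameters listed in the statements above (learning rate α, discount factor γ, exploration rate ε) can be sketched in tabular form as follows. This is a minimal illustration and not a reference implementation: the 4-state corridor environment, the function names (`epsilon_greedy`, `q_update`, `step`, `train`), and the default parameter values are all assumptions made for the example and do not come from the entity data.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: states 0..3 in a corridor, actions move left/right.
ACTIONS = [-1, +1]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha, gamma, done=False):
    """Apply Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    # Terminal states have no future value, so the bootstrap term is dropped.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def step(state, action):
    """Assumed dynamics: clamp to [0, 3]; reaching state 3 pays 1 and ends the episode."""
    next_state = min(max(state + action, 0), 3)
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Run tabular Q-learning on the toy corridor and return the Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, ACTIONS, epsilon, rng)
            next_state, reward, done = step(state, action)
            q_update(q, state, action, reward, next_state, ACTIONS, alpha, gamma, done)
            state = next_state
    return q
```

The sketch is off-policy in the sense given above: actions are chosen by the ε-greedy behavior policy, while the update bootstraps from the greedy value max_a' Q(s',a') regardless of which action is actually taken next. It is model-free because `train` only samples transitions from `step` and never inspects the dynamics directly.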