Statements (27)
Predicate | Object |
---|---|
gptkbp:instanceOf | reinforcement learning algorithm |
gptkbp:application | control systems |
gptkbp:application | resource management |
gptkbp:application | game playing |
gptkbp:application | autonomous navigation |
gptkbp:citation | gptkb:Watkins,_C.J.C.H._(1989)._Learning_from_Delayed_Rewards._PhD_thesis,_University_of_Cambridge. |
gptkbp:combines | deep learning |
gptkbp:developedBy | gptkb:Christopher_Watkins |
gptkbp:goal | find optimal action-selection policy |
https://www.w3.org/2000/01/rdf-schema#label | Q-learning algorithm |
gptkbp:introducedIn | 1989 |
gptkbp:learns | Q-values |
gptkbp:parameter | discount factor |
gptkbp:parameter | learning rate |
gptkbp:parameter | exploration rate |
gptkbp:Q-valuesRepresent | expected utility of actions |
gptkbp:relatedTo | gptkb:SARSA |
gptkbp:relatedTo | temporal difference learning |
gptkbp:type | off-policy |
gptkbp:type | model-free |
gptkbp:updateRule | Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)] |
gptkbp:usedIn | gptkb:artificial_intelligence |
gptkbp:usedIn | gptkb:machine_learning |
gptkbp:usedIn | robotics |
gptkbp:variant | gptkb:Deep_Q-Network_(DQN) |
gptkbp:bfsParent | gptkb:Christopher_J.C.H._Watkins |
gptkbp:bfsLayer | 7 |
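The `gptkbp:updateRule` and `gptkbp:parameter` statements above can be made concrete with a short sketch. Below is a minimal tabular Q-learning loop in Python that ties together the three listed parameters (learning rate α as `alpha`, discount factor γ as `gamma`, exploration rate ε as `epsilon`) and the off-policy, model-free update rule. The `env` object, with its `reset()`, `step()`, and `actions` members, is an assumed interface introduced here for illustration only; it is not part of the statements above.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning sketch.

    Learns Q(s, a), the expected utility of taking action a in state s,
    via the update:
        Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Assumes a hypothetical env with reset(), step(action) -> (state, reward, done),
    and a finite env.actions list.
    """
    q = defaultdict(float)  # Q-values keyed by (state, action), default 0.0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection (the "exploration rate" parameter).
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Greedy bootstrap target: max over next actions, regardless of
            # the behavior policy -- this is what makes Q-learning off-policy.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in env.actions)
            # Temporal-difference update with learning rate alpha and discount gamma.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```

The `max` over next-state actions in the target is the design choice behind the `gptkbp:type | off-policy` statement: the update bootstraps from the greedy action even though behavior is ε-greedy, which is the key difference from the related on-policy SARSA algorithm.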