gptkbp:instanceOf
|
Reinforcement learning algorithm
|
gptkbp:application
|
gptkb:Autonomous_vehicles
gptkb:robot
Game playing
Resource management
|
gptkbp:category
|
Off-policy learning
|
gptkbp:citation
|
gptkb:Watkins,_C.J.C.H._(1989)._Learning_from_Delayed_Rewards._PhD_thesis,_University_of_Cambridge.
|
gptkbp:compatibleWith
|
Model of environment
|
gptkbp:convergesTo
|
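Optimal action-value function, and hence an optimal policy (given sufficient exploration and suitably decaying learning rates)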
|
gptkbp:explorationStrategy
|
Epsilon-greedy
Softmax
|
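The two exploration strategies listed above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the `temperature` parameter, and the list-of-floats representation of Q-values are assumptions made for the example.

```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties among equally valued actions at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])

def softmax_probs(q_values, temperature=1.0):
    """Boltzmann distribution over actions; a lower temperature is greedier."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Epsilon-greedy explores uniformly, while softmax weights exploration by the current value estimates, so clearly poor actions are tried less often.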
gptkbp:field
|
gptkb:artificial_intelligence
Machine learning
Reinforcement learning
|
https://www.w3.org/2000/01/rdf-schema#label
|
Q-learning
|
gptkbp:influenced
|
Deep reinforcement learning
|
gptkbp:input
|
gptkb:action
gptkb:state_order
|
gptkbp:introduced
|
gptkb:Christopher_Watkins
|
gptkbp:introducedIn
|
1989
|
gptkbp:output
|
Q-value
|
gptkbp:relatedTo
|
gptkb:Deep_Q-Network
gptkb:SARSA
|
gptkbp:rewardSignal
|
Scalar reward from the environment
|
gptkbp:solvedBy
|
Markov decision process
|
gptkbp:type
|
Model-free algorithm
|
gptkbp:updateRule
|
gptkb:Bellman_equation
|
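The update rule derived from the Bellman optimality equation is commonly written as Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)]. A minimal tabular sketch is shown below; the function name, the dictionary-based table, the `done` flag, and the default values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions chosen for illustration.

```python
from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning step in place and return the new Q-value."""
    # The target bootstraps from the greedy value of the next state,
    # regardless of which action the behaviour policy takes next (off-policy).
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]

# Hypothetical usage: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
q_learning_update(Q, "s0", "right", 1.0, "s1", ["left", "right"])
```

Taking the maximum over next actions is what distinguishes this update from SARSA, which instead uses the value of the action actually selected in the next state.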
gptkbp:uses
|
Q-value
|