gptkbp:instanceOf
|
Reinforcement learning algorithm
|
gptkbp:application
|
gptkb:robot
Game playing
Autonomous control
|
gptkbp:category
|
Temporal difference learning
|
gptkbp:compatibleWith
|
Model of environment (optional; not required, since Q-Learning is model-free)
|
gptkbp:convergesTo
|
Optimal policy (provided every state-action pair is visited infinitely often and the learning rate decays appropriately)
|
gptkbp:explorationStrategy
|
Epsilon-greedy
|
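A minimal sketch of the epsilon-greedy exploration strategy listed above, assuming a tabular Q-function stored as a NumPy array indexed by state and action; the array shape, state encoding, and epsilon value are illustrative assumptions, not part of this entry.

```python
import numpy as np

def epsilon_greedy(q_table: np.ndarray, state: int, epsilon: float = 0.1) -> int:
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest Q-value (exploit)."""
    n_actions = q_table.shape[1]
    if np.random.rand() < epsilon:
        return int(np.random.randint(n_actions))
    return int(np.argmax(q_table[state]))

# Illustrative use: 5 states, 3 actions, Q-values initialised to zero.
q = np.zeros((5, 3))
chosen = epsilon_greedy(q, state=2, epsilon=0.1)
```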
gptkbp:field
|
gptkb:artificial_intelligence
Machine learning
|
gptkbp:form
|
Markov decision process
|
gptkbp:goal
|
Learn optimal action-selection policy
|
https://www.w3.org/2000/01/rdf-schema#label
|
Q-Learning
|
gptkbp:influenced
|
Deep Q-Learning
|
gptkbp:input
|
gptkb:action
gptkb:state_order
|
gptkbp:introduced
|
gptkb:Christopher_Watkins
|
gptkbp:introducedIn
|
1989
|
gptkbp:output
|
Q-value
|
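A rough illustration of the input/output relationship recorded above (state and action in, Q-value out), assuming the Q-function is kept as a lookup table; the dictionary layout and example values are assumptions for illustration only.

```python
from collections import defaultdict

# Q-table: maps a (state, action) pair to its current Q-value estimate.
# Pairs that have never been updated default to 0.0.
Q = defaultdict(float)
Q[("s0", "left")] = 0.25    # illustrative values
Q[("s0", "right")] = 0.75

def q_value(state, action) -> float:
    """Input: a state and an action. Output: the stored Q-value."""
    return Q[(state, action)]

print(q_value("s0", "right"))  # -> 0.75
```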
gptkbp:relatedTo
|
gptkb:Deep_Q-Network
gptkb:SARSA
|
gptkbp:rewardSignal
|
Reinforcement signal
|
gptkbp:type
|
Model-free algorithm
Off-policy algorithm
|
gptkbp:updateParameter
|
Discount factor
Learning rate
|
gptkbp:updateRule
|
gptkb:Bellman_equation
|
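A hedged sketch of the tabular update rule derived from the Bellman equation, using the learning rate and discount factor listed under updateParameter: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The variable names, action set, and parameter values below are assumptions for illustration.

```python
from collections import defaultdict

Q = defaultdict(float)        # (state, action) -> Q-value estimate, default 0.0
ACTIONS = ["left", "right"]   # assumed action set
alpha, gamma = 0.1, 0.99      # learning rate and discount factor (example values)

def q_update(state, action, reward, next_state, done: bool) -> None:
    """Apply one Q-learning update for the observed transition."""
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])

# Illustrative transition: taking "right" in "s0" yields reward 1.0 and ends the episode.
q_update("s0", "right", reward=1.0, next_state="s1", done=True)
```

Because the bootstrapped target uses the maximum over next actions rather than the action actually taken next, this update is the off-policy behaviour noted under gptkbp:type.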
gptkbp:usedIn
|
Resource management
Autonomous navigation
Atari game agents
|
gptkbp:uses
|
Q-values
|
gptkbp:bfsParent
|
gptkb:Temporal_Difference_Learning
|
gptkbp:bfsLayer
|
7
|