SARSA

GPTKB entity

Statements (36)
Predicate Object
gptkbp:instanceOf reinforcement learning algorithm
gptkbp:actionVariable a
gptkbp:appliesTo gptkb:game_AI
autonomous systems
robotics
gptkbp:canBe tabular
function approximation
gptkbp:category on-policy algorithm
gptkbp:citation Rummery, G.A. & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department.
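gptkbp:convergesTo optimal policy (in the tabular case, if every state-action pair is visited infinitely often, the policy becomes greedy in the limit, and the learning rate decays appropriately)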
gptkbp:differential SARSA is on-policy (its target uses the action a' actually taken); Q-learning is off-policy (its target uses the greedy action in s')
gptkbp:distinctFrom gptkb:Q-learning
gptkbp:explores environment
gptkbp:firstDescribed gptkb:Rummery_and_Niranjan
1994
gptkbp:fullName gptkb:State-Action-Reward-State-Action
https://www.w3.org/2000/01/rdf-schema#label SARSA
gptkbp:learns action-value function
gptkbp:parameter policy
discount factor
learning rate
exploration rate
gptkbp:policy ε-greedy
gptkbp:relatedTo gptkb:Q-learning
gptkbp:rewardSignal r
gptkbp:stateVariable s
gptkbp:successor s'
gptkbp:successorAction a'
gptkbp:updated Q-values
gptkbp:updateRule Q(s,a) ← Q(s,a) + α [r + γ Q(s',a') − Q(s,a)]
gptkbp:usedFor gptkb:Markov_Decision_Processes
gptkbp:usedIn gptkb:artificial_intelligence
gptkb:machine_learning
gptkbp:bfsParent gptkb:Q-learning
gptkb:reinforcement_learning
gptkbp:bfsLayer 5
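
Example (illustrative sketch, not part of the GPTKB statements)

The updateRule, policy, and differential statements above can be sketched as a small tabular implementation. This is a minimal sketch under stated assumptions: the five-state corridor environment, the function names (epsilon_greedy, sarsa_update, q_learning_target, step, train), the random tie-breaking, and the hyperparameter values are all hypothetical choices made for illustration and do not come from the cited report.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one.

    Ties between equally valued actions are broken at random (an assumption
    made here so that an all-zero table does not favor one action).
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(s, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(s, a)] == best])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, terminal=False):
    """Apply Q(s,a) <- Q(s,a) + alpha [r + gamma Q(s',a') - Q(s,a)].

    For a terminal s' the bootstrap term is dropped, so the target is r.
    """
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


def q_learning_target(Q, r, s_next, actions, gamma):
    """Off-policy target, shown only for contrast: max over a', not the a' taken."""
    return r + gamma * max(Q[(s_next, a)] for a in actions)


# Hypothetical toy environment: states 0..4 in a corridor, reward 1 at state 4.
ACTIONS = [-1, +1]
GOAL = 4


def step(s, a):
    """Move left or right, clamp to the corridor, and end the episode at GOAL."""
    s_next = min(max(s + a, 0), GOAL)
    done = s_next == GOAL
    return s_next, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular SARSA with an epsilon-greedy policy and return the Q table."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, ACTIONS, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            # On-policy: a' is chosen by the same policy that will act next.
            a_next = epsilon_greedy(Q, s_next, ACTIONS, epsilon, rng)
            sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, terminal=done)
            s, a = s_next, a_next
    return Q
```

Calling train() returns a table keyed by (state, action) pairs. The only line that separates this from a Q-learning loop is the target: sarsa_update bootstraps from Q(s',a') for the sampled a', whereas q_learning_target takes the maximum over actions in s'.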