State-Action-Reward-State-Action

GPTKB entity

Statements (18)
Predicate Object
gptkbp:instanceOf gptkb:algorithm
gptkbp:abbreviation gptkb:SARSA
gptkbp:author gptkb:Rummery_and_Niranjan
gptkbp:category temporal difference learning
gptkbp:distinctFrom Q-learning (off-policy, whereas SARSA is on-policy)
gptkbp:field gptkb:reinforcement_learning
https://www.w3.org/2000/01/rdf-schema#label State-Action-Reward-State-Action
gptkbp:introducedIn 1994 (technical note by Rummery and Niranjan; the name SARSA dates from 1996)
gptkbp:relatedTo gptkb:Q-learning
gptkbp:stepSequence state, action, reward, next state, next action
gptkbp:type on-policy algorithm
gptkbp:updateRule Q(s,a) ← Q(s,a) + α [r + γ Q(s',a') - Q(s,a)]
gptkbp:usedFor learning policies in Markov Decision Processes
gptkbp:usedIn gptkb:game_AI
autonomous systems
robotics
gptkbp:bfsParent gptkb:SARSA
gptkbp:bfsLayer 6
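
The update rule and the on-policy/off-policy distinction listed above can be illustrated with a minimal Python sketch. It is not taken from the entity data: the tabular `Q` dictionary, the function names `sarsa_update`, `q_learning_update`, and `demo`, and the toy states, actions, and numbers are all assumptions made for illustration, and terminal-state handling is omitted for brevity.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: bootstrap from the action a_next actually chosen in s_next."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy TD update, shown only for contrast: bootstrap from the greedy action."""
    td_target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])


def demo():
    """Apply one update of each kind to the same hypothetical transition."""
    actions = ["left", "right"]

    def fresh_table():
        Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
        Q[("s1", "left")] = 1.0   # hypothetical existing estimates in the next state
        Q[("s1", "right")] = 3.0
        return Q

    alpha, gamma, reward = 0.5, 0.9, 1.0

    Q_sarsa = fresh_table()
    # Assume the behaviour policy picked "left" in s1 (for example, an exploratory move).
    sarsa_update(Q_sarsa, "s0", "go", reward, "s1", "left", alpha, gamma)

    Q_ql = fresh_table()
    q_learning_update(Q_ql, "s0", "go", reward, "s1", actions, alpha, gamma)

    return Q_sarsa[("s0", "go")], Q_ql[("s0", "go")]
```

With these toy numbers, SARSA moves Q(s0, go) from 0 to 0.5 × (1 + 0.9 × 1.0) = 0.95, because it uses the value of the action actually taken next, while the Q-learning update moves it to 0.5 × (1 + 0.9 × 3.0) = 1.85, because it uses the best available next action regardless of what the policy chose.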