State-Action-Reward-State-Action
GPTKB entity
Statements (18)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:algorithm |
| gptkbp:abbreviation | gptkb:SARSA |
| gptkbp:author | gptkb:Rummery_and_Niranjan |
| gptkbp:category | temporal difference learning |
| gptkbp:distinctFrom | Q-learning is off-policy, SARSA is on-policy |
| gptkbp:field | gptkb:reinforcement_learning |
| https://www.w3.org/2000/01/rdf-schema#label | State-Action-Reward-State-Action |
| gptkbp:introducedIn | 1996 |
| gptkbp:relatedTo | gptkb:Q-learning |
| gptkbp:stepSequence | state, action, reward, next state, next action |
| gptkbp:type | on-policy algorithm |
| gptkbp:updateRule | Q(s,a) ← Q(s,a) + α [r + γ Q(s',a') - Q(s,a)] (see the sketch after this table) |
| gptkbp:usedFor | learning policies in Markov Decision Processes |
| gptkbp:usedIn | gptkb:game_AI, autonomous systems, robotics |
| gptkbp:bfsParent | gptkb:SARSA |
| gptkbp:bfsLayer | 6 |
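
The update rule listed above maps directly onto the step sequence (state, action, reward, next state, next action). The following is a minimal tabular sketch of that update, not code from the GPTKB source: the `env.reset()` / `env.step()` interface, the action list, and the hyperparameter defaults are assumptions for illustration.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular SARSA: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    Q = defaultdict(float)  # Q(s, a) table, defaulting to 0.0
    for _ in range(episodes):
        state = env.reset()                     # hypothetical environment API
        action = epsilon_greedy(Q, state, actions, epsilon)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            # On-policy: the next action a' is drawn from the same
            # epsilon-greedy policy being learned, which is what
            # distinguishes SARSA from off-policy Q-learning.
            next_action = epsilon_greedy(Q, next_state, actions, epsilon)
            td_target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (td_target - Q[(state, action)])
            state, action = next_state, next_action
    return Q
```

In this sketch the learned Q-values stay consistent with the behavior policy, so in domains like the robotics and game-AI uses listed above, SARSA tends to learn policies that account for its own exploration.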