gptkbp:instanceOf
|
reinforcement learning algorithm
|
gptkbp:actionVariable
|
a
|
gptkbp:appliesTo
|
gptkb:game_AI
autonomous systems
robotics
|
gptkbp:canBe
|
tabular
function approximation
|
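A minimal sketch of the tabular form, assuming a small discrete state and action space; the defaultdict-based table and its zero initialization are illustrative choices, not prescribed by this entry:

```python
from collections import defaultdict

# Tabular SARSA keeps one estimate per (state, action) pair.
# Unvisited pairs default to 0.0 here; other initializations are possible.
Q = defaultdict(float)

def q_value(state, action):
    """Look up the current estimate Q(s, a)."""
    return Q[(state, action)]
```

With large or continuous state spaces the table is replaced by function approximation (e.g. a linear model or neural network that maps (s, a) to an estimated value).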
gptkbp:category
|
on-policy algorithm
|
gptkbp:citation
|
Rummery, G.A. & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department.
|
gptkbp:convergesTo
|
optimal policy (under GLIE exploration and standard step-size conditions)
|
gptkbp:differential
|
SARSA is on-policy (it updates toward the action a' actually taken by the behaviour policy), whereas Q-learning is off-policy (it updates toward the greedy action)
|
gptkbp:distinctFrom
|
gptkb:Q-learning
|
gptkbp:explores
|
environment
|
gptkbp:firstDescribed
|
gptkb:Rummery_and_Niranjan
1994
|
gptkbp:fullName
|
gptkb:State-Action-Reward-State-Action
|
https://www.w3.org/2000/01/rdf-schema#label
|
SARSA
|
gptkbp:learns
|
action-value function
|
gptkbp:parameter
|
policy
discount factor
learning rate
exploration rate
|
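A minimal sketch of the numeric hyperparameters under their usual symbols; the default values shown are illustrative assumptions, not prescribed by this entry:

```python
from dataclasses import dataclass

@dataclass
class SarsaParams:
    alpha: float = 0.1     # learning rate
    gamma: float = 0.99    # discount factor
    epsilon: float = 0.1   # exploration rate of the epsilon-greedy policy
```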
gptkbp:policy
|
ε-greedy
|
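A minimal sketch of ε-greedy action selection over a tabular Q-table, assuming discrete actions; the function name and random tie-breaking among greedy actions are illustrative choices:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon explore uniformly; otherwise act greedily w.r.t. Q."""
    if random.random() < epsilon:
        return random.choice(actions)
    # Greedy case: pick an action with the highest estimated Q(s, a), ties broken randomly.
    best_value = max(Q[(state, a)] for a in actions)
    best_actions = [a for a in actions if Q[(state, a)] == best_value]
    return random.choice(best_actions)
```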
gptkbp:relatedTo
|
gptkb:Q-learning
|
gptkbp:rewardSignal
|
r
|
gptkbp:stateVariable
|
s
|
gptkbp:successor
|
s'
|
gptkbp:successorAction
|
a'
|
gptkbp:updated
|
Q-values
|
gptkbp:updateRule
|
Q(s,a) ← Q(s,a) + α [r + γ Q(s',a') − Q(s,a)]
|
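A minimal sketch of this update rule inside the on-policy control loop. The environment interface (`env.actions`, `env.reset() -> s`, `env.step(a) -> (s', r, done)`) and the hyperparameter defaults are assumptions for illustration only:

```python
import random
from collections import defaultdict

def sarsa(env, episodes, alpha=0.1, gamma=0.99, epsilon=0.1):
    """On-policy SARSA: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)).

    Assumes (illustratively) that env exposes a list env.actions,
    env.reset() -> s, and env.step(a) -> (s_next, r, done).
    """
    Q = defaultdict(float)

    def epsilon_greedy(s):
        # Behaviour policy: explore with probability epsilon, otherwise act greedily.
        if random.random() < epsilon:
            return random.choice(env.actions)
        best = max(Q[(s, a)] for a in env.actions)
        return random.choice([a for a in env.actions if Q[(s, a)] == best])

    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = epsilon_greedy(s_next)   # a' is chosen by the same policy that acts
            target = r if done else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])   # the SARSA update rule
            s, a = s_next, a_next
    return Q
```

The on-policy character shows up in the last two lines of the inner loop: the successor action a' used in the update target is the action that is actually executed next, rather than a greedy maximum as in Q-learning.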
gptkbp:usedFor
|
gptkb:Markov_Decision_Processes
|
gptkbp:usedIn
|
gptkb:artificial_intelligence
gptkb:machine_learning
|
gptkbp:bfsParent
|
gptkb:Q-learning
gptkb:reinforcement_learning
|
gptkbp:bfsLayer
|
5
|