State-Action-Reward-State-Action
GPTKB entity
Statements (18)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:algorithm |
| gptkbp:abbreviation | gptkb:SARSA |
| gptkbp:author | gptkb:Rummery_and_Niranjan |
| gptkbp:category | temporal difference learning |
| gptkbp:distinctFrom | Q-learning is off-policy, SARSA is on-policy |
| gptkbp:field | gptkb:reinforcement_learning |
| https://www.w3.org/2000/01/rdf-schema#label | State-Action-Reward-State-Action |
| gptkbp:introducedIn | 1996 |
| gptkbp:relatedTo | gptkb:Q-learning |
| gptkbp:stepSequence | state, action, reward, next state, next action |
| gptkbp:type | on-policy algorithm |
| gptkbp:updateRule | Q(s,a) ← Q(s,a) + α [r + γ Q(s',a') - Q(s,a)] (see the sketch after this table) |
| gptkbp:usedFor | learning policies in Markov Decision Processes |
| gptkbp:usedIn | gptkb:game_AI, autonomous systems, robotics |
| gptkbp:bfsParent | gptkb:SARSA |
| gptkbp:bfsLayer | 6 |
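
The update rule listed above maps directly onto the step sequence (state, action, reward, next state, next action). The following is a minimal tabular sketch of that update, not code from the GPTKB source: the `env.reset()` / `env.step()` interface, the action list, and the hyperparameter defaults are assumptions for illustration.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular SARSA: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    Q = defaultdict(float)  # Q(s, a) table, defaulting to 0.0
    for _ in range(episodes):
        state = env.reset()                     # hypothetical environment API
        action = epsilon_greedy(Q, state, actions, epsilon)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            # On-policy: the next action a' is drawn from the same
            # epsilon-greedy policy being learned, which is what
            # distinguishes SARSA from off-policy Q-learning.
            next_action = epsilon_greedy(Q, next_state, actions, epsilon)
            td_target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (td_target - Q[(state, action)])
            state, action = next_state, next_action
    return Q
```

In this sketch the learned Q-values stay consistent with the behavior policy, so in domains like the robotics and game-AI uses listed above, SARSA tends to learn policies that account for its own exploration.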