gptkbp:instanceOf
|
reinforcement learning algorithm
|
gptkbp:actionSpace
|
continuous
|
gptkbp:appliesTo
|
continuous action spaces
|
gptkbp:basedOn
|
actor-critic architecture
|
gptkbp:citation
|
over 5000
|
gptkbp:developedBy
|
gptkb:Pieter_Abbeel
gptkb:Sergey_Levine
Aurick Zhou
Tuomas Haarnoja
|
gptkbp:exploration
|
encouraged by entropy maximization
|
gptkbp:hasVariant
|
Discrete SAC
Multi-Agent SAC
|
gptkbp:category
|
off-policy
model-free
deep reinforcement learning algorithm
|
https://www.w3.org/2000/01/rdf-schema#label
|
Soft Actor-Critic (SAC)
|
gptkbp:hyperparameter
|
learning rate
discount factor (gamma)
target smoothing coefficient (tau)
temperature parameter (alpha)
|
gptkbp:improves
|
gptkb:Proximal_Policy_Optimization_(PPO)
gptkb:Trust_Region_Policy_Optimization_(TRPO)
Deep Deterministic Policy Gradient (DDPG)
Twin Delayed DDPG (TD3)
|
gptkbp:introducedIn
|
2018
|
gptkbp:openSource
|
gptkb:RLlib
gptkb:Stable_Baselines3
gptkb:OpenAI_Spinning_Up
|
gptkbp:optimizedFor
|
expected reward
policy entropy
|
gptkbp:policy
|
stochastic
|
gptkbp:publishedIn
|
arXiv:1801.01290
|
gptkbp:relatedTo
|
gptkb:Deep_Q-Network_(DQN)
Policy Gradient Methods
Maximum Entropy RL
|
gptkbp:rewardFunction
|
augmented with entropy term
|
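The entropy-augmented reward above is the core of SAC's soft Bellman backup. A minimal sketch of that target computation, assuming the clipped double-Q form used in SAC (the function name, argument names, and the illustrative numbers below are hypothetical, not from this record):

```python
def soft_td_target(reward, next_q1, next_q2, next_log_prob,
                   gamma=0.99, alpha=0.2, done=False):
    """Soft Bellman target: the reward is augmented with an entropy
    bonus -alpha * log pi, weighted by the temperature alpha.
    Illustrative sketch only; uses the min over twin target Q-values
    (clipped double-Q) as in SAC."""
    # Entropy-augmented soft value of the next state-action pair
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    # Standard discounted backup, zeroed at episode termination
    return reward + gamma * (1.0 - float(done)) * soft_value

# Hypothetical numbers: more entropy (more negative log-prob) raises the target
y = soft_td_target(reward=1.0, next_q1=5.0, next_q2=4.8,
                   next_log_prob=-1.2, gamma=0.99, alpha=0.2)
```

Setting `alpha=0` recovers the ordinary (non-entropy-regularized) TD target, which is one way to see how the temperature parameter listed under gptkbp:hyperparameter trades off expected reward against policy entropy.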
gptkbp:robustness
|
high
|
gptkbp:sampleEfficiency
|
high
|
gptkbp:stability
|
high
|
gptkbp:title
|
gptkb:Soft_Actor-Critic:_Off-Policy_Maximum_Entropy_Deep_Reinforcement_Learning_with_a_Stochastic_Actor
|
gptkbp:usedIn
|
robotics
autonomous driving
simulated control tasks
|
gptkbp:uses
|
value function
Q-function
replay buffer
target networks
maximum entropy reinforcement learning
stochastic policy
|
gptkbp:bfsParent
|
gptkb:Trust_Region_Policy_Optimization_(TRPO)
|
gptkbp:bfsLayer
|
7
|
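Two of the hyperparameters listed above, the target smoothing coefficient (tau) and the target networks under gptkbp:uses, work together: SAC updates its target Q-networks by Polyak averaging rather than hard copying. A minimal sketch, assuming parameters are represented as flat lists of floats (the function name and values are hypothetical):

```python
def polyak_update(target_params, online_params, tau=0.005):
    """Soft target-network update used in SAC: each target parameter
    moves a small fraction tau toward the online parameter.
    Illustrative sketch with scalar parameters."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]

# Hypothetical example: with tau=0.5 each target moves halfway
updated = polyak_update([0.0, 2.0], [1.0, 2.0], tau=0.5)
```

Small tau (e.g. 0.005) keeps the target networks slowly moving, which is one reason the record lists high stability for the algorithm.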