Soft Actor-Critic (SAC)

GPTKB entity

Statements (52)
Predicate Object
gptkbp:instanceOf reinforcement learning algorithm
gptkbp:actionSpace continuous
gptkbp:appliesTo continuous action spaces
gptkbp:basedOn actor-critic architecture
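The actor-critic split can be made concrete by the loss each part minimizes. This is a minimal sketch. It assumes network outputs are already available as plain floats, and `critic_loss` and `actor_loss` are illustrative names, not a library API. The critic regresses its Q-values toward fixed soft Bellman targets. The actor trades expected Q-value against log-probability, weighted by the temperature alpha.

```python
from statistics import fmean


def critic_loss(q_pred, q_target):
    """Mean squared error between predicted Q-values and (fixed) soft Bellman targets."""
    if len(q_pred) != len(q_target):
        raise ValueError("batch sizes differ")
    return fmean((q - y) ** 2 for q, y in zip(q_pred, q_target))


def actor_loss(log_probs, q_values, alpha):
    """Mean of alpha * log pi(a|s) - Q(s, a) over actions sampled from the current policy.

    A lower loss means higher Q-values and, through the -log pi term, higher policy entropy.
    """
    if len(log_probs) != len(q_values):
        raise ValueError("batch sizes differ")
    return fmean(alpha * lp - q for lp, q in zip(log_probs, q_values))


print(critic_loss([1.0, 2.0], [1.5, 2.0]))  # → 0.125
print(actor_loss([-1.0, -2.0], [3.0, 1.0], alpha=0.5))  # → -2.75
```

In a real implementation, the actions are reparameterized samples. The actor's gradient therefore flows through both log pi and Q, which plain floats cannot show.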
gptkbp:citation over 5000
gptkbp:developedBy gptkb:Pieter_Abbeel
gptkb:Sergey_Levine
Aurick Zhou
Tuomas Haarnoja
gptkbp:exploration encouraged by entropy maximization
gptkbp:hasVariant Discrete SAC
Multi-Agent SAC
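With a finite action set, as in discrete-action adaptations of SAC, the policy outputs one probability per action. The soft state value can then be computed exactly instead of estimated from sampled actions. This is a minimal sketch under that assumption, and `discrete_soft_value` is a hypothetical helper.

```python
import math


def discrete_soft_value(probs, q_values, alpha):
    """Exact soft value V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s)).

    Actions with zero probability contribute nothing (0 * log 0 is taken as 0).
    """
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probs must sum to 1")
    return sum(p * (q - alpha * math.log(p)) for p, q in zip(probs, q_values) if p > 0.0)


print(discrete_soft_value([0.5, 0.5], [1.0, 3.0], alpha=1.0))  # approximately 2.693 (= 2 + ln 2)
```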
gptkbp:category off-policy
model-free
deep reinforcement learning algorithm
https://www.w3.org/2000/01/rdf-schema#label Soft Actor-Critic (SAC)
gptkbp:hyperparameter learning rate
discount factor (gamma)
target smoothing coefficient (tau)
temperature parameter (alpha)
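The target smoothing coefficient tau controls how slowly the target networks track the online networks. This is a minimal sketch of that soft ("Polyak") update. It uses plain lists of floats in place of framework tensors. The default 0.005 is an assumed, commonly used value, not a requirement.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Return tau * online + (1 - tau) * target, element-wise."""
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target_params, online_params)]


target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(3):
    # with tau = 0.5 the first entry moves halfway toward 1.0 on every call
    target = soft_update(target, online, tau=0.5)
print(target)  # → [0.875, 1.0]
```

A small tau makes the targets change slowly, which stabilizes the regression targets used by the critic.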
gptkbp:improves gptkb:Proximal_Policy_Optimization_(PPO)
gptkb:Trust_Region_Policy_Optimization_(TRPO)
Deep Deterministic Policy Gradient (DDPG)
Twin Delayed DDPG (TD3)
gptkbp:introducedIn 2018
gptkbp:openSource gptkb:park
gptkb:RLlib
gptkb:Stable_Baselines3
gptkb:OpenAI_Spinning_Up
gptkbp:optimizedFor expected reward
policy entropy
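The two optimization targets combine into a single maximum entropy objective. Here alpha is the temperature from the hyperparameter list, and rho_pi is the state-action marginal of trajectories under pi (notation following the SAC paper's convention):

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big]
```

Setting alpha = 0 recovers the conventional expected-return objective.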
gptkbp:policy stochastic
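A stochastic policy over a bounded continuous action space is commonly implemented as a Gaussian whose sample is squashed through tanh. The SAC paper describes this construction together with the matching log-likelihood correction. This one-dimensional sketch assumes the policy network has already produced `mean` and `log_std` for a state. The helper names are illustrative.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_tanh_jacobian(u):
    """log(1 - tanh(u)^2), computed stably as 2 * (log 2 - u - softplus(-2u))."""
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))


def sample_squashed_gaussian(mean, log_std, rng=None):
    """Draw one bounded action a = tanh(u), u ~ N(mean, exp(log_std)^2).

    Returns (action, log_prob): the Gaussian log density of u minus the
    tanh change-of-variables term, i.e. the log density of the action a.
    """
    rng = rng if rng is not None else random.Random()
    std = math.exp(log_std)
    eps = rng.gauss(0.0, 1.0)
    u = mean + std * eps  # reparameterization: noise is drawn independently of (mean, std)
    log_prob_u = -0.5 * eps * eps - log_std - 0.5 * math.log(2.0 * math.pi)
    return math.tanh(u), log_prob_u - log_tanh_jacobian(u)


action, log_prob = sample_squashed_gaussian(0.2, -0.5, random.Random(0))
```

For a multi-dimensional action, the per-dimension log-probabilities are summed. A tanh output in (-1, 1) is then rescaled to the environment's action bounds.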
gptkbp:publishedIn arXiv:1801.01290
gptkbp:relatedTo gptkb:Deep_Q-Network_(DQN)
Policy Gradient Methods
Maximum Entropy RL
gptkbp:rewardFunction augmented with entropy term
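The entropy term enters training through the regression targets. This is a minimal sketch that assumes the structure of the original SAC paper. That structure uses a separate state value function V with a slowly updated target copy, plus two Q-functions whose minimum is used. The temperature alpha is written explicitly. Batching and the networks themselves are omitted, and the function names are hypothetical.

```python
def soft_value_target(q1, q2, log_prob, alpha):
    """Target for V(s), with a ~ pi(.|s) sampled from the current policy:
    min(Q1(s, a), Q2(s, a)) - alpha * log pi(a|s).
    The -alpha * log pi term is the entropy bonus."""
    return min(q1, q2) - alpha * log_prob


def soft_q_target(reward, next_value, done, gamma=0.99):
    """Target for Q(s, a): r + gamma * V_target(s'), with no bootstrap at terminal states."""
    return reward + gamma * (0.0 if done else next_value)


v = soft_value_target(2.0, 1.5, log_prob=-1.0, alpha=0.5)  # → 2.0
y = soft_q_target(1.0, next_value=2.0, done=False, gamma=0.5)  # → 2.0
```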
gptkbp:robustness high
gptkbp:sampleEfficiency high (off-policy reuse of replay-buffer data)
gptkbp:stable high (similar performance across random seeds)
gptkbp:title gptkb:Soft_Actor-Critic:_Off-Policy_Maximum_Entropy_Deep_Reinforcement_Learning_with_a_Stochastic_Actor
gptkbp:usedIn robotics
autonomous driving
simulated control tasks
gptkbp:uses value function
Q-function
replay buffer
target networks
maximum entropy reinforcement learning
stochastic policy
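Because SAC is off-policy, it stores past transitions and trains on uniformly sampled mini-batches. This is a minimal fixed-capacity sketch. The `Transition` fields and the `ReplayBuffer` class are illustrative assumptions, not a specific library's API. Once the buffer is full, the oldest transitions are discarded first.

```python
import random
from collections import deque
from typing import NamedTuple, Optional, Tuple


class Transition(NamedTuple):
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    next_state: Tuple[float, ...]
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform random sampling."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest item automatically when full
        self._storage: deque = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int, rng: Optional[random.Random] = None) -> list:
        """Return batch_size distinct transitions chosen uniformly at random."""
        rng = rng if rng is not None else random.Random()
        return rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.add(Transition((float(i),), (0.0,), float(i), (float(i + 1),), False))
print(len(buffer))  # → 3 (transitions 0 and 1 were evicted)
```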
gptkbp:bfsParent gptkb:Trust_Region_Policy_Optimization_(TRPO)
gptkbp:bfsLayer 7