Statements (31)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:reinforcement_learning_algorithm |
| gptkbp:abbreviation | gptkb:SAC |
| gptkbp:advantage | exploration, stability, sample efficiency |
| gptkbp:appliesTo | continuous action spaces |
| gptkbp:basedOn | maximum entropy reinforcement learning |
| gptkbp:citation | gptkb:Soft_Actor-Critic:_Off-Policy_Maximum_Entropy_Deep_Reinforcement_Learning_with_a_Stochastic_Actor |
| gptkbp:introduced | gptkb:Pieter_Abbeel, gptkb:Sergey_Levine, Aurick Zhou, Tuomas Haarnoja |
| gptkbp:introducedIn | 2018 |
| gptkbp:optimizedFor | expected reward, policy entropy |
| gptkbp:publishedIn | arXiv:1801.01290 |
| gptkbp:relatedTo | gptkb:Deep_Deterministic_Policy_Gradient, gptkb:Twin_Delayed_DDPG, gptkb:Proximal_Policy_Optimization |
| gptkbp:type | off-policy, model-free |
| gptkbp:uses | value function, Q-function, replay buffer, actor-critic architecture, target networks, stochastic policy |
| gptkbp:bfsParent | gptkb:DDPG, gptkb:Denis_Yarats |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Soft Actor-Critic |
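
The gptkbp:basedOn and gptkbp:optimizedFor statements refer to the maximum entropy reinforcement learning objective, which augments the expected return with a policy entropy term weighted by a temperature coefficient. A standard form of that objective (notation is assumed here, not taken from this record) is:

$$
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[\, r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \,\right]
$$

where \(\alpha\) trades off the reward term against the entropy term, which is the source of the exploration and stability advantages listed above.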
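The gptkbp:uses row lists the algorithm's moving parts (stochastic actor, Q-functions, replay buffer, actor-critic architecture, target networks). The following is a minimal PyTorch sketch of how those parts fit together in one update step; it is not a reference implementation, and the network sizes, hyperparameters (`alpha`, `gamma`, `tau`), and helper names are assumptions for illustration only.

```python
# Sketch of SAC components: stochastic actor, twin Q-functions,
# target networks, replay buffer, soft (Polyak) target updates.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

class ReplayBuffer:
    """Replay buffer: stores off-policy transitions for minibatch sampling."""
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)
    def add(self, transition):
        self.storage.append(transition)
    def sample(self, batch_size):
        batch = random.sample(self.storage, batch_size)
        return [torch.stack(x) for x in zip(*batch)]

class QNetwork(nn.Module):
    """Q-function: maps a (state, action) pair to a scalar value."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class GaussianPolicy(nn.Module):
    """Stochastic actor: tanh-squashed Gaussian over continuous actions."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)
    def forward(self, obs):
        h = self.body(obs)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()                      # reparameterized sample
        a = torch.tanh(u)                       # squash into (-1, 1)
        # change-of-variables correction for the tanh squashing
        log_prob = dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)
        return a, log_prob.sum(-1, keepdim=True)

def sac_update(batch, actor, q1, q2, q1_targ, q2_targ, actor_opt, q_opt,
               alpha=0.2, gamma=0.99, tau=0.005):
    obs, act, rew, next_obs, done = batch
    # Critic target: soft Bellman backup using target networks and entropy bonus.
    with torch.no_grad():
        next_a, next_logp = actor(next_obs)
        target_q = torch.min(q1_targ(next_obs, next_a), q2_targ(next_obs, next_a))
        backup = rew + gamma * (1 - done) * (target_q - alpha * next_logp)
    q_loss = F.mse_loss(q1(obs, act), backup) + F.mse_loss(q2(obs, act), backup)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    # Actor: maximize the entropy-regularized Q-value (minimize its negative).
    new_a, logp = actor(obs)
    actor_loss = (alpha * logp - torch.min(q1(obs, new_a), q2(obs, new_a))).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Polyak (soft) update of the target networks.
    for targ, src in ((q1_targ, q1), (q2_targ, q2)):
        for p_t, p in zip(targ.parameters(), src.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```

Taking the minimum of two Q-networks to curb value overestimation is a design choice SAC shares with gptkb:Twin_Delayed_DDPG from the gptkbp:relatedTo row; the replay buffer and target networks reflect its off-policy, gptkb:Deep_Deterministic_Policy_Gradient-style lineage noted under gptkbp:bfsParent.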