Statements (67)
Predicate | Object |
---|---|
gptkbp:instance_of | gptkb:Artificial_Intelligence |
gptkbp:bfsLayer | 4 |
gptkbp:bfsParent | gptkb:DQN |
gptkbp:aims_to | improve stability |
gptkbp:applies_to | gptkb:public_transportation_system, healthcare, financial modeling, energy management, Atari games, experience replay |
gptkbp:based_on | Q-learning algorithm |
gptkbp:can_be_used_with | prioritized experience replay |
gptkbp:competes_with | gptkb:DQN |
gptkbp:developed_by | gptkb:Google_Deep_Mind |
gptkbp:form | off-policy learning |
https://www.w3.org/2000/01/rdf-schema#label | Double DQN |
gptkbp:improves | Q-learning |
gptkbp:introduced | gptkb:2015 |
gptkbp:is_a_framework_for | decision making, adaptive learning |
gptkbp:is_a_solution_for | control problems |
gptkbp:is_a_tool_for | automated decision making |
gptkbp:is_cited_in | academic papers |
gptkbp:is_compared_to | gptkb:A3_C, gptkb:DDPG |
gptkbp:is_designed_to | maximize cumulative reward |
gptkbp:is_evaluated_by | benchmark tasks |
gptkbp:is_implemented_in | gptkb:Graphics_Processing_Unit, gptkb:Py_Torch |
gptkbp:is_influenced_by | gptkb:DQN, Q-learning |
gptkbp:is_known_for | sample efficiency |
gptkbp:is_part_of | gptkb:software_framework, deep learning research |
gptkbp:is_related_to | policy gradient methods |
gptkbp:is_used_in | gptkb:robot, reinforcement learning, game AI |
gptkbp:marketing_strategy | policy improvement, balances exploration and exploitation |
gptkbp:reduces | overestimation bias |
gptkbp:related_concept | gptkb:Artificial_Intelligence |
gptkbp:related_model | temporal difference learning, learns from experience, adapts to changing environments, generalizes well, supports continuous action spaces |
gptkbp:requires | large amounts of data |
gptkbp:subject | research studies |
gptkbp:technique | dynamic programming, function approximation, enhances performance, uses neural networks, can be parallelized, can be scaled easily, facilitates learning from past experiences, improves convergence speed, improves learning efficiency, increases robustness, optimizes learning process, reduces variance in estimates, updates action values, value estimation |
gptkbp:type_of | deep reinforcement learning |
gptkbp:uses | two separate networks |
gptkbp:utilizes | target network |
gptkbp:variant | gptkb:DQN |
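The statements `gptkbp:reduces` (overestimation bias), `gptkbp:uses` (two separate networks), and `gptkbp:utilizes` (target network) describe the core Double DQN mechanism: the online network selects the next action and the target network evaluates it, which decouples selection from evaluation. Below is a minimal sketch of that target computation in PyTorch (matching the `gptkb:Py_Torch` statement); the `QNetwork` class, hyperparameters, and variable names are illustrative assumptions, not part of the knowledge base.

```python
# Minimal Double DQN target computation sketch (illustrative; network sizes,
# hyperparameters, and names are assumptions, not taken from the table).
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small fully connected Q-network: state -> one Q-value per action."""

    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def double_dqn_targets(
    online_net: QNetwork,
    target_net: QNetwork,
    rewards: torch.Tensor,      # shape (batch,)
    next_states: torch.Tensor,  # shape (batch, state_dim)
    dones: torch.Tensor,        # shape (batch,), 1.0 where the episode ended
    gamma: float = 0.99,
) -> torch.Tensor:
    """Compute r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    The online network *selects* the next action, the target network
    *evaluates* it; this decoupling reduces the overestimation bias of
    plain DQN, which uses max_a Q_target(s', a) for both roles.
    """
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q


if __name__ == "__main__":
    state_dim, num_actions, batch = 4, 2, 8
    online_net = QNetwork(state_dim, num_actions)
    target_net = QNetwork(state_dim, num_actions)
    target_net.load_state_dict(online_net.state_dict())  # periodic hard copy

    targets = double_dqn_targets(
        online_net,
        target_net,
        rewards=torch.zeros(batch),
        next_states=torch.randn(batch, state_dim),
        dones=torch.zeros(batch),
    )
    print(targets.shape)  # torch.Size([8])
```

Whether the target network is refreshed by a periodic hard copy (as sketched here) or by a soft/Polyak update is an implementation choice; either way the batches would typically come from an experience replay buffer, optionally a prioritized one as noted in the `gptkbp:can_be_used_with` statement.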