Statements (68)
| Predicate | Object |
|---|---|
| gptkbp:instance_of | gptkb:Artificial_Intelligence |
| gptkbp:aims_to | improve stability |
| gptkbp:applies_to | experience replay |
| gptkbp:can_be_combined_with | prioritized experience replay |
| gptkbp:competes_with | gptkb:DQN |
| gptkbp:concept | gptkb:Artificial_Intelligence |
| gptkbp:developed_by | gptkb:Google_Deep_Mind |
| https://www.w3.org/2000/01/rdf-schema#label | Double DQN |
| gptkbp:improves | Q-learning |
| gptkbp:introduced_in | gptkb:2015 |
| gptkbp:is_a_form_of | off-policy learning |
| gptkbp:is_a_framework_for | decision making, adaptive learning |
| gptkbp:is_a_solution_for | control problems |
| gptkbp:is_a_subject_of | research studies |
| gptkbp:is_a_tool_for | automated decision making |
| gptkbp:is_applicable_to | gptkb:transportation, healthcare, financial modeling, energy management |
| gptkbp:is_applied_in | Atari games |
| gptkbp:is_based_on | Q-learning algorithm |
| gptkbp:is_cited_in | academic papers |
| gptkbp:is_compared_to | gptkb:A3_C, gptkb:DDPG |
| gptkbp:is_designed_to | maximize cumulative reward |
| gptkbp:is_evaluated_by | benchmark tasks |
| gptkbp:is_implemented_in | gptkb:Tensor_Flow, gptkb:Py_Torch |
| gptkbp:is_influenced_by | gptkb:DQN, Q-learning |
| gptkbp:is_known_for | sample efficiency |
| gptkbp:is_part_of | gptkb:machine_learning, deep learning research |
| gptkbp:is_related_to | policy gradient methods |
| gptkbp:is_used_in | gptkb:robotics, reinforcement learning, game AI |
| gptkbp:marketing_strategy | policy improvement, balances exploration and exploitation |
| gptkbp:model | temporal difference learning, learns from experience, adapts to changing environments, generalizes well, supports continuous action spaces |
| gptkbp:reduces | overestimation bias |
| gptkbp:requires | large amounts of data |
| gptkbp:technique | dynamic programming, function approximation, enhances performance, uses neural networks, can be parallelized, can be scaled easily, facilitates learning from past experiences, improves convergence speed, improves learning efficiency, increases robustness, optimizes learning process, reduces variance in estimates, updates action values, value estimation |
| gptkbp:type_of | deep reinforcement learning |
| gptkbp:uses | two separate networks |
| gptkbp:utilizes | target network |
| gptkbp:variant | gptkb:DQN |
| gptkbp:bfsParent | gptkb:Dueling_DQN, gptkb:DQN |
| gptkbp:bfsLayer | 6 |
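The statements above record that Double DQN uses two separate networks, utilizes a target network, and reduces overestimation bias. Concretely, the online network selects the greedy next action while the target network evaluates it: Y_t = R_{t+1} + γ Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ); θ⁻), where θ are the online weights and θ⁻ the target weights (van Hasselt et al., 2015). A minimal PyTorch sketch of this target computation follows; the class and function names here are illustrative, not from the source.

```python
# Minimal sketch of the Double DQN target, assuming a small MLP Q-network.
# QNetwork and ddqn_target are hypothetical names for illustration.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Tiny MLP mapping a state vector to one Q-value per action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def ddqn_target(online: QNetwork, target: QNetwork,
                rewards: torch.Tensor, next_states: torch.Tensor,
                dones: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double DQN: the online net SELECTS the greedy next action, the
    target net EVALUATES it. Decoupling selection from evaluation is
    what reduces vanilla DQN's overestimation bias."""
    with torch.no_grad():
        next_actions = online(next_states).argmax(dim=1, keepdim=True)   # select
        next_q = target(next_states).gather(1, next_actions).squeeze(1)  # evaluate
        return rewards + gamma * next_q * (1.0 - dones)

# Usage on a dummy batch of transitions:
online_net, target_net = QNetwork(4, 2), QNetwork(4, 2)
target_net.load_state_dict(online_net.state_dict())  # periodic sync, as in DQN
y = ddqn_target(online_net, target_net,
                rewards=torch.zeros(8),
                next_states=torch.randn(8, 4),
                dones=torch.zeros(8))
print(y.shape)  # torch.Size([8])
```

In vanilla DQN the target network both selects and evaluates the next action (a single max over its own Q-values), which systematically overestimates; splitting the two roles across the two networks is the entire difference introduced by Double DQN.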