gptkbp:instanceOf
|
machine learning paradigm
|
gptkbp:application
|
autonomous vehicles
robotics
resource management
game playing
recommendation systems
|
gptkbp:challenge
|
credit assignment problem
exploration-exploitation tradeoff
sample efficiency
stability and convergence
|
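The exploration-exploitation tradeoff listed above is commonly handled with an epsilon-greedy action-selection rule. The sketch below is a minimal illustration only; the names q_values and epsilon are assumptions for the example, not taken from the source.

import random

def epsilon_greedy(q_values, epsilon):
    # q_values: estimated action values for the current state.
    # With probability epsilon, explore with a random action;
    # otherwise exploit the current greedy estimate.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Example: mostly exploit, occasionally explore.
action = epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1)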
gptkbp:conference
|
gptkb:AAAI
gptkb:ICML
gptkb:NeurIPS
gptkb:IJCAI
|
gptkbp:field
|
gptkb:artificial_intelligence
|
gptkbp:firstMajorBookPublished
|
1998
|
gptkbp:focusesOn
|
learning by trial and error
|
gptkbp:goal
|
maximize cumulative reward
|
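In the standard discounted formulation, "maximize cumulative reward" is usually made precise as maximizing the expected return; a common textbook form (assumed here, not quoted from the source) is:

G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \qquad \max_{\pi} \; \mathbb{E}_{\pi}[G_t], \qquad 0 \le \gamma < 1

where \gamma is the discount factor and \pi is the agent's policy.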
gptkbp:hasSubfield
|
deep reinforcement learning
hierarchical reinforcement learning
inverse reinforcement learning
model-based reinforcement learning
model-free reinforcement learning
multi-agent reinforcement learning
|
https://www.w3.org/2000/01/rdf-schema#label
|
reinforcement learning
|
gptkbp:involves
|
policy
environment
states
actions
reward function
value function
agent
|
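The entities listed above (agent, environment, states, actions, policy, reward) fit together in the standard interaction loop. The sketch below is a minimal, self-contained Python illustration; CoinFlipEnv and RandomAgent are invented for the example and are not part of any named library.

import random

class CoinFlipEnv:
    # Toy environment: reward 1 if the action matches a hidden coin flip.
    def reset(self):
        return 0  # single dummy state

    def step(self, action):
        coin = random.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        return 0, reward, False  # next state, reward, done

class RandomAgent:
    # Uniform random policy over two actions.
    def act(self, state):
        return random.randint(0, 1)

env, agent = CoinFlipEnv(), RandomAgent()
state = env.reset()
total_reward = 0.0
for t in range(100):                        # trial-and-error interaction
    action = agent.act(state)               # policy: state -> action
    state, reward, done = env.step(action)  # environment returns a reward
    total_reward += reward
print("cumulative reward:", total_reward)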
gptkbp:notableBook
|
gptkb:Reinforcement_Learning:_An_Introduction
|
gptkbp:notableContributor
|
gptkb:Andrew_Barto
gptkb:Richard_S._Sutton
|
gptkbp:notableFor
|
gptkb:Monte_Carlo_methods
gptkb:Actor-Critic
gptkb:SARSA
gptkb:Deep_Q-Network_(DQN)
|
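As a reference point for the algorithms listed above, the on-policy SARSA update (named for the tuple (s, a, r, s', a')) is typically written as follows; this is the standard textbook form, not quoted from the source:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]

where \alpha is the step size and \gamma the discount factor. Deep Q-Networks replace the table Q with a neural network and regress toward an analogous temporal-difference target.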
gptkbp:relatedTo
|
gptkb:Q-learning
Markov decision process
dynamic programming
policy gradient methods
temporal difference learning
|
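Q-learning and temporal difference learning, both listed above, can be illustrated with a short tabular sketch. The function and variable names below are assumptions for the example; the update itself is the standard off-policy TD rule.

from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.99):
    # Off-policy TD update: bootstrap on the greedy value of the next state.
    best_next = max(Q[(s_next, a2)] for a2 in range(n_actions))
    td_error = (r + gamma * best_next) - Q[(s, a)]
    Q[(s, a)] += alpha * td_error

# Example: one update on a Q-table initialized to zero.
Q = defaultdict(float)
q_learning_update(Q, s=0, a=1, r=1.0, s_next=0, n_actions=2)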
gptkbp:usedBy
|
gptkb:DeepMind
gptkb:AlphaGo
gptkb:OpenAI_Five
|
gptkbp:uses
|
rewards and punishments
|
gptkbp:bfsParent
|
gptkb:artificial_intelligence
gptkb:machine_learning
gptkb:Theory_of_Machine_Learning
|
gptkbp:bfsLayer
|
4
|