gptkbp:instanceOf
|
machine learning paradigm
|
gptkbp:conference
|
gptkb:AAAI
gptkb:ICLR
gptkb:ICML
gptkb:NeurIPS
|
gptkbp:fieldOfStudy
|
gptkb:artificial_intelligence
|
gptkbp:firstPublished
|
1980s
|
gptkbp:focusesOn
|
sequential decision making
learning from interaction
|
gptkbp:hasApplication
|
autonomous vehicles
robotics
resource management
game playing
recommendation systems
|
gptkbp:hasConcept
|
gptkb:action
reward
exploration
policy
state
environment
value function
exploitation
agent
|
gptkbp:hasJournal
|
gptkb:Journal_of_Machine_Learning_Research
gptkb:Machine_Learning_Journal
gptkb:Artificial_Intelligence_Journal
|
gptkbp:hasSubfield
|
deep reinforcement learning
hierarchical reinforcement learning
inverse reinforcement learning
model-based reinforcement learning
model-free reinforcement learning
multi-agent reinforcement learning
|
https://www.w3.org/2000/01/rdf-schema#label
|
Reinforcement Learning
|
gptkbp:notableBook
|
gptkb:Reinforcement_Learning:_An_Introduction
|
gptkbp:notableContributor
|
gptkb:Richard_S._Sutton
gptkb:Andrew_G._Barto
|
gptkbp:notableFor
|
gptkb:Monte_Carlo_methods
gptkb:Actor-Critic
gptkb:Deep_Q-Network
gptkb:SARSA
gptkb:Q-learning
gptkb:Policy_Gradient
gptkb:Temporal_Difference_learning
|
gptkbp:relatedTo
|
supervised learning
unsupervised learning
|
gptkbp:usedBy
|
gptkb:DeepMind
gptkb:AlphaGo
gptkb:OpenAI_Five
|
gptkbp:uses
|
Markov decision process
reward signals
trial and error
|
|