Agent Foundations

GPTKB entity

Statements (73)
Predicate Object
gptkbp:instanceOf research
gptkbp:focusesOn decision theory
    robustness
    AI alignment
    corrigibility
    embedded agency
    value learning
gptkbp:goal ensure safe AI systems
    formalize agent reasoning
    understand agent behavior
rdfs:label Agent Foundations
gptkbp:includes gptkb:AI_control_problem
    gptkb:AI_safety_via_debate
    gptkb:Goodhart's_law
    game theory
    self-reference
    multi-agent systems
    transparency
    bounded rationality
    cooperation
    inverse reinforcement learning
    interpretability
    utility functions
    logical uncertainty
    instrumental convergence
    AI boxing
    goal specification
    AI corrigibility
    AI goal stability
    AI incentives
    AI motivation
    AI takeoff dynamics
    Vingean reflection
    acausal trade
    alignment tax
    causal decision theory
    corrigibility problems
    counterfactual reasoning
    decision theory paradoxes
    embedded agency problems
    evidential decision theory
    functional decision theory
    impact regularization
    inner alignment
    instrumental goals
    logical induction
    logical omniscience
    mesa-optimization
    naturalized induction
    ontological crises
    outer alignment
    preference learning
    recursive self-improvement
    reflective oracles
    reward hacking
    reward modeling
    reward specification
    robust delegation
    self-modification
    shutdown problem
    subagent problems
    tripwire mechanisms
    updateless decision theory
    value extrapolation
    wireheading
gptkbp:relatedTo gptkb:artificial_intelligence
    gptkb:machine_learning
gptkbp:studiedBy gptkb:MIRI
    academic researchers
    AI safety community
gptkbp:bfsParent gptkb:MIRI
gptkbp:bfsLayer 5
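The statement table above is a set of predicate-object pairs for a single entity. A minimal sketch of that structure as a predicate→objects mapping with a lookup helper (an illustrative representation only, not an official GPTKB API; the `includes` objects are abridged here):

```python
# Illustrative sketch (not an official GPTKB API): the entity's statement
# table as a predicate -> objects mapping. Object lists are abridged; the
# "includes" list on the page is much longer.
agent_foundations = {
    "gptkbp:instanceOf": ["research"],
    "gptkbp:focusesOn": [
        "decision theory", "robustness", "AI alignment",
        "corrigibility", "embedded agency", "value learning",
    ],
    "gptkbp:goal": [
        "ensure safe AI systems",
        "formalize agent reasoning",
        "understand agent behavior",
    ],
    "rdfs:label": ["Agent Foundations"],
    "gptkbp:relatedTo": ["gptkb:artificial_intelligence", "gptkb:machine_learning"],
    "gptkbp:studiedBy": ["gptkb:MIRI", "academic researchers", "AI safety community"],
    "gptkbp:bfsParent": ["gptkb:MIRI"],
    "gptkbp:bfsLayer": [5],
}

def objects_of(entity, predicate):
    """Return every object asserted for a predicate (empty list if none)."""
    return entity.get(predicate, [])

print(objects_of(agent_foundations, "gptkbp:studiedBy"))
# -> ['gptkb:MIRI', 'academic researchers', 'AI safety community']
```

Because a predicate can have many objects (as `gptkbp:includes` does above), each value is a list rather than a single string.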