AI Alignment Problem

GPTKB entity

Statements (73)
Predicate Object
gptkbp:instanceOf problem
gptkbp:addressedTo gptkb:cooperative_inverse_reinforcement_learning
    robustness
    AI governance
    inverse reinforcement learning
    corrigibility
    alignment research
    scalable oversight
    constitutional AI
    impact regularization
    reward modeling
    value learning
    interpretability research
    AI safety engineering
    debate and amplification
    safe exploration
gptkbp:concerns alignment of AI goals with human values
gptkbp:field gptkb:artificial_intelligence
gptkbp:firstDiscussed 20th century
gptkbp:gainedAttention 2010s
https://www.w3.org/2000/01/rdf-schema#label AI Alignment Problem
gptkbp:motive gptkb:AI_control_problem
    gptkb:Goodhart's_law
    gptkb:orthogonality_thesis
    AI ethics
    AI governance
    AI alignment research community
    instrumental convergence
    AI catastrophic risk
    inner alignment
    instrumental goals
    mesa-optimization
    outer alignment
    reward hacking
    AI existential risk
    AI safety problem
    specification gaming
    distributional shift
    AI alignment benchmarks
    AI alignment challenges
    AI alignment conferences
    AI alignment funding
    AI alignment organizations
    AI alignment publications
    AI alignment tax
    AI corrigibility problem
    AI interpretability problem
    AI misuse risk
    AI reward specification problem
    AI robustness problem
    AI transparency problem
    AI value loading problem
    complexity of real-world environments
    deceptive alignment
    difficulty of specifying human values
    goal misgeneralization
    misalignment between training and deployment
    multi-agent alignment
    potential risks of superintelligent AI
    scalability of oversight
    unintended consequences of AI systems
gptkbp:notableFigure gptkb:Nick_Bostrom
    gptkb:Paul_Christiano
    gptkb:Stuart_Russell
    gptkb:Eliezer_Yudkowsky
gptkbp:relatedTo AI safety
    existential risk from AI
    value alignment
gptkbp:studiedBy philosophers
    AI researchers
    machine learning engineers
gptkbp:bfsParent gptkb:Danger_(AI)
gptkbp:bfsLayer 7
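
The statements above follow the subject–predicate–object pattern of an RDF knowledge base. A minimal sketch of how a few of them could be written in Turtle is given below; the namespace IRIs for the gptkb: and gptkbp: prefixes and the subject identifier gptkb:AI_Alignment_Problem are placeholders, since the page only shows the prefixed forms and the label, and plain-text objects are rendered as string literals, which is likewise an assumption.

@prefix gptkb:  <https://gptkb.example.org/entity/> .    # placeholder namespace IRI
@prefix gptkbp: <https://gptkb.example.org/property/> .  # placeholder namespace IRI
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical subject identifier; the page only shows the label "AI Alignment Problem".
gptkb:AI_Alignment_Problem
    rdfs:label           "AI Alignment Problem" ;
    gptkbp:instanceOf    "problem" ;
    gptkbp:field         gptkb:artificial_intelligence ;
    gptkbp:concerns      "alignment of AI goals with human values" ;
    gptkbp:addressedTo   gptkb:cooperative_inverse_reinforcement_learning ,
                         "value learning" ,
                         "scalable oversight" ;
    gptkbp:notableFigure gptkb:Nick_Bostrom ,
                         gptkb:Stuart_Russell ;
    gptkbp:bfsLayer      7 .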