Statements (73)
Predicate | Object |
---|---|
gptkbp:instanceOf | problem |
gptkbp:addressedTo | gptkb:cooperative_inverse_reinforcement_learning, robustness, AI governance, inverse reinforcement learning, corrigibility, alignment research, scalable oversight, constitutional AI, impact regularization, reward modeling, value learning, interpretability research, AI safety engineering, debate and amplification, safe exploration |
gptkbp:concerns | alignment of AI goals with human values |
gptkbp:field | gptkb:artificial_intelligence |
gptkbp:firstDiscussed | 20th century |
gptkbp:gainedAttention | 2010s |
https://www.w3.org/2000/01/rdf-schema#label | AI Alignment Problem |
gptkbp:motive | gptkb:AI_control_problem, gptkb:Goodhart's_law, gptkb:orthogonality_thesis, AI ethics, AI governance, AI alignment research community, instrumental convergence, AI catastrophic risk, inner alignment, instrumental goals, mesa-optimization, outer alignment, reward hacking, AI existential risk, AI safety problem, specification gaming, distributional shift, AI alignment benchmarks, AI alignment challenges, AI alignment conferences, AI alignment funding, AI alignment organizations, AI alignment publications, AI alignment tax, AI corrigibility problem, AI interpretability problem, AI misuse risk, AI reward specification problem, AI robustness problem, AI transparency problem, AI value loading problem, complexity of real-world environments, deceptive alignment, difficulty of specifying human values, goal misgeneralization, misalignment between training and deployment, multi-agent alignment, potential risks of superintelligent AI, scalability of oversight, unintended consequences of AI systems |
gptkbp:notableFigure | gptkb:Nick_Bostrom, gptkb:Paul_Christiano, gptkb:Stuart_Russell, gptkb:Eliezer_Yudkowsky |
gptkbp:relatedTo | AI safety, existential risk from AI, value alignment |
gptkbp:studiedBy | philosophers, AI researchers, machine learning engineers |
gptkbp:bfsParent | gptkb:Danger_(AI) |
gptkbp:bfsLayer | 7 |
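Each row above is an RDF-style statement (subject, predicate, object) whose subject is the AI Alignment Problem entity itself, with multi-valued predicates such as gptkbp:notableFigure simply repeated once per object. Below is a minimal sketch of loading a few of these statements into an in-memory graph with Python's rdflib; the gptkb/gptkbp namespace URIs are placeholders chosen for illustration, not the knowledge base's actual URIs.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

# Placeholder namespace URIs (assumptions, not the KB's real identifiers).
GPTKB = Namespace("https://gptkb.example/entity/")
GPTKBP = Namespace("https://gptkb.example/property/")

g = Graph()
g.bind("gptkb", GPTKB)
g.bind("gptkbp", GPTKBP)

subject = GPTKB["AI_Alignment_Problem"]

# A few statements from the table, expressed as (subject, predicate, object) triples.
g.add((subject, RDFS.label, Literal("AI Alignment Problem")))
g.add((subject, GPTKBP.instanceOf, Literal("problem")))
g.add((subject, GPTKBP.field, GPTKB["artificial_intelligence"]))
g.add((subject, GPTKBP.notableFigure, GPTKB["Nick_Bostrom"]))
g.add((subject, GPTKBP.notableFigure, GPTKB["Stuart_Russell"]))

# Multi-valued predicates are queried by matching on subject and predicate.
for _, _, obj in g.triples((subject, GPTKBP.notableFigure, None)):
    print(obj)
```

Running the loop at the end prints both notable-figure URIs, illustrating how the table's multi-object cells correspond to repeated triples rather than a single compound value.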