Statements (73)
Predicate | Object |
---|---|
gptkbp:instanceOf | problem |
gptkbp:addressedTo | gptkb:cooperative_inverse_reinforcement_learning, robustness, AI governance, inverse reinforcement learning, corrigibility, alignment research, scalable oversight, constitutional AI, impact regularization, reward modeling, value learning, interpretability research, AI safety engineering, debate and amplification, safe exploration |
gptkbp:concerns | alignment of AI goals with human values |
gptkbp:field | gptkb:artificial_intelligence |
gptkbp:firstDiscussed | 20th century |
gptkbp:gainedAttention | 2010s |
https://www.w3.org/2000/01/rdf-schema#label | AI Alignment Problem |
gptkbp:motive | gptkb:AI_control_problem, gptkb:Goodhart's_law, gptkb:orthogonality_thesis, AI ethics, AI governance, AI alignment research community, instrumental convergence, AI catastrophic risk, inner alignment, instrumental goals, mesa-optimization, outer alignment, reward hacking, AI existential risk, AI safety problem, specification gaming, distributional shift, AI alignment benchmarks, AI alignment challenges, AI alignment conferences, AI alignment funding, AI alignment organizations, AI alignment publications, AI alignment tax, AI corrigibility problem, AI interpretability problem, AI misuse risk, AI reward specification problem, AI robustness problem, AI transparency problem, AI value loading problem, complexity of real-world environments, deceptive alignment, difficulty of specifying human values, goal misgeneralization, misalignment between training and deployment, multi-agent alignment, potential risks of superintelligent AI, scalability of oversight, unintended consequences of AI systems |
gptkbp:notableFigure | gptkb:Nick_Bostrom, gptkb:Paul_Christiano, gptkb:Stuart_Russell, gptkb:Eliezer_Yudkowsky |
gptkbp:relatedTo | AI safety, existential risk from AI, value alignment |
gptkbp:studiedBy | philosophers, AI researchers, machine learning engineers |
gptkbp:bfsParent | gptkb:Danger_(AI) |
gptkbp:bfsLayer | 7 |
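Each row above is an RDF-style statement (subject, predicate, object) whose subject is the AI Alignment Problem entity itself, with multi-valued predicates such as gptkbp:notableFigure simply repeated once per object. Below is a minimal sketch of loading a few of these statements into an in-memory graph with Python's rdflib; the gptkb/gptkbp namespace URIs are placeholders chosen for illustration, not the knowledge base's actual URIs.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

# Placeholder namespace URIs (assumptions, not the KB's real identifiers).
GPTKB = Namespace("https://gptkb.example/entity/")
GPTKBP = Namespace("https://gptkb.example/property/")

g = Graph()
g.bind("gptkb", GPTKB)
g.bind("gptkbp", GPTKBP)

subject = GPTKB["AI_Alignment_Problem"]

# A few statements from the table, expressed as (subject, predicate, object) triples.
g.add((subject, RDFS.label, Literal("AI Alignment Problem")))
g.add((subject, GPTKBP.instanceOf, Literal("problem")))
g.add((subject, GPTKBP.field, GPTKB["artificial_intelligence"]))
g.add((subject, GPTKBP.notableFigure, GPTKB["Nick_Bostrom"]))
g.add((subject, GPTKBP.notableFigure, GPTKB["Stuart_Russell"]))

# Multi-valued predicates are queried by matching on subject and predicate.
for _, _, obj in g.triples((subject, GPTKBP.notableFigure, None)):
    print(obj)
```

Running the loop at the end prints both notable-figure URIs, illustrating how the table's multi-object cells correspond to repeated triples rather than a single compound value.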