AI Alignment

GPTKB entity

Statements (95)
Predicate Object
gptkbp:instanceOf gptkb:academic
gptkbp:address interpretability
corrigibility
instrumental convergence
value alignment problem
reward hacking
goal specification problem
gptkbp:concerns ethical behavior of AI
robustness of AI systems
safety of AI systems
gptkbp:conference gptkb:AI_Alignment_Forum
gptkb:AI_Safety_Camp
gptkbp:focusesOn aligning AI systems with human values
gptkbp:hasConcept gptkb:AI_control_problem
gptkb:AI_safety_via_debate
gptkb:Goodhart's_law
gptkb:ELK_(Eliciting_Latent_Knowledge)
gptkb:cooperative_inverse_reinforcement_learning
gptkb:iterated_distillation_and_amplification
gptkb:off-switch_problem
gptkb:tripwire_mechanism
AI ethics
AI alignment research
AI safety research
AI governance
amplification
inverse reinforcement learning
interpretability
AI risk
corrigibility
AI safety community
AI transparency
instrumental convergence
scalable oversight
AI boxing
AI interpretability
AI robustness
friendly AI
AI corrigibility
impact regularization
inner alignment
mesa-optimization
outer alignment
reward hacking
reward modeling
value learning
AI existential risk
AI alignment problem
AI alignment community
AI alignment forum
AI reliability
AI safety forum
AI safety problem
AI value alignment
deontic constraints
oracle AI
recursive reward modeling
https://www.w3.org/2000/01/rdf-schema#label AI Alignment
gptkbp:importantFor artificial general intelligence
advanced AI systems
gptkbp:notableContributor gptkb:Ben_Garfinkel
gptkb:Nick_Bostrom
gptkb:Paul_Christiano
gptkb:Rohin_Shah
gptkb:Jessica_Taylor
gptkb:Jan_Leike
gptkb:Stuart_Russell
gptkb:Chris_Olah
gptkb:Dario_Amodei
gptkb:Andrew_Critch
gptkb:Eliezer_Yudkowsky
gptkb:Nate_Soares
gptkb:Owain_Evans
gptkb:Beth_Barnes
gptkb:Evan_Hubinger
gptkb:Richard_Ngo
gptkb:Victoria_Krakovna
gptkbp:organization gptkb:DeepMind
gptkb:OpenAI
gptkb:Future_of_Humanity_Institute
gptkb:Machine_Intelligence_Research_Institute
gptkb:Alignment_Research_Center
gptkb:Anthropic
gptkb:Centre_for_the_Study_of_Existential_Risk
gptkbp:publishedIn gptkb:LessWrong
gptkb:AI_Alignment_Newsletter
gptkb:Alignment_Forum
gptkbp:relatedTo gptkb:artificial_intelligence
AI safety
machine ethics
gptkbp:studiedBy researchers in computer science
researchers in cognitive science
researchers in philosophy
gptkbp:bfsParent gptkb:LessWrong
gptkbp:bfsLayer 6
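
The statements above follow a subject–predicate–object pattern with prefixed names (gptkbp: for properties, gptkb: for entities) plus one full rdfs:label URI. As a minimal sketch, the snippet below shows how a few of these statements could be expressed as RDF triples using the rdflib library; the namespace URIs for gptkb: and gptkbp: are assumptions for illustration, not the KB's official export format.

```python
# Sketch: a handful of the AI Alignment statements above as RDF triples.
# The gptkb/gptkbp namespace URIs are assumed for illustration only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

GPTKB = Namespace("https://gptkb.org/entity/")      # assumed entity namespace
GPTKBP = Namespace("https://gptkb.org/property/")   # assumed property namespace

g = Graph()
g.bind("gptkb", GPTKB)
g.bind("gptkbp", GPTKBP)

alignment = GPTKB["AI_Alignment"]

# rdfs:label "AI Alignment"
g.add((alignment, RDFS.label, Literal("AI Alignment")))

# gptkbp:focusesOn "aligning AI systems with human values"
g.add((alignment, GPTKBP["focusesOn"],
       Literal("aligning AI systems with human values")))

# gptkbp:notableContributor gptkb:Stuart_Russell (one of the contributors listed above)
g.add((alignment, GPTKBP["notableContributor"], GPTKB["Stuart_Russell"]))

# Print the small graph in Turtle syntax
print(g.serialize(format="turtle"))
```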