AI Alignment

GPTKB entity

Statements (95)
Predicate Object
gptkbp:instanceOf gptkb:academic
gptkbp:address interpretability
corrigibility
instrumental convergence
value alignment problem
reward hacking
goal specification problem
gptkbp:concerns ethical behavior of AI
robustness of AI systems
safety of AI systems
gptkbp:conference gptkb:AI_Alignment_Forum
gptkb:AI_Safety_Camp
gptkbp:focusesOn aligning AI systems with human values
gptkbp:hasConcept gptkb:AI_control_problem
gptkb:AI_safety_via_debate
gptkb:Goodhart's_law
gptkb:ELK_(Eliciting_Latent_Knowledge)
gptkb:cooperative_inverse_reinforcement_learning
gptkb:iterated_distillation_and_amplification
gptkb:off-switch_problem
gptkb:tripwire_mechanism
AI ethics
AI alignment research
AI safety research
AI governance
amplification
inverse reinforcement learning
interpretability
AI risk
corrigibility
AI safety community
AI transparency
instrumental convergence
scalable oversight
AI boxing
AI interpretability
AI robustness
friendly AI
AI corrigibility
impact regularization
inner alignment
mesa-optimization
outer alignment
reward hacking
reward modeling
value learning
AI existential risk
AI alignment problem
AI alignment community
AI alignment forum
AI reliability
AI safety forum
AI safety problem
AI value alignment
deontic constraints
oracle AI
recursive reward modeling
https://www.w3.org/2000/01/rdf-schema#label AI Alignment
gptkbp:importantFor artificial general intelligence
advanced AI systems
gptkbp:notableContributor gptkb:Ben_Garfinkel
gptkb:Nick_Bostrom
gptkb:Paul_Christiano
gptkb:Rohin_Shah
gptkb:Jessica_Taylor
gptkb:Jan_Leike
gptkb:Stuart_Russell
gptkb:Chris_Olah
gptkb:Dario_Amodei
gptkb:Andrew_Critch
gptkb:Eliezer_Yudkowsky
gptkb:Nate_Soares
gptkb:Owain_Evans
gptkb:Beth_Barnes
gptkb:Evan_Hubinger
gptkb:Richard_Ngo
gptkb:Victoria_Krakovna
gptkbp:organization gptkb:DeepMind
gptkb:OpenAI
gptkb:Future_of_Humanity_Institute
gptkb:Machine_Intelligence_Research_Institute
gptkb:Alignment_Research_Center
gptkb:Anthropic
gptkb:Centre_for_the_Study_of_Existential_Risk
gptkbp:publishedIn gptkb:LessWrong
gptkb:AI_Alignment_Newsletter
gptkb:Alignment_Forum
gptkbp:relatedTo gptkb:artificial_intelligence
AI safety
machine ethics
gptkbp:studiedBy researchers in computer science
researchers in cognitive science
researchers in philosophy
gptkbp:bfsParent gptkb:LessWrong
gptkbp:bfsLayer 6
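
The statements above follow a subject–predicate–object pattern with prefixed names (gptkbp: for properties, gptkb: for entities) plus one full rdfs:label URI. As a minimal sketch, the snippet below shows how a few of these statements could be expressed as RDF triples using the rdflib library; the namespace URIs for gptkb: and gptkbp: are assumptions for illustration, not the KB's official export format.

```python
# Sketch: a handful of the AI Alignment statements above as RDF triples.
# The gptkb/gptkbp namespace URIs are assumed for illustration only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

GPTKB = Namespace("https://gptkb.org/entity/")      # assumed entity namespace
GPTKBP = Namespace("https://gptkb.org/property/")   # assumed property namespace

g = Graph()
g.bind("gptkb", GPTKB)
g.bind("gptkbp", GPTKBP)

alignment = GPTKB["AI_Alignment"]

# rdfs:label "AI Alignment"
g.add((alignment, RDFS.label, Literal("AI Alignment")))

# gptkbp:focusesOn "aligning AI systems with human values"
g.add((alignment, GPTKBP["focusesOn"],
       Literal("aligning AI systems with human values")))

# gptkbp:notableContributor gptkb:Stuart_Russell (one of the contributors listed above)
g.add((alignment, GPTKBP["notableContributor"], GPTKB["Stuart_Russell"]))

# Print the small graph in Turtle syntax
print(g.serialize(format="turtle"))
```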