Mechanistic Interpretability
GPTKB entity
Statements (45)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | Field of AI research |
| gptkbp:aimsTo | Enable model auditing; Improve AI safety; Increase transparency of AI systems |
| gptkbp:appliesTo | Deep learning models |
| gptkbp:emergedIn | 2020s |
| gptkbp:focusesOn | Understanding neural network internals |
| gptkbp:goal | Explain model behavior in terms of mechanisms |
| gptkbp:hasMethod | Circuit analysis; Activation patching; Feature visualization; Neuron analysis |
| https://www.w3.org/2000/01/rdf-schema#label | Mechanistic Interpretability |
| gptkbp:notableContributor | gptkb:Tom_Lieberum; gptkb:Chris_Olah; gptkb:Neel_Nanda |
| gptkbp:notableOrganization | gptkb:DeepMind; gptkb:OpenAI; gptkb:Anthropic; gptkb:Redwood_Research |
| gptkbp:notablePublication | gptkb:A_Mathematical_Framework_for_Transformer_Circuits; gptkb:Transformer_Circuits; Zoom In: An Introduction to Circuits |
| gptkbp:relatedConcept | gptkb:Transformer_models; Transparency; Activation patching; Attention heads; Feature attribution; Induction heads; Language models; Model auditing; Model internals; Model interpretability; Model transparency; Monosemanticity; Neural network interpretability; Neuron splitting; Polysemantic neurons; Reverse engineering neural networks; Superposition in neural networks; Vision models |
| gptkbp:relatedTo | gptkb:Explainable_AI; Interpretability |
| gptkbp:bfsParent | gptkb:Transformer_Circuits |
| gptkbp:bfsLayer | 8 |