Activation Patching

URI: https://gptkb.org/entity/Activation_Patching

GPTKB entity

Predicate	Object
gptkbp:instanceOf	gptkb:machine_learning_technique
gptkbp:alternativeName	activation patching experiment
gptkbp:alternativeName	activation replacement
gptkbp:appliesTo	language models
gptkbp:appliesTo	transformer models
gptkbp:enables	identification of model components responsible for specific outputs
gptkbp:firstPublished	2022
gptkbp:introduced	researchers at Anthropic
gptkbp:method	replacing activations in a model with activations from a different input
gptkbp:notablePublication	gptkb:A_Mathematical_Framework_for_Transformer_Circuits
gptkbp:purpose	to identify which parts of a neural network are responsible for specific behaviors
gptkbp:relatedTo	circuit analysis
gptkbp:relatedTo	feature attribution
gptkbp:usedFor	causal tracing in neural networks
gptkbp:usedIn	mechanistic interpretability
gptkbp:bfsParent	gptkb:Transformer_Circuits
gptkbp:bfsLayer	8
http://www.w3.org/2000/01/rdf-schema#label	Activation Patching
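
Illustrative sketch (not part of the GPTKB record): the gptkbp:method entry above describes replacing activations in a model with activations from a different input. The pure-Python example below shows that idea on a hypothetical toy network. The weights W1 and W2, the inputs, and the names forward and patching_effects are all assumptions made for illustration. They do not come from any real model or library, and real experiments typically patch transformer components such as attention heads or residual-stream positions rather than single hidden units.

```python
# Hypothetical toy model: 2 inputs -> 3 ReLU hidden units -> 1 output.
# All weights are made up for illustration.
W1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W2 = [2.0, 0.0, -1.0]


def forward(x, patch=None):
    """Run the toy model.

    `patch` optionally maps a hidden-unit index to a replacement activation,
    which overwrites the value computed from `x` before the output layer.
    Returns (output, hidden_activations).
    """
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if patch:
        for idx, value in patch.items():
            hidden[idx] = value
    output = sum(w * h for w, h in zip(W2, hidden))
    return output, hidden


clean_x = [1.0, 0.0]    # input that produces the behavior of interest
corrupt_x = [0.0, 1.0]  # contrasting input that does not

# Cache the clean activations, and record the unpatched corrupted output.
clean_out, clean_hidden = forward(clean_x)
corrupt_out, _ = forward(corrupt_x)


def patching_effects():
    """Patch each hidden unit of the corrupted run with its clean activation.

    Returns, per unit, the fraction of the clean-vs-corrupted output gap
    that the single patch restores (1.0 = fully restored, 0.0 = no effect).
    """
    effects = {}
    for idx in range(len(W1)):
        patched_out, _ = forward(corrupt_x, patch={idx: clean_hidden[idx]})
        effects[idx] = (patched_out - corrupt_out) / (clean_out - corrupt_out)
    return effects


if __name__ == "__main__":
    for unit, effect in patching_effects().items():
        print(f"hidden unit {unit}: restored fraction = {effect:.2f}")
```

In this toy setup, only patching hidden unit 0 moves the corrupted output back to the clean output, which is the sense in which patching attributes a behavior to a specific component. Units whose clean and corrupted activations coincide, or whose output weight is zero, show no effect.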