Activation Patching

GPTKB entity

Statements (18)
Predicate Object
gptkbp:instanceOf machine learning technique
gptkbp:alternativeName activation patching experiment
gptkbp:alternativeName activation replacement
gptkbp:appliesTo language models
gptkbp:appliesTo transformer models
gptkbp:enables identification of model components responsible for specific outputs
gptkbp:firstPublished 2022
https://www.w3.org/2000/01/rdf-schema#label Activation Patching
gptkbp:introduced researchers at Anthropic
gptkbp:method replacing activations in a model with activations from a different input
gptkbp:notablePublication gptkb:A_Mathematical_Framework_for_Transformer_Circuits
gptkbp:purpose to identify which parts of a neural network are responsible for specific behaviors
gptkbp:relatedTo circuit analysis
gptkbp:relatedTo feature attribution
gptkbp:usedFor causal tracing in neural networks
gptkbp:usedIn mechanistic interpretability
gptkbp:bfsParent gptkb:Transformer_Circuits
gptkbp:bfsLayer 8
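
Illustrative sketch (not part of the GPTKB statements)

The gptkbp:method statement above describes replacing activations in a model with activations from a different input. The following minimal sketch shows that idea on a toy two-unit network written in plain Python. Everything in it is an assumption made for illustration: the `forward` function, its fixed weights, the `clean_input` / `corrupted_input` pair, and the normalized `patching_effect` score are hypothetical and do not come from any particular paper or library. A real experiment would patch activations inside a trained transformer (for example attention-head or residual-stream activations) rather than hand-set weights.

```python
def forward(x, patch=None):
    """Run a toy 2-layer network, optionally overwriting hidden units.

    `patch` maps a hidden-unit index to the activation value that should
    replace whatever the network computed for that unit.
    Returns (output, hidden_activations).
    """
    # Hypothetical fixed weights, chosen only to make the effect easy to see.
    w_in = [[1.0, 0.0], [0.0, 1.0]]  # hidden unit i reads input feature i
    w_out = [2.0, 0.0]               # only hidden unit 0 influences the output

    # Hidden layer with ReLU.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]

    # The patching step: substitute cached activations from another run.
    if patch:
        for idx, value in patch.items():
            hidden[idx] = value

    output = sum(w * h for w, h in zip(w_out, hidden))
    return output, hidden


clean_input = [3.0, 1.0]      # input on which the behavior of interest occurs
corrupted_input = [0.0, 5.0]  # input on which it does not

# 1. Run the clean input and cache its hidden activations.
clean_out, clean_hidden = forward(clean_input)

# 2. Run the corrupted input unmodified to get a baseline.
corrupted_out, _ = forward(corrupted_input)


def patching_effect(unit):
    """Fraction of the clean-vs-corrupted output gap restored by patching `unit`."""
    patched_out, _ = forward(corrupted_input, patch={unit: clean_hidden[unit]})
    return (patched_out - corrupted_out) / (clean_out - corrupted_out)


# 3. Patch each hidden unit in turn and record how much of the output it restores.
effects = {unit: patching_effect(unit) for unit in range(2)}
print(effects)
```

In this toy setup, a unit whose patched value restores most of the clean output is read as responsible for the behavior, and a unit whose patch changes nothing is read as irrelevant to it. The normalization by the clean-minus-corrupted gap is one common convention; other metrics (such as logit differences or probabilities) are also used in practice.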