Interventions
Analysis Methods
Logit Lens
Decode the model's next-token prediction at every layer, not only the last one:
from nnterp.interventions import logit_lens
# `model` is assumed to be an nnterp model that has already been loaded
prompts = ["The capital of France is", "The sun rises in the"]
probs = logit_lens(model, prompts)
# Shape: (batch, layers, vocab)
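To make the idea behind the logit lens concrete, here is a minimal pure-Python sketch. It is not nnterp's implementation: the function `toy_logit_lens`, the hidden states, and the unembedding matrix are all hypothetical toy values, and the sketch skips the final normalization step that a real model would typically apply before unembedding. It only shows the core step of projecting each layer's hidden state onto the vocabulary and taking a softmax.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logit_lens(hidden_states, unembed):
    """Project every layer's hidden state onto the vocabulary.

    hidden_states: one vector of size d_model per layer.
    unembed: matrix of shape (d_model, vocab).
    Returns one probability distribution over the vocabulary per layer.
    """
    vocab_size = len(unembed[0])
    probs_per_layer = []
    for h in hidden_states:
        logits = [
            sum(h[i] * unembed[i][v] for i in range(len(h)))
            for v in range(vocab_size)
        ]
        probs_per_layer.append(softmax(logits))
    return probs_per_layer


# Hypothetical toy values: 3 layers, d_model = 2, vocab = 3.
hidden_states = [[0.0, 1.0], [0.5, 0.6], [2.0, 0.5]]
unembed = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]

layer_probs = toy_logit_lens(hidden_states, unembed)
# Most likely token id at each layer.
top_tokens = [max(range(len(p)), key=p.__getitem__) for p in layer_probs]
print(top_tokens)
```

In this toy setup the top token changes between the middle and the last layer, which is the kind of layer-by-layer shift the logit lens is meant to reveal.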
Patchscope
Patch activations taken from a source prompt into a target prompt:
from nnterp.interventions import patchscope_lens, TargetPrompt, repeat_prompt
source_prompts = ["Paris is beautiful", "London is foggy"]
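# Target prompt plus the index of the token position to patch (-1 is the last token)
target_prompt = TargetPrompt("city: Paris\nfood: croissant\n?", -1)
# Alternatively, build a repeat-style target prompt (this replaces the one above)
target_prompt = repeat_prompt(words=["car", "cross", "azdrfa"])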
patchscope_probs = patchscope_lens(
    model, source_prompts=source_prompts, target_patch_prompts=target_prompt
)
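The patching step itself can be illustrated with a small pure-Python sketch. This is not how nnterp implements `patchscope_lens`: the helper `run_layers`, the scalar "hidden states", and the two toy layers are hypothetical stand-ins, chosen only to show a value being recorded from a source run and written into a target run at a chosen layer and position.

```python
def run_layers(hidden, layers, patch=None):
    """Run `hidden` (one float per token position) through `layers`.

    `patch` is an optional (layer_index, position, value) tuple. Right after
    that layer runs, the value at `position` is overwritten with `value`.
    Returns the hidden states recorded after every layer.
    """
    trace = []
    for idx, layer in enumerate(layers):
        hidden = [layer(x) for x in hidden]
        if patch is not None and patch[0] == idx:
            _, position, value = patch
            hidden = list(hidden)
            hidden[position] = value
        trace.append(hidden)
    return trace


# Hypothetical toy "model": two layers acting on each position independently.
layers = [lambda x: x + 1.0, lambda x: x * 2.0]

source = [1.0, 5.0]       # stands in for the source prompt
target = [0.0, 0.0, 0.0]  # stands in for the target prompt

# 1. Run the source and record the last position after layer 0.
source_trace = run_layers(source, layers)
source_activation = source_trace[0][-1]

# 2. Run the target, overwriting its last position after layer 0.
clean_trace = run_layers(target, layers)
patched_trace = run_layers(target, layers, patch=(0, -1, source_activation))

print(clean_trace[-1], patched_trace[-1])
```

Only the patched position differs between the clean and patched runs here because the toy layers have no cross-position mixing; in a real transformer, attention would let the patched activation influence later positions as well.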