Documentation#
nnterp.standardized_transformer module#
- class StandardizedTransformer(model, trust_remote_code=False, check_renaming=True, allow_dispatch=True, check_attn_probs_with_trace=True, rename_config=None, **kwargs)[source]#
Bases:
LanguageModel
Renames the LanguageModel modules to match a standardized architecture.
The model structure is organized as follows:
StandardizedTransformer
├── layers
│   ├── self_attn
│   └── mlp
├── ln_final
└── lm_head
In addition to renaming modules, this class provides built-in accessors to extract and set intermediate activations (see the sketch after this list):
layers[i]: Get layer module at layer i
layers_input[i]: Get/set layer input at layer i
layers_output[i]: Get/set layer output at layer i
attentions_output[i]: Get/set attention output at layer i
attentions[i]: Get attention module at layer i
mlps_output[i]: Get/set MLP output at layer i
mlps[i]: Get MLP module at layer i
- Parameters:
model (str | Module) – Hugging Face repository ID or path of the model to load, or an already-loaded torch module.
trust_remote_code (bool, optional) – If True, remote code is trusted when loading the model. Defaults to False.
check_renaming (bool, optional) – If True, the renaming of modules is validated. Defaults to True.
allow_dispatch (bool, optional) – If True, allows using trace() to dispatch the model when scan() fails during renaming checks. Defaults to True. Set this to False if you plan to use the model remotely.
check_attn_probs_with_trace (bool, optional) – If True, the model is dispatched and a test ensures that the returned attention probabilities sum to 1. Defaults to True.
rename_config (RenameConfig, optional) – A RenameConfig object to use for renaming the model. If None, a default RenameConfig is used.
- property add_prefix_false_tokenizer: PreTrainedTokenizerBase#
- property attn_probs_available: bool#
- property input_size: Size#
- property attention_mask: Tensor | Object#
- property logits: Tensor | Object#
Returns the lm_head output
- property next_token_probs: Tensor | Object#
- skip_layer(layer, skip_with=None)[source]#
Skip the computation of a layer.
- Parameters:
layer (int) – The layer to skip
skip_with (Tensor | Object | None) – The input to skip the layer with. If None, the input of the layer is used.
- skip_layers(start_layer, end_layer, skip_with=None)[source]#
Skip all layers between start_layer and end_layer (inclusive).
- Parameters:
start_layer (int) – The layer to start skipping from
end_layer (int) – The layer to stop skipping at (inclusive)
skip_with (Tensor | Object | None) – The input to skip the layers with. If None, the input of start_layer is used.
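A short sketch of skipping a block of layers during a trace (model id, prompt, and layer range are illustrative):

```python
from nnterp import StandardizedTransformer

model = StandardizedTransformer("gpt2")

with model.trace("Hello, world"):
    # Pass layer 4's input straight through, skipping layers 4..7 entirely
    model.skip_layers(4, 7)
    probs = model.next_token_probs.save()
```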
- steer(layers, steering_vector, factor=1, positions=None, get_layer_object_to_steer=None)[source]#
Steer the hidden states of a layer using a steering vector.
- Parameters:
layers (int | list[int]) – The layer(s) to steer
steering_vector (Tensor) – The steering vector to apply
factor (float) – The factor to multiply the steering vector by
positions (int | list[int] | Tensor | None) – The position(s) to steer. If None, all positions are steered.
get_layer_object_to_steer (Callable[[int], Tensor | Object] | None) – Function that, given a layer index, returns the object to steer in the model. Defaults to model.layers_output[layer].
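A minimal sketch of applying a steering vector inside a trace (the random direction is a stand-in for a real steering vector, and the hidden size is read from the model config):

```python
import torch
from nnterp import StandardizedTransformer

model = StandardizedTransformer("gpt2")
direction = torch.randn(model.config.hidden_size)  # placeholder steering vector

with model.trace("The weather today is"):
    model.steer(layers=[4, 5], steering_vector=direction, factor=2.0)
    probs = model.next_token_probs.save()
```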
nnterp.interventions module#
- logit_lens(nn_model, prompts, remote=False)[source]#
Get the probabilities of the next token for the last token of each prompt at each layer, using the logit lens.
- Parameters:
nn_model (LanguageModel) – NNSight Language Model
prompts (list[str] | str) – List of prompts or a single prompt
remote (bool) – If True, the computation runs on the NDIF remote server
- Returns:
A tensor of shape (num_prompts, num_layers, vocab_size) containing the probabilities of the next token for each prompt at each layer. Tensor is on the CPU.
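A minimal usage sketch (model id and prompts are illustrative):

```python
from nnterp import StandardizedTransformer
from nnterp.interventions import logit_lens

model = StandardizedTransformer("gpt2")
probs = logit_lens(model, ["The capital of France is", "The capital of Japan is"])
# probs: (num_prompts, num_layers, vocab_size), on CPU
last_layer_top = probs[:, -1].argmax(dim=-1)  # top next-token prediction at the final layer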
- class TargetPrompt(prompt, index_to_patch)[source]#
Bases:
object
- Parameters:
prompt (str)
index_to_patch (int)
- prompt: str#
- index_to_patch: int#
- repeat_prompt(words=None, rel=' ', sep='\n', placeholder='?')[source]#
Build the prompt used in the Patchscopes paper (see PAIR-code/interpretability) to predict the next token.
- Parameters:
words – The words to repeat. If None, the words will be “king”, “1135”, “hello”.
rel – The string inserted between a word and its repetition
sep – The separator between consecutive repeated-word pairs
placeholder – The placeholder to use for the last word
- Returns:
A TargetPrompt object containing the prompt to patch and the index of the token to patch.
- Return type:
TargetPrompt
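With the defaults, the sketch below shows the rough shape of the generated prompt (exact layout assumed from the defaults above):

```python
from nnterp.interventions import repeat_prompt

target = repeat_prompt()
# target.prompt is roughly "king king\n1135 1135\nhello hello\n?"
# target.index_to_patch marks the placeholder token to overwrite
```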
- class TargetPromptBatch(prompts, index_to_patch)[source]#
Bases:
object
A class to handle multiple target prompts with potentially different indices to patch
- Parameters:
prompts (list[str])
index_to_patch (Tensor)
- prompts: list[str]#
- index_to_patch: Tensor#
- classmethod from_target_prompts(prompts_)[source]#
- Parameters:
prompts_ (list[TargetPrompt])
- classmethod from_target_prompt(prompt, batch_size)[source]#
- Parameters:
prompt (TargetPrompt)
batch_size (int)
- classmethod from_prompts(prompts, index_to_patch)[source]#
- Parameters:
prompts (str | list[str])
index_to_patch (int | list[int] | Tensor)
- static auto(target_prompt, batch_size)[source]#
- Parameters:
target_prompt (str | TargetPrompt | list[TargetPrompt] | TargetPromptBatch)
batch_size (int)
- patchscope_lens(nn_model, source_prompts=None, target_patch_prompts=None, layers=None, latents=None, remote=False)[source]#
Replace the hidden state of the target_patch_prompts.index_to_patch token in each target prompt with the hidden state of the last token of each source prompt, at each layer. Returns the probabilities of the next token in the target prompt for each source prompt and each layer intervention.
- Returns:
A tensor of shape (num_prompts, num_layers, vocab_size) containing the probabilities of the next token for each prompt at each layer. Tensor is on the CPU.
- Parameters:
nn_model (LanguageModel) – The NNSight model
source_prompts (list[str] | str | None) – List of prompts or a single prompt to get the hidden states of the last token from
target_patch_prompts (TargetPromptBatch | list[TargetPrompt] | TargetPrompt | None) – TargetPrompt(s) / TargetPromptBatch containing the prompt(s) to patch and the index of the token to patch
layers – List of layers to intervene on. If None, all layers are intervened on.
latents – List of latents to use. If None, the hidden states of the last token of each source prompt at each layer are collected.
remote – If True, the function runs on the NDIF remote server. See nnsight.net/status to check which models are available.
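A minimal sketch combining repeat_prompt and patchscope_lens (model id and source prompt are illustrative):

```python
from nnterp import StandardizedTransformer
from nnterp.interventions import patchscope_lens, repeat_prompt

model = StandardizedTransformer("gpt2")
target = repeat_prompt()  # default identity prompt from the Patchscopes paper
probs = patchscope_lens(
    model, source_prompts="The Eiffel Tower is in", target_patch_prompts=target
)
# probs: (num_prompts, num_layers, vocab_size)
```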
- patchscope_generate(nn_model, prompts, target_patch_prompt, max_length=50, layers=None, remote=False, max_batch_size=32)[source]#
Replace the hidden state of the target_patch_prompt.index_to_patch token in the target prompt with the hidden state of the last token of each prompt, at each layer, and generate from the patched model.
- Returns:
A tensor of shape (num_prompts, num_layers, vocab_size) containing the probabilities of the next token for each prompt at each layer. Tensor is on the CPU.
- Parameters:
nn_model (LanguageModel) – The NNSight LanguageModel with llama architecture
prompts (list[str] | str) – List of prompts or a single prompt to get the hidden states of the last token from
target_patch_prompt (TargetPrompt) – A TargetPrompt object containing the prompt to patch and the index of the token to patch
max_length (int) – The maximum length of the generated sequence
layers – List of layers to intervene on. If None, all layers are intervened on.
remote – If True, the function runs on the NDIF remote server. See nnsight.net/status to check which models are available.
max_batch_size – The maximum number of prompts to intervene on at once.
- steer(nn_model, layers, steering_vector, factor=1, position=-1, get_module=<function get_layer_output>)[source]#
Steer the hidden states of a layer using a steering vector.
- Parameters:
nn_model (LanguageModel) – The NNSight model
layers (int | list[int]) – The layer(s) to steer
steering_vector (Tensor) – The steering vector to apply
factor (float) – The factor to multiply the steering vector by
position (int) – The position to steer
get_module (Callable[[LanguageModel, int], Tensor | Object]) – Function that, given the model and a layer index, returns the object to steer. Defaults to get_layer_output.
- patch_object_attn_lens(nn_model, source_prompts, target_prompts, attn_idx_patch, num_patches=5)[source]#
A complex lens that makes the model attend to the hidden states of the last token of the source prompts instead of the attn_idx_patch token of the target prompts when predicting the last token. For each layer, this intervention is performed over num_patches consecutive layers.
- Returns:
A tensor of shape (num_target_prompts, num_layers, vocab_size) containing the probabilities of the next token for each target prompt at each layer. Tensor is on the CPU.
- Parameters:
nn_model (LanguageModel) – The NNSight model
source_prompts (list[str] | str) – The prompts to get the hidden states of the last token from
target_prompts (list[str] | str) – The prompts to predict the next token for
attn_idx_patch (int) – The index of the token to patch in the target prompts
num_patches (int) – The number of layers to patch for each layer
nnterp.prompt_utils module#
- get_first_tokens(words, llm_or_tokenizer, use_hacky_implementation=False)[source]#
Get all the first tokens of “word” and “ word” for all words.
- Parameters:
words (str | list[str]) – A string or a list of strings to get the first token of.
llm_or_tokenizer (LanguageModel | StandardizedTransformer | PreTrainedTokenizerBase) – The tokenizer to use. If a LanguageModel or StandardizedTransformer is provided, the tokenizer will be extracted from it. It is recommended to use StandardizedTransformer. If you want to use your own tokenizer, it’s recommended to initialize it with add_prefix_space=False or to use the hacky implementation.
use_hacky_implementation – If True, use a hacky implementation to get the first token of a word by tokenizing “🍐word” and extracting the first token of word. While hacky, it is still guaranteed to work correctly or raise an error.
- Returns:
A list of tokens.
- Return type:
list[int]
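A quick sketch (model id and words are illustrative):

```python
from nnterp import StandardizedTransformer
from nnterp.prompt_utils import get_first_tokens

model = StandardizedTransformer("gpt2")
tokens = get_first_tokens(["Paris", "London"], model)
# First tokens of "Paris", " Paris", "London", and " London"
```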
- class Prompt(prompt, target_tokens, target_strings=None)[source]#
Bases:
object
Generic class to represent a prompt with target tokens to track during next token prediction.
- Parameters:
prompt (str) – The prompt to use
target_tokens (dict[str, list[int]]) – A dictionary of target tokens for each target
target_strings (dict[str, str | list[str]] | None) – A dictionary of target strings for each target
- prompt: str#
- target_tokens: dict[str, list[int]]#
- target_strings: dict[str, str | list[str]] | None = None#
- next_token_probs_unsqueeze(nn_model, prompt, remote=False, **_kwargs)[source]#
- Parameters:
nn_model (LanguageModel)
prompt (str | list[str])
- Return type:
Tensor
- run_prompts(nn_model, prompts, batch_size=32, get_probs_func=None, func_kwargs=None, remote=False, tqdm=<class 'tqdm.asyncio.tqdm_asyncio'>)[source]#
Run a list of prompts through the model and return the probabilities of the next token for the target tokens.
- Parameters:
nn_model (LanguageModel) – The NNSight model
prompts (list[Prompt]) – A list of prompts. All prompts must have the same target keys
batch_size (int) – The batch size to use
get_probs_func (Callable | None) – The function to get the probabilities of the next token, defaults to next-token prediction
func_kwargs (dict | None) – The kwargs to pass to get_probs_func
remote (bool) – Whether to run the model on the NDIF remote server
tqdm – The tqdm function to use, defaults to tqdm.auto.tqdm. Use None to disable tqdm
- Returns:
A dictionary of target names and the probabilities of the next token for the target tokens.
- Return type:
dict[str, Tensor]
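Putting Prompt and run_prompts together, a minimal sketch (prompt and target key are illustrative):

```python
from nnterp import StandardizedTransformer
from nnterp.prompt_utils import Prompt, get_first_tokens, run_prompts

model = StandardizedTransformer("gpt2")
city_tokens = get_first_tokens("Paris", model)
prompts = [Prompt("The capital of France is", target_tokens={"city": city_tokens})]
probs = run_prompts(model, prompts, batch_size=8)
# probs["city"]: next-token probability mass on the target tokens, one row per prompt
```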
nnterp.display module#
nnterp.nnsight_utils module#
- get_layers(model)[source]#
Get the layers of the model
- Parameters:
model (LanguageModel)
- Return type:
list[Envoy]
- get_num_layers(nn_model)[source]#
Get the number of layers in the model
- Returns:
The number of layers in the model
- Parameters:
nn_model (LanguageModel) – The NNSight model
- get_layer(nn_model, layer)[source]#
Get the layer of the model
- Returns:
The Envoy for the layer
- Parameters:
nn_model (LanguageModel) – The NNSight model
layer (int) – The layer to get
- Return type:
Envoy
- get_layer_input(nn_model, layer)[source]#
Get the hidden state input of a layer
- Returns:
The Proxy for the input of the layer
- Parameters:
nn_model (LanguageModel) – The NNSight model
layer (int) – The layer to get the input of
- Return type:
Tensor | Object
- get_layer_output(nn_model, layer)[source]#
Get the output of a layer
- Returns:
The Proxy for the output of the layer
- Parameters:
nn_model (LanguageModel) – The NNSight model
layer (int) – The layer to get the output of
- Return type:
Tensor | Object
- get_attention(nn_model, layer)[source]#
Get the attention module of a layer
- Returns:
The Envoy for the attention module of the layer
- Parameters:
nn_model (LanguageModel) – The NNSight model
layer (int) – The layer to get the attention module of
- Return type:
Envoy
- get_attention_output(nn_model, layer)[source]#
Get the output of the attention block of a layer
- Returns:
The Proxy for the output of the attention block of the layer
- Parameters:
nn_model (LanguageModel) – The NNSight model
layer (int) – The layer to get the output of
- Return type:
Tensor | Object
- get_mlp(nn_model, layer)[source]#
Get the MLP of a layer
- Parameters:
nn_model (LanguageModel)
layer (int)
- Return type:
Envoy
- get_mlp_output(nn_model, layer)[source]#
Get the output of the MLP of a layer
- Parameters:
nn_model (LanguageModel)
layer (int)
- Return type:
Tensor | Object
- get_logits(nn_model)[source]#
Get the logits of the model
- Returns:
The Proxy for the logits of the model
- Parameters:
nn_model (LanguageModel) – The NNSight model
- Return type:
Tensor | Object
- get_unembed_norm(nn_model)[source]#
Get the last layer norm of the model
- Returns:
The Envoy for the last layer norm of the model
- Parameters:
nn_model (LanguageModel) – The NNSight model
- Return type:
Envoy
- get_unembed(nn_model)[source]#
Get the unembed module of the model
- Returns:
The Envoy for the unembed module of the model
- Parameters:
nn_model (LanguageModel) – The NNSight model
- Return type:
Envoy
- project_on_vocab(nn_model, h)[source]#
Project the hidden states on the vocabulary, after applying the model’s last layer norm
- Returns:
The Proxy for the hidden states projected on the vocabulary
- Parameters:
nn_model (LanguageModel) – The NNSight model
h (Tensor | Object) – The hidden states to project
- Return type:
Tensor | Object
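A minimal sketch of a logit-lens-style projection inside a trace (model id and layer index are illustrative):

```python
from nnterp import StandardizedTransformer
from nnterp.nnsight_utils import get_layer_output, project_on_vocab

model = StandardizedTransformer("gpt2")
with model.trace("Hello world"):
    h = get_layer_output(model, 6)  # residual stream after layer 6
    layer_logits = project_on_vocab(model, h).save()
# layer_logits: (batch, seq_len, vocab_size)
```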
- skip_layer(nn_model, layer, skip_with=None)[source]#
Skip the computation of a layer. If skip_with is None, the input of the layer is used as its output.
- Parameters:
nn_model (LanguageModel) – The NNSight model
layer (int) – The layer to skip
skip_with (Tensor | Object | None) – The input to skip the layer with. If None, the input of the layer is used.
- skip_layers(nn_model, start_layer, end_layer, skip_with=None)[source]#
Skip all layers between start_layer and end_layer (inclusive). Equivalent to set_layer_output(nn_model, end_layer, get_layer_input(nn_model, start_layer)), but skips the useless computation in between.
- Parameters:
nn_model (LanguageModel) – The NNSight model
start_layer (int) – The layer to start skipping from
end_layer (int) – The layer to stop skipping at (inclusive)
skip_with (Tensor | Object | None) – The input to skip the layers with. If None, the input of start_layer is used.
- get_next_token_probs(nn_model)[source]#
Get the next-token probabilities of the model
- Returns:
The Proxy for the next-token probabilities of the model
- Parameters:
nn_model (LanguageModel) – The NNSight model
- Return type:
Tensor | Object
- set_layer_output(nn_model, layer, tensor)[source]#
Set the output of a layer to a certain tensor.
- Parameters:
nn_model (LanguageModel) – The NNSight model
layer (int) – The layer to set the output of
tensor (Tensor | Object) – The tensor to set the output of the layer to
- get_token_activations(nn_model, prompts=None, layers=None, get_activations=None, remote=False, idx=None, tracer=None)[source]#
Collect the hidden states of the last token of each prompt at each layer
- Parameters:
nn_model (LanguageModel) – The NNSight model
prompts – The prompts to collect activations for. Can be None if you call this from an existing tracer.
layers – The layers to collect activations for, default to all layers
get_activations (Callable[[LanguageModel, int], Tensor | Object] | None) – The function to get the activations, default to layer output
remote – Whether to run the model on the remote device
idx (int | None) – The index of the token to collect activations for
tracer – A tracer object to use to collect activations. If None, a new tracer is created.
- Returns:
The hidden states of the last token of each prompt at each layer, moved to cpu. If called from an existing tracer, returns a list of Proxies instead. Dimensions are (num_layers, num_prompts, hidden_size)
- collect_last_token_activations_session(nn_model, prompts, batch_size, layers=None, get_activations=None, remote=False, idx=None)[source]#
Collect the hidden states of the specified token of each prompt at each layer in batches using a nnsight session.
- Parameters:
nn_model – The NNSight model
prompts – The prompts to collect activations for
batch_size – The batch size to use
layers – The layers to collect activations for, default to all layers
get_activations – The function to get the activations, default to layer output
remote – Whether to run the model on the remote device
idx – The index of the token to collect activations for. Default is -1 (last token).
- Returns:
The hidden states of the specified token of each prompt at each layer, moved to cpu. Dimensions are (num_layers, num_prompts, hidden_size)
- collect_token_activations_batched(nn_model, prompts, batch_size, layers=None, get_activations=None, remote=False, idx=None, tqdm=None, use_session=True)[source]#
Collect the hidden states of the last token of each prompt at each layer in batches
- Parameters:
nn_model (LanguageModel) – The NNSight model
prompts – The prompts to collect activations for
batch_size – The batch size to use
layers – The layers to collect activations for, default to all layers
get_activations (Callable[[LanguageModel, int], Tensor | Object] | None) – The function to get the activations, default to layer output
remote – Whether to run the model on the remote device
idx – The index of the token to collect activations for. Default is -1 (last token).
tqdm – Whether to use tqdm to show progress, default to None (no progress bar)
use_session – Whether to use an nnsight session to collect activations. Defaults to True.
- Returns:
The hidden states of the specified token of each prompt at each layer, moved to cpu. Dimensions are (num_layers, num_prompts, hidden_size)
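A usage sketch (model id and prompts are illustrative):

```python
from nnterp import StandardizedTransformer
from nnterp.nnsight_utils import collect_token_activations_batched

model = StandardizedTransformer("gpt2")
prompts = ["The capital of France is", "The capital of Japan is"]
acts = collect_token_activations_batched(model, prompts, batch_size=2)
# acts: (num_layers, num_prompts, hidden_size), last-token activations on CPU
```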
- compute_next_token_probs(nn_model, prompt, remote=False)[source]#
Get the probabilities of the next token for the prompt
- Returns:
The probabilities of the next token for the prompt
- Parameters:
nn_model (LanguageModel) – The NNSight model
prompt (str | list[str]) – The prompt to get the probabilities for
remote (bool) – Whether to run the model on the remote device
- Return type:
Tensor
Internal Modules#
nnterp.utils module#
- try_with_scan(model, function, error_to_throw, allow_dispatch, warn_if_scan_fails=True, errors_to_raise=None)[source]#
Attempt to execute a function using model.scan(), falling back to model.trace() if needed.
This function tries to execute the given function within a model.scan() context first, which avoids dispatching the model. If that fails and fallback is allowed, it will try using model.trace() instead, which does dispatch the model.
- Parameters:
model – The model object that supports .scan() and .trace() methods
function – A callable to execute within the model context (takes no arguments)
error_to_throw (Exception) – Exception to raise if both scan and trace fail
allow_dispatch (bool) – Whether to allow fallback to .trace() if .scan() fails
warn_if_scan_fails (bool, optional) – Whether to log warnings when scan fails. Defaults to True.
errors_to_raise (tuple, optional) – Tuple of exception types that should be raised immediately if encountered during scan, without fallback to trace.
- Returns:
True if scan succeeded, False if trace was used instead
- Return type:
bool
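A hedged sketch of how a renaming check might call this helper (the probe body is illustrative):

```python
from nnterp.utils import try_with_scan

def check_layers(model):
    def probe():
        # Any tracing-time validation; here, just touch the first layer's output
        _ = model.layers_output[0]

    used_scan = try_with_scan(
        model,
        probe,
        error_to_throw=RuntimeError("module renaming validation failed"),
        allow_dispatch=True,
    )
    return used_scan  # True if scan() sufficed, False if trace() was needed
```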
nnterp.rename_utils module#
- exception RenamingError[source]#
Bases:
Exception
Exception raised when the renaming of modules is not properly done.
- class AttnProbFunction[source]#
Bases:
ABC
- abstract get_attention_prob_source(attention_module, return_module_source=False)[source]#
Get the attention probabilities source for a given attention module. If return_module_source is True, return the full module source from where the attention probabilities are computed.
- Parameters:
return_module_source (bool)
- class RenameConfig(attn_name=None, mlp_name=None, ln_final_name=None, lm_head_name=None, model_name=None, layers_name=None, mlp_returns_tuple=None, attn_prob_source=None, ignore_mlp=None, ignore_attn=None, attn_head_config_key=None, hidden_size_config_key=None)[source]#
Bases:
object
Configuration for renaming transformer model modules to standardized names.
This dataclass specifies how to map model-specific module names to standardized names used by nnterp. It allows customization for different transformer architectures.
- Parameters:
attn_name (str or list of str, optional) – Name(s) of the attention module to rename to ‘self_attn’.
mlp_name (str or list of str, optional) – Name(s) of the MLP/feed-forward module to rename to ‘mlp’.
ln_final_name (str or list of str, optional) – Name(s) of the final layer normalization to rename to ‘ln_final’.
lm_head_name (str or list of str, optional) – Name(s) of the language model head to rename to ‘lm_head’.
model_name (str or list of str, optional) – Name(s) of the main model container to rename to ‘model’.
layers_name (str or list of str, optional) – Name(s) of the transformer layers container to rename to ‘layers’.
mlp_returns_tuple (bool, optional) – Whether the MLP module returns a tuple instead of a single tensor. Some architectures (e.g., Mixtral, Qwen2MoE, DBRX) return tuples from MLP.
attn_prob_source (AttnProbFunction, optional) – Custom function for accessing attention probabilities. Should be an instance of AttnProbFunction that defines how to extract attention weights from the attention module.
ignore_mlp (bool, optional) – Whether to skip MLP module processing for this architecture. Some models (e.g., OPT) don’t have a unified MLP module.
ignore_attn (bool, optional) – Whether to skip attention module processing for this architecture. Rarely used, for architectures without standard attention.
attn_head_config_key (str, list of str, or int, optional) – Custom key name for the number of attention heads in model config, or the number of heads directly. Defaults to standard keys: [‘n_heads’, ‘num_attention_heads’, ‘n_head’].
hidden_size_config_key (str, list of str, or int, optional) – Custom key name for hidden size in model config, or the hidden size directly. Defaults to standard keys: [‘hidden_size’, ‘d_model’, ‘n_embd’].
Example
Custom configuration for a non-standard architecture:
config = RenameConfig(
    attn_name="custom_attention",
    mlp_name=["feed_forward", "ffn"],
    mlp_returns_tuple=True,
)
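The config is then passed to StandardizedTransformer (the repo id here is hypothetical):

```python
model = StandardizedTransformer("my-org/custom-model", rename_config=config)
```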
- attn_name: str | list[str] | None = None#
- mlp_name: str | list[str] | None = None#
- ln_final_name: str | list[str] | None = None#
- lm_head_name: str | list[str] | None = None#
- model_name: str | list[str] | None = None#
- layers_name: str | list[str] | None = None#
- mlp_returns_tuple: bool | None = None#
- attn_prob_source: AttnProbFunction | None = None#
- ignore_mlp: bool | None = None#
- ignore_attn: bool | None = None#
- attn_head_config_key: str | list[str] | int | None = None#
- hidden_size_config_key: str | list[str] | int | None = None#
- get_num_attention_heads(model, raise_error=True, rename_config=None)[source]#
- Parameters:
raise_error (bool)
rename_config (RenameConfig | None)
- Return type:
int | None
- get_rename_dict(rename_config=None)[source]#
- Parameters:
rename_config (RenameConfig | None)
- Return type:
dict[str, str]
- class IOType(value)[source]#
Bases:
Enum
Enum to specify input or output access
- INPUT = 'input'#
- OUTPUT = 'output'#
- class LayerAccessor(model, attr_name, io_type, returns_tuple=False)[source]#
Bases:
object
Accessor providing read and write access to a layer attribute’s input or output
- Parameters:
attr_name (str | None)
io_type (IOType | None)
returns_tuple (bool)
- bloom_attention_prob_source(attention_module, return_module_source=False)[source]#
- Parameters:
return_module_source (bool)
- default_attention_prob_source(attention_module, return_module_source=False)[source]#
- Parameters:
return_module_source (bool)
- gpt2_attention_prob_source(attention_module, return_module_source=False)[source]#
- Parameters:
return_module_source (bool)
- gptj_attention_prob_source(attention_module, return_module_source=False)[source]#
- Parameters:
return_module_source (bool)
- class AttentionProbabilitiesAccessor(model, rename_config=None)[source]#
Bases:
object
- Parameters:
rename_config (RenameConfig | None)
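A hedged sketch of reading attention probabilities through StandardizedTransformer, assuming the accessor is exposed as attention_probabilities (consistent with attn_probs_available above; the shape in the comment is an assumption):

```python
from nnterp import StandardizedTransformer

model = StandardizedTransformer("gpt2")
if model.attn_probs_available:
    with model.trace("The cat sat on the mat"):
        # Assumed shape: (batch, num_heads, seq_len, seq_len)
        attn = model.attention_probabilities[3].save()
```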
- get_ignores(model, rename_config=None)[source]#
- Parameters:
rename_config (RenameConfig | None)
- Return type:
list[str]
- mlp_returns_tuple(model, rename_config=None)[source]#
- Parameters:
rename_config (RenameConfig | None)
- Return type:
bool