nnterp: Neural Network Interpretation Tools for Transformer Models

Published in Open Source Project, 2024

nnterp is a Python package designed to simplify the analysis and interpretation of transformer models. Built on top of nnsight, it provides tools for activation collection, intervention experiments, and visualization of model behaviors.

Key Features

The package provides the following core features:

Model Analysis

Easy model loading and initialization
Batch-wise activation collection
Support for both local and remote model execution
Flexible token-level analysis
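To make "batch-wise activation collection" concrete, here is a minimal pure-Python sketch of the idea. It does not use nnterp or nnsight. The toy model is a list of functions standing in for transformer layers, and `collect_activations` is a hypothetical helper that processes inputs one batch at a time and records each layer's output for every example.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def collect_activations(
    layers: Sequence[Layer],
    inputs: Sequence[Vector],
    batch_size: int,
) -> List[List[Vector]]:
    """Run inputs through `layers` in batches; return activations[layer][example]."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    activations: List[List[Vector]] = [[] for _ in layers]
    for start in range(0, len(inputs), batch_size):
        batch = [list(x) for x in inputs[start:start + batch_size]]
        for i, layer in enumerate(layers):
            # Apply this layer to every example in the batch, then store the outputs.
            batch = [layer(h) for h in batch]
            activations[i].extend(batch)
    return activations


# Toy two-layer "model": double every value, then add one.
toy_layers: List[Layer] = [
    lambda h: [2 * v for v in h],
    lambda h: [v + 1 for v in h],
]
acts = collect_activations(toy_layers, [[1.0], [2.0], [3.0]], batch_size=2)
print(acts)  # [[[2.0], [4.0], [6.0]], [[3.0], [5.0], [7.0]]]
```

Because results are concatenated per layer, the output is the same for any `batch_size`. The batch size only controls how many examples are processed at once, which in a real setting bounds memory use.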

Interventions

Logit lens implementation
Patchscope lens for cross-model analysis
Custom intervention support
Probability distribution analysis
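The logit lens decodes intermediate hidden states with the model's own final normalization and unembedding matrix, giving a next-token distribution at every layer. The sketch below illustrates the technique on toy numbers, assuming an RMS-style final norm with no learned weight. It is not nnterp's implementation; `logit_lens`, `rms_norm`, and the two-token vocabulary are hypothetical. Reading off the probability of a chosen target token at each layer is one simple form of probability distribution analysis.

```python
import math
from typing import List, Sequence


def rms_norm(h: Sequence[float], eps: float = 1e-6) -> List[float]:
    # Simplified final norm: RMS scaling with no learned weight (an assumption).
    scale = math.sqrt(sum(v * v for v in h) / len(h) + eps)
    return [v / scale for v in h]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(
    hidden_per_layer: Sequence[Sequence[float]],
    unembed: Sequence[Sequence[float]],  # one row per vocabulary token
) -> List[List[float]]:
    """Decode each layer's hidden state into a next-token probability distribution."""
    dists = []
    for h in hidden_per_layer:
        normed = rms_norm(h)
        logits = [sum(w * x for w, x in zip(row, normed)) for row in unembed]
        dists.append(softmax(logits))
    return dists


vocab = ["cat", "dog"]
unembed = [[1.0, 0.0], [0.0, 1.0]]
# Hidden state at the last position after each of three layers (toy values).
hidden = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
dists = logit_lens(hidden, unembed)
for layer, probs in enumerate(dists):
    # Track how much probability mass the target token "cat" receives per layer.
    print(layer, round(probs[vocab.index("cat")], 3))
```

With real hidden states, `rms_norm` would be replaced by the model's actual final norm (LayerNorm for GPT-2, RMSNorm with a learned weight for Llama-style models).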

Visualization

Token probability plotting
Interactive visualization tools
Customizable plotting options
HTML report generation
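As an illustration of HTML report generation, the following standard-library sketch renders per-layer token probabilities (for example, logit lens output) as a shaded HTML table. The function `probs_to_html` and its layout are hypothetical and are not taken from nnterp's display extra.

```python
from html import escape
from typing import Sequence


def probs_to_html(tokens: Sequence[str], probs_per_layer: Sequence[Sequence[float]]) -> str:
    """Render a layer-by-token probability table as an HTML string."""
    header = "".join(f"<th>{escape(t)}</th>" for t in tokens)
    rows = []
    for layer, probs in enumerate(probs_per_layer):
        # Shade each cell by its probability via the background alpha channel.
        cells = "".join(
            f'<td style="background: rgba(0, 0, 255, {p:.2f})">{p:.2f}</td>'
            for p in probs
        )
        rows.append(f"<tr><th>layer {layer}</th>{cells}</tr>")
    return "<table><tr><th></th>" + header + "</tr>" + "".join(rows) + "</table>"


report = probs_to_html(["cat", "<dog>"], [[0.2, 0.8], [0.9, 0.1]])
```

Saving `report` to a `.html` file produces a page that opens in any browser. Token strings are escaped, so a token such as `<dog>` displays literally instead of being parsed as markup.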

Implementation

The package is organized around three main modules:

nnsight_utils.py for unified model handling
interventions.py for analysis techniques
prompt_utils.py for prompt management and token tracking
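Here, token tracking is taken to mean following the probability a model assigns to a target word; that reading is an assumption. One complication is that a vocabulary may contain several single-token spellings of the same word. The sketch below is a hypothetical helper, not the API of prompt_utils.py. It gathers the ids of capitalization and leading-space variants from a toy vocabulary and sums their probabilities.

```python
import math
from typing import Dict, Sequence, Set


def target_token_ids(target: str, vocab: Dict[str, int]) -> Set[int]:
    """Ids of single-token spellings of `target` (hypothetical variant rule)."""
    spellings = {target, target.lower(), target.capitalize()}
    # Many BPE vocabularies store a word both with and without a leading space.
    variants = spellings | {" " + s for s in spellings}
    return {vocab[v] for v in variants if v in vocab}


def target_probability(probs: Sequence[float], ids: Set[int]) -> float:
    # Total probability mass on any tracked spelling of the target.
    return math.fsum(probs[i] for i in ids)


vocab = {"Paris": 0, " Paris": 1, " London": 2, "paris": 3}
ids = target_token_ids("Paris", vocab)
p = target_probability([0.1, 0.5, 0.3, 0.1], ids)
```

Words that split into several tokens are ignored by this rule; handling them (for example, by tracking only their first token) is a separate design choice.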

Installation is straightforward via pip:

pip install nnterp
pip install "nnterp[display]"  # for visualization support; quotes stop shells such as zsh from expanding the brackets

For detailed usage examples and documentation, visit the GitHub repository.

Clément Dumas
