nnterp: Neural Network Interpretation Tools for Transformer Models
Published in Open Source Project, 2024
nnterp is a Python package designed to simplify the analysis and interpretation of transformer models. Built on top of nnsight, it provides tools for activation collection, intervention experiments, and visualization of model behaviors.
Key Features
The package offers several core functionalities:
Model Analysis
- Easy model loading and initialization
- Batch-wise activation collection
- Support for both local and remote model execution
- Flexible token-level analysis
Interventions
- Logit lens implementation
- Patchscope lens for cross-model analysis
- Custom intervention support
- Probability distribution analysis
Visualization
- Token probability plotting
- Interactive visualization tools
- Customizable plotting options
- HTML report generation
Implementation
The tool is structured around three main modules:
nnsight_utils.py
for unified model handlinginterventions.py
for analysis techniquesprompt_utils.py
for prompt management and token tracking
Installation is straightforward via pip:
pip install nnterp
pip install nnterp[display] # for visualization support
For detailed usage examples and documentation, visit the GitHub repository.