nnterp: Neural Network Interpretation Tools for Transformer Models

Published in Open Source Project, 2024

nnterp is a Python package that simplifies the analysis and interpretation of transformer models. Built on top of nnsight, it provides tools for collecting activations, running intervention experiments, and visualizing model behavior.

Key Features

The package's core functionality falls into three areas:

Model Analysis

  • Easy model loading and initialization
  • Batch-wise activation collection
  • Support for both local and remote model execution
  • Flexible token-level analysis
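Batch-wise collection, as listed above, amounts to splitting the prompts into fixed-size chunks, running the model on each chunk, and concatenating the results. The sketch below shows that pattern in plain Python. It does not use nnterp's API: `batched`, `collect_activations`, and the `run_batch` callback are hypothetical names, and `run_batch` stands in for whatever function runs the model and returns one activation per prompt.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def collect_activations(
    prompts: Sequence[str],
    run_batch: Callable[[Sequence[str]], List],
    batch_size: int = 2,
) -> List:
    """Run `run_batch` on each batch of prompts and concatenate the outputs.

    `run_batch` is a stand-in (assumption) for a model forward pass that
    returns one activation object per prompt in the batch.
    """
    collected: List = []
    for batch in batched(prompts, batch_size):
        collected.extend(run_batch(batch))
    return collected


# Toy usage: the "activation" of a prompt is just its character count.
toy_activations = collect_activations(
    ["a", "bb", "ccc"], lambda batch: [len(p) for p in batch], batch_size=2
)
```

Processing in batches keeps peak memory bounded by the batch size rather than by the full prompt set, which is the usual reason for collecting activations this way.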

Interventions

  • Logit lens implementation
  • Patchscope lens for cross-model analysis
  • Custom intervention support
  • Probability distribution analysis
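The logit lens listed above decodes intermediate hidden states through the model's unembedding matrix to see which token the model would predict at each layer. The following is a minimal pure-Python sketch of that idea on toy data. It does not use nnterp's API, and the names `softmax` and `logit_lens` are illustrative assumptions. Real implementations usually apply the model's final layer norm before unembedding, which is omitted here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(
    hidden_states: List[List[float]], unembed: List[List[float]]
) -> List[List[float]]:
    """Project each layer's hidden state onto the vocabulary.

    hidden_states: one vector of size d_model per layer, for a single
                   token position.
    unembed:       one row of size d_model per vocabulary entry.
    Returns one probability distribution over the vocabulary per layer.
    """
    per_layer_probs = []
    for hidden in hidden_states:
        # Logit for each vocabulary entry is a dot product with its row.
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembed]
        per_layer_probs.append(softmax(logits))
    return per_layer_probs


# Toy example: 2 layers, d_model = 2, vocabulary of 3 tokens.
toy_hidden = [[2.0, 0.0], [0.0, 3.0]]
toy_unembed = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
toy_probs = logit_lens(toy_hidden, toy_unembed)

# Index of the most likely token at each layer.
top_tokens = [probs.index(max(probs)) for probs in toy_probs]
```

Comparing `top_tokens` across layers shows where in the network the prediction changes, which is the kind of per-layer view a logit lens plot summarizes.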

Visualization

  • Token probability plotting
  • Interactive visualization tools
  • Customizable plotting options
  • HTML report generation

Implementation

The tool is structured around three main modules:

  • nnsight_utils.py for unified model handling
  • interventions.py for analysis techniques
  • prompt_utils.py for prompt management and token tracking

Installation is straightforward via pip:

pip install nnterp
pip install "nnterp[display]"  # for visualization support; quotes keep shells such as zsh from expanding the brackets
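To confirm which version ended up in the environment, the standard library's importlib.metadata can be queried. This is a small sketch; `installed_version` is a hypothetical helper name, not part of nnterp.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if nnterp is installed, otherwise None.
print(installed_version("nnterp"))
```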

For detailed usage examples and documentation, visit the GitHub repository.