nnterp: Neural Network Interpretation Tools for Transformer Models
Published:
A Python package for analyzing and interpreting transformer model behaviors through activation analysis and interventions, based on nnsight
Published:
A minimal, hackable package for building feature activation dashboards in transformer models
Published:
A 2-minute lightning talk I gave to present our paper at the ICML 2024 mechanistic interpretability workshop.