Posts by Collection

portfolio

projects

nnterp: Neural Network Interpretation Tools for Transformer Models

Published: September 01, 2024

A Python package, built on nnsight, for analyzing and interpreting transformer model behavior through activation analysis and interventions.

Tiny-Dashboard: A Lightweight Feature Activation Analysis Tool

Published: November 26, 2024

A minimal, hackable package for building feature activation dashboards for transformer models.

publications

talks

How do Llamas process multilingual text? A latent exploration through activation patching

Published: July 27, 2024

A two-minute lightning talk I gave to present our paper at the ICML 2024 mechanistic interpretability workshop.

Introduction to AI safety

Published: December 16, 2024

My former mathematics professor, René Adad, invited me to give a 50-minute introductory talk on AI safety at Lycée Thiers to an audience of classe préparatoire students and teachers.

Clément Dumas