About me

I’m a MATS Winter 2025 (7.0) Scholar with Neel Nanda. Together with Julian Minder, I’m currently working on evaluating different model diffing methods. (Model diffing is the study of mechanistic changes introduced during fine-tuning: essentially, understanding what makes a fine-tuned model internally different from its base model.)

I’m completing my MSc in Vision & Learning (MVA) at École Normale Supérieure Paris-Saclay. My main research interest is technical AI alignment.

Previously, I was fortunate to complete a research internship at EPFL DLAB under the amazing supervision of Chris Wendler and Bob West. My main focus was interpretability, and I worked on a follow-up to the “Do Llamas work in English?” paper. Our work was selected as a spotlight presentation at the Mechanistic Interpretability Workshop at ICML 2025.

Before starting this internship, I explored the emergence of XOR features in large language models and the RAX hypothesis proposed by Sam Marks, participated in SPAR with Walter Laurito, worked with Jobst Heitzig on non-maximizing training objectives for RL agents, and attended the ML4G AI safety bootcamp.

You can find my other projects on my CV or GitHub.

Outside of research, I am interested in evolutionary biology and its manifestation in artificial life simulations, and I am an improviser at the ENS improv theater club Lika. Feel free to message me if you want to get in touch!

News

Last updated: 26/06/2025