Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
Published in ICML 2024 Mechanistic Interpretability Workshop, 2024
An earlier version of the paper is available on OpenReview.
This work was conducted during my first-year master’s internship at EPFL (École Polytechnique Fédérale de Lausanne), under the supervision of Chris Wendler and Robert West.