Introduction to AI safety
Date:
My former mathematics professor, René Adad, invited me to give a 50-minute introductory talk on AI safety at Lycée Thiers to an audience of classe préparatoire students and teachers. The presentation covered:
- The possibility that AGI might be only a few years away
- How current AI training methods can give rise to problems such as misalignment, reward hacking, and instrumental convergence, which could lead to catastrophic outcomes
- Early signs of these problems already emerging in chatbot LLMs (including the Apollo Research demonstration of deception)
- A brief overview of current research agendas, including:
  - Evaluation frameworks
  - Governance approaches
  - Mechanistic interpretability
  - Scalable oversight methods (such as Debate and Iterated Amplification)