Introduction to AI safety

Date: December 16, 2024

My former mathematics professor, René Adad, invited me to give a 50-minute introductory talk on AI safety at Lycée Thiers to an audience of classe préparatoire students and teachers. The presentation covered:

- The possibility that AGI might be only a few years away
- How current AI training methods can give rise to problems such as misalignment, reward hacking, and instrumental convergence, which could lead to catastrophic outcomes
- Early signs of these problems already emerging in LLM chatbots (including the Apollo Research demonstration of deception)
- A brief overview of current research agendas, including:
  - Evaluation frameworks
  - Governance approaches
  - Mechanistic interpretability
  - Scalable oversight methods (such as Debate and Iterated Amplification)


Clément Dumas
