Introduction to AI safety

Date:

My former mathematics professor, René Adad, invited me to give a 50-minute introductory talk on AI safety at Lycée Thiers to an audience of classe préparatoire students and teachers. The presentation covered:

  • The possibility that AGI might be only a few years away
  • How current AI training methods can give rise to problems such as misalignment, reward hacking, and instrumental convergence, which could lead to catastrophic outcomes
  • Early signs of these problems already emerging in chatbot LLMs (including the Apollo Research demonstration of deception)
  • A brief overview of current research agendas, including:
    • Evaluation frameworks
    • Governance approaches
    • Mechanistic interpretability
    • Scalable oversight methods (such as Debate and Iterated Amplification)