
AI Safety Fundamentals: Governance

AI Control: Improving Safety Despite Intentional Subversion

21 min • 4 January 2025

We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even if AI instances are colluding to subvert the safety techniques. In this post:

We summarize the paper;
We compare our methodology to the methodology of other safety papers.

Source:
https://www.alignmentforum.org/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion

Narrated for AI Safety Fundamentals by Perrin Walker

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.

