AI Safety Fundamentals: Governance
We’ve released a paper, “AI Control: Improving Safety Despite Intentional Subversion.” This paper explores techniques that prevent AI catastrophes even if AI instances are colluding to subvert the safety techniques.
Source:
https://www.alignmentforum.org/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion
Narrated for AI Safety Fundamentals by Perrin Walker
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.