This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research.
If we want to argue that the risk of harm from scheming in an AI system is low, we could, among others, make the following arguments:
In this brief post, I argue why we should first prioritize detection over prevention, assuming you cannot pursue both at the same time, e.g. due to limited resources. In short, a) early on, the information value is more important than risk reduction because current models are unlikely to cause big harm but we can already learn a lot [...]
---
Outline:
(01:07) Techniques
(04:41) Reasons to prioritize detection over prevention
---
First published:
March 4th, 2025
Narrated by TYPE III AUDIO.