LessWrong (30+ Karma)

“For scheming, we should first focus on detection and then on prevention” by Marius Hobbhahn

9 min • March 4, 2025

This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research.

If we want to argue that the risk of harm from scheming in an AI system is low, we could make, among others, the following arguments:

  1. Detection: If our AI system is scheming, we have good reasons to believe that we would be able to detect it. 
  2. Prevention: We have good reasons to believe that our AI system has a low scheming propensity or that we could stop scheming actions before they cause harm.

In this brief post, I argue that we should prioritize detection over prevention first, assuming we cannot pursue both at the same time, e.g. due to limited resources. In short, a) early on, information value matters more than risk reduction because current models are unlikely to cause major harm, but we can already learn a lot [...]

---

Outline:

(01:07) Techniques

(04:41) Reasons to prioritize detection over prevention

---

First published:
March 4th, 2025

Source:
https://www.lesswrong.com/posts/bAWPsgbmtLf8ptay6/for-scheming-we-should-first-focus-on-detection-and-then-on

---

Narrated by TYPE III AUDIO.
