LessWrong (30+ Karma)

“The Sweet Lesson: AI Safety Should Scale With Compute” by Jesse Hoogland

6 min • May 5, 2025

A corollary of Sutton's Bitter Lesson is that solutions to AI safety should scale with compute. Let me list a few examples of research directions that aim at this kind of solution:

  • Deliberative Alignment: Combine chain-of-thought with Constitutional AI, so that safety improves with inference-time compute (see Guan et al. 2025, Figure 13).
  • AI Control: Design control protocols that pit a red team against a blue team so that running the game for longer results in more reliable estimates of the probability of successful scheming during deployment (e.g., weight exfiltration).
  • Debate: Design a debate protocol so that running a longer, deeper debate between AI assistants makes us more confident that we're encouraging truthfulness or other desirable qualities (see Irving et al. 2018, Table 1).
  • Bengio's Scientist AI: Develop safety guardrails that obtain more reliable estimates of the probability of catastrophic risk with increasing inference time:[1]

[I]n the short [...]
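The common thread in several of these directions (most explicitly AI Control and Scientist AI) is that extra compute buys statistical confidence: running more episodes of a red-team/blue-team game shrinks the error bars on the estimated probability of a bad outcome. A minimal sketch of that scaling, where `run_control_game` is a hypothetical stand-in for one red-team episode and `p_scheme` is an assumed ground-truth scheming rate (neither comes from the post):

```python
import math
import random

def run_control_game(trial_seed, p_scheme=0.03):
    # Hypothetical stand-in for one red-team vs. blue-team episode.
    # Returns True if the simulated scheming attempt succeeds.
    rng = random.Random(trial_seed)
    return rng.random() < p_scheme

def estimate_scheming_probability(n_trials):
    # Monte Carlo estimate of the scheming probability from n_trials episodes.
    successes = sum(run_control_game(seed) for seed in range(n_trials))
    p_hat = successes / n_trials
    # Binomial standard error shrinks as 1/sqrt(n):
    # more compute spent on the game, tighter estimate.
    stderr = math.sqrt(p_hat * (1 - p_hat) / n_trials)
    return p_hat, stderr

for n in (100, 10_000):
    p_hat, stderr = estimate_scheming_probability(n)
    print(f"n={n:>6}: p_hat={p_hat:.4f} +/- {stderr:.4f}")
```

The point is not the toy simulator but the 1/sqrt(n) behavior: a safety protocol whose guarantees tighten as you pour in more episodes is one whose reliability "scales with compute" in the sense the post describes.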

The original text contained 2 footnotes which were omitted from this narration.

---

First published:
May 5th, 2025

Source:
https://www.lesswrong.com/posts/6hy7tsB2pkpRHqazG/the-sweet-lesson-ai-safety-should-scale-with-compute

---

Narrated by TYPE III AUDIO.
