LessWrong (30+ Karma)

“The Sweet Lesson: AI Safety Should Scale With Compute” by Jesse Hoogland

6 min • May 5, 2025

A corollary of Sutton's Bitter Lesson is that solutions to AI safety should scale with compute. Let me list a few examples of research directions that aim at this kind of solution:

  • Deliberative Alignment: Combine chain-of-thought with Constitutional AI, so that safety improves with inference-time compute (see Guan et al. 2025, Figure 13).
  • AI Control: Design control protocols that pit a red team against a blue team so that running the game for longer results in more reliable estimates of the probability of successful scheming during deployment (e.g., weight exfiltration).
  • Debate: Design a debate protocol so that running a longer, deeper debate between AI assistants makes us more confident that we're encouraging truthfulness or other desirable qualities (see Irving et al. 2018, Table 1).
  • Bengio's Scientist AI: Develop safety guardrails that obtain more reliable estimates of the probability of catastrophic risk with increasing inference time:[1]

[I]n the short [...]
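The common thread in several of these directions (most explicitly AI Control and Scientist AI) is that extra compute buys statistical confidence: running more episodes of a red-team/blue-team game shrinks the error bars on the estimated probability of a bad outcome. A minimal sketch of that scaling, where `run_control_game` is a hypothetical stand-in for one red-team episode and `p_scheme` is an assumed ground-truth scheming rate (neither comes from the post):

```python
import math
import random

def run_control_game(trial_seed, p_scheme=0.03):
    # Hypothetical stand-in for one red-team vs. blue-team episode.
    # Returns True if the simulated scheming attempt succeeds.
    rng = random.Random(trial_seed)
    return rng.random() < p_scheme

def estimate_scheming_probability(n_trials):
    # Monte Carlo estimate of the scheming probability from n_trials episodes.
    successes = sum(run_control_game(seed) for seed in range(n_trials))
    p_hat = successes / n_trials
    # Binomial standard error shrinks as 1/sqrt(n):
    # more compute spent on the game, tighter estimate.
    stderr = math.sqrt(p_hat * (1 - p_hat) / n_trials)
    return p_hat, stderr

for n in (100, 10_000):
    p_hat, stderr = estimate_scheming_probability(n)
    print(f"n={n:>6}: p_hat={p_hat:.4f} +/- {stderr:.4f}")
```

The point is not the toy simulator but the 1/sqrt(n) behavior: a safety protocol whose guarantees tighten as you pour in more episodes is one whose reliability "scales with compute" in the sense the post describes.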

The original text contained 2 footnotes which were omitted from this narration.

---

First published:
May 5th, 2025

Source:
https://www.lesswrong.com/posts/6hy7tsB2pkpRHqazG/the-sweet-lesson-ai-safety-should-scale-with-compute

---

Narrated by TYPE III AUDIO.
