“How might we safely pass the buck to AI?” by joshc

68 min • 19 February 2025

My goal as an AI safety researcher is to put myself out of a job.

I don’t worry too much about how planet-sized brains will shape galaxies in 100 years. That's something for AI systems to figure out.

Instead, I worry about safely replacing human researchers with AI agents, at which point human researchers are “obsolete.” The situation is not necessarily fine after human obsolescence; however, the bulk of risks that are addressable by human technical researchers (like me) will have been addressed.

This post explains how developers might safely “pass the buck” to AI.

I first clarify what I mean by “pass the buck” (section 1) and explain why I think AI safety researchers should make safely passing the buck their primary end goal – rather than focus on the loftier ambition of aligning superintelligence (section 2).

Figure 1. A summary of why I think human AI [...]

---

Outline:

(17:27) 1. Briefly responding to objections

(20:06) 2. What I mean by passing the buck to AI

(21:53) 3. Why focus on passing the buck rather than aligning superintelligence

(26:28) 4. Three strategies for passing the buck to AI

(29:16) 5. Conditions that imply that passing the buck improves safety

(32:01) 6. The capability condition

(35:45) 7. The trust condition

(36:36) 8. Argument #1: M_1 agents are approximately aligned and will maintain their alignment until they have completed their deferred task

(45:49) 9. Argument #2: M_1 agents cannot subvert autonomous control measures while they complete the deferred task