My goal as an AI safety researcher is to put myself out of a job.
I don’t worry too much about how planet-sized brains will shape galaxies in 100 years. That’s something for AI systems to figure out.
Instead, I worry about safely replacing human researchers with AI agents, at which point human researchers are “obsolete.” The situation is not necessarily fine after human obsolescence; however, the bulk of risks that are addressable by human technical researchers (like me) will have been addressed.
This post explains how developers might safely “pass the buck” to AI.
I first clarify what I mean by “pass the buck” (section 1) and explain why I think AI safety researchers should make safely passing the buck their primary end goal – rather than focus on the loftier ambition of aligning superintelligence (section 2).
Figure 1. A summary of why I think human AI [...]
---
Outline:
(17:27) 1. Briefly responding to objections
(20:06) 2. What I mean by passing the buck to AI
(21:53) 3. Why focus on passing the buck rather than aligning superintelligence
(26:28) 4. Three strategies for passing the buck to AI
(29:16) 5. Conditions that imply that passing the buck improves safety
(32:01) 6. The capability condition
(35:45) 7. The trust condition
(36:36) 8. Argument #1: M_1 agents are approximately aligned and will maintain their alignment until they have completed their deferred task
(45:49) 9. Argument #2: M_1 agents cannot subvert autonomous control measures while they complete the deferred task
(47:06) Analogies to dictatorships suggest that autonomous control might be viable
(48:59) Listing potential autonomous control measures
(52:39) How to evaluate autonomous control
(54:31) 10. Argument #3: Returns to additional human-supervised research are small
(56:47) Control measures
(01:00:52) 11. Argument #4: AI agents are incentivized to behave as safely as humans
(01:07:01) 12. Conclusion
---
First published:
February 19th, 2025
Source:
https://www.lesswrong.com/posts/TTFsKxQThrqgWeXYJ/how-might-we-safely-pass-the-buck-to-ai
Narrated by TYPE III AUDIO.