There are three main ways to try to understand and reason about powerful future AGI agents:
- Using formal models designed to predict the behavior of powerful general agents, such as expected utility maximization and variants thereof (explored in game theory and decision theory).
- Comparing & contrasting powerful future AGI agents with the weak, not-so-general, not-so-agentic AIs that actually exist today.
- Comparing & contrasting powerful future AGI agents with currently-existing powerful general agents, such as humans and human organizations.
I think it's valuable to try all three approaches. Today I'm exploring strategy #3, building an extended analogy between:
- A prototypical human corporation that has a lofty humanitarian mission but also faces market pressures and incentives.
- A prototypical human working there, who thinks of themselves as a good person and independent thinker with lofty altruistic goals, but also faces the usual peer pressures and incentives.
- AGI agents being trained [...]
---
Outline:
(01:29) The Analogy
(01:52) What happens when training incentives conflict with goals/principles
(08:14) Appendix: Three important concepts/distinctions
(08:38) Goals vs. Principles
(09:39) Contextually activated goals/principles
(12:32) Stability and/or consistency of goals/principles
---
First published: February 13th, 2025
Source: https://www.lesswrong.com/posts/bsTzgG3cRrsgbGtCc/extended-analogy-between-humans-corporations-and-ais
---
Narrated by TYPE III AUDIO.