LessWrong (30+ Karma)

“Extended analogy between humans, corporations, and AIs.” by Daniel Kokotajlo

15 min • February 13, 2025

There are three main ways to try to understand and reason about powerful future AGI agents:

  1. Using formal models designed to predict the behavior of powerful general agents, such as expected utility maximization and variants thereof (explored in game theory and decision theory).
  2. Comparing & contrasting powerful future AGI agents with the weak, not-so-general, not-so-agentic AIs that actually exist today.
  3. Comparing & contrasting powerful future AGI agents with currently-existing powerful general agents, such as humans and human organizations.

I think it's valuable to try all three approaches. Today I'm exploring strategy #3, building an extended analogy between:

  • A prototypical human corporation that has a lofty humanitarian mission but also faces market pressures and incentives.
  • A prototypical human working there, who thinks of themselves as a good person and independent thinker with lofty altruistic goals, but also faces the usual peer pressures and incentives.
  • AGI agents being trained [...]

---

Outline:

(01:29) The Analogy

(01:52) What happens when training incentives conflict with goals/principles

(08:14) Appendix: Three important concepts/distinctions

(08:38) Goals vs. Principles

(09:39) Contextually activated goals/principles

(12:32) Stability and/or consistency of goals/principles

---

First published:
February 13th, 2025

Source:
https://www.lesswrong.com/posts/bsTzgG3cRrsgbGtCc/extended-analogy-between-humans-corporations-and-ais

---

Narrated by TYPE III AUDIO.
