There are three main ways to try to understand and reason about powerful future AGI agents:
- Using formal models designed to predict the behavior of powerful general agents, such as expected utility maximization and variants thereof (explored in game theory and decision theory).
- Comparing & contrasting powerful future AGI agents with the weak, not-so-general, not-so-agentic AIs that actually exist today.
- Comparing & contrasting powerful future AGI agents with currently-existing powerful general agents, such as humans and human organizations.
I think it's valuable to try all three approaches. Today I'm exploring strategy #3, building an extended analogy between:
- A prototypical human corporation that has a lofty humanitarian mission but also faces market pressures and incentives.
- A prototypical human working there, who thinks of themselves as a good person and independent thinker with lofty altruistic goals, but also faces the usual peer pressures and incentives.
- AGI agents being trained [...]
---
Outline:
(01:29) The Analogy
(01:52) What happens when training incentives conflict with goals/principles
(08:14) Appendix: Three important concepts/distinctions
(08:38) Goals vs. Principles
(09:39) Contextually activated goals/principles
(12:32) Stability and/or consistency of goals/principles
---
First published: February 13th, 2025
Source: https://www.lesswrong.com/posts/bsTzgG3cRrsgbGtCc/extended-analogy-between-humans-corporations-and-ais
---
Narrated by TYPE III AUDIO.