One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior. In collaboration with DeepMind’s safety team, we’ve developed an algorithm that can infer what humans want by being told which of two proposed behaviors is better.
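The core idea can be sketched in code. This is a minimal illustration, not the authors' implementation: a linear reward function is fit to pairwise comparisons using the Bradley-Terry model, where the probability that a human prefers behavior A over behavior B is a sigmoid of the difference in predicted rewards. The feature vectors, learning rate, and toy data below are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_reward(pairs, dim, lr=0.1, steps=500):
    """Fit a linear reward r(x) = w @ x from pairwise preferences.

    pairs: list of (x_preferred, x_rejected) feature vectors.
    Minimizes the negative log-likelihood of the Bradley-Terry model:
        P(x1 preferred over x2) = sigmoid(r(x1) - r(x2))
    """
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for x_pos, x_neg in pairs:
            # Probability the model assigns to the human's actual choice.
            p = 1.0 / (1.0 + np.exp(-(w @ x_pos - w @ x_neg)))
            # Gradient of the negative log-likelihood for this pair.
            grad += (p - 1.0) * (x_pos - x_neg)
        w -= lr * grad / len(pairs)
    return w

# Toy data (hypothetical): the "true" goal is the first feature, so the
# simulated human prefers whichever behavior scores higher on it.
xs = rng.normal(size=(40, 3))
pairs = []
for i in range(0, 40, 2):
    a, b = xs[i], xs[i + 1]
    pairs.append((a, b) if a[0] > b[0] else (b, a))

w = train_reward(pairs, dim=3)
# The learned reward should rank behaviors consistently with the preferences.
agree = sum((w @ a) > (w @ b) for a, b in pairs)
print(f"{agree}/{len(pairs)} preferences satisfied")
```

The point of the sketch is that no hand-written goal function appears anywhere: the reward is recovered entirely from binary "which is better?" judgments. In the actual paper, the reward model is a neural network over trajectory segments and the comparisons come from real human raters, but the preference-probability loss has this same form.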
Original article:
https://openai.com/research/learning-from-human-preferences
Authors:
Dario Amodei, Paul Christiano, Alex Ray
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.