Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

32 min • 4 January 2025

This paper explains Anthropic's constitutional AI approach, which is largely an extension of RLHF, with AIs replacing the human demonstrators and human evaluators.

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.

Categories

Philosophy, Podcasts, Society & Culture, Technology
