AI Safety Fundamentals: Governance

Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

32 min • 4 January 2025

This paper explains Anthropic’s constitutional AI approach, which is largely an extension of RLHF but with AIs replacing human demonstrators and human evaluators.

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.
