Large Language Model (LLM) Talk

RLHF (Reinforcement Learning from Human Feedback)

16 min • February 7, 2025

Reinforcement Learning from Human Feedback (RLHF) incorporates human preferences into AI systems, addressing problems where specifying a clear reward function is difficult. The basic pipeline involves training a language model, collecting human preference data to train a reward model, and then optimizing the language model with an RL optimizer against that reward model. A KL-divergence penalty against the original model is typically used as regularization to prevent over-optimizing against the reward model. RLHF is one of a broader family of preference fine-tuning techniques, and it has become a crucial post-training step for aligning language models with human values and eliciting desirable behaviors.
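To make the KL-regularized reward concrete, here is a minimal sketch in Python (using PyTorch) of how a single RLHF update might combine the reward model's score with a KL penalty against a frozen reference model. The function name, `beta` value, and toy tensors are illustrative assumptions, not the episode's or any specific library's implementation.

```python
import torch

def kl_penalized_reward(reward_model_score, policy_logprobs, reference_logprobs, beta=0.1):
    """Combine a reward model score with a KL penalty (illustrative sketch).

    reward_model_score: scalar score for the full response from the reward model
    policy_logprobs:    log-probs of the sampled tokens under the current policy
    reference_logprobs: log-probs of the same tokens under the frozen reference model
    beta:               strength of the KL regularization (assumed value)
    """
    # Per-token estimate of KL(policy || reference) on the sampled tokens.
    kl_per_token = policy_logprobs - reference_logprobs
    # Penalize the response score by how far the policy has drifted from the reference.
    return reward_model_score - beta * kl_per_token.sum()

# Toy usage with made-up numbers; real values come from the models themselves.
policy_lp = torch.tensor([-1.2, -0.8, -2.1])     # log-probs under current policy
reference_lp = torch.tensor([-1.5, -0.9, -2.0])  # log-probs under frozen reference
score = torch.tensor(0.7)                        # reward model's score for the response

print(kl_penalized_reward(score, policy_lp, reference_lp))  # ~0.67
```

The key design point this sketch illustrates is that the RL optimizer does not maximize the reward model's score alone: the KL term keeps the updated policy close to the original language model, which is what prevents the over-optimization mentioned above.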
