Menu

RLHF (reinforcement learning from human feedback)

RLHF, or reinforcement learning from human feedback, is a training method where human judgments of a model's outputs are used to align the model with what people prefer. The human-judgment part needs no coding: anyone can compare answers and pick the better one. That preference data is what researchers later use to align the model.

Learn more: Tap to Train.

Related terms

All glossary terms · What is Tendril