RLHF (reinforcement learning from human feedback)

Published 2026-06-20

RLHF, or reinforcement learning from human feedback, is a training method in which human judgments of a model's outputs are used to align the model with what people prefer. Giving those judgments requires no coding: anyone can compare answers and pick the better one. Researchers then use the resulting preference data to train the model toward the answers people chose.
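To make the "preference data" idea concrete, here is a minimal sketch of one common way pairwise choices can be turned into a training signal. It assumes a Bradley-Terry model, which this entry does not specify, and it scores a fixed set of answers rather than training a neural reward model as full RLHF pipelines typically do. The function name, the learning rate, and the sample comparisons are all hypothetical.

```python
import math


def fit_preference_scores(comparisons, n_items, steps=500, lr=0.1):
    """Fit one score per answer from (winner, loser) index pairs.

    Assumption: a Bradley-Terry model, where the probability that answer a
    is preferred over answer b is sigmoid(score[a] - score[b]).
    """
    scores = [0.0] * n_items
    for _ in range(steps):
        grads = [0.0] * n_items
        for winner, loser in comparisons:
            # Probability the current scores assign to the human's choice.
            p = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Gradient of the log-likelihood: raise the winner, lower the loser.
            grads[winner] += 1.0 - p
            grads[loser] -= 1.0 - p
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores


# Hypothetical data: each pair is (index of chosen answer, index of rejected answer).
comparisons = [(0, 1), (0, 2), (1, 2), (0, 1), (2, 1)]
scores = fit_preference_scores(comparisons, n_items=3)
best = max(range(3), key=lambda i: scores[i])
print("Most preferred answer index:", best)
```

In a full RLHF setup, scores like these would come from a learned reward model, and the language model would then be fine-tuned with reinforcement learning to produce answers that score highly.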

Learn more: Tap to Train.

Related terms

Human feedback (in AI)
Human-Feedback DePIN
Decentralized AI (DeAI)
