Open Human-Feedback Datasets for Underserved Languages
Tendril produces human-preference and evaluation datasets for low-resource and underserved languages: a prompt, two AI answers, and a real person's judgment of which is better, with full per-row provenance and a per-row open license. A license-clean English sample is available to download below, with no signup required. Per-language sets are prospective and built on commission. Request access for the full specification.
Data types
- Human-preference data from real people
- Multilingual preference data for low-resource languages
- Evaluation sets for open AI models
Download the sample
A real, license-clean sample with no signup: 14 English human-preference rows, balanced across science, general, coding, factual_qa, creative, and language topics (7 rows from oasst2 under Apache-2.0 and 7 rows from hh-rlhf helpful-base under MIT).
Languages
The mission is human-feedback data for the languages most AI models handle poorly. See the languages we help AI understand to learn who contributes and how.
Request access
Talk to us about pricing and access, or commission a preference or evaluation set in your language: hello@tendril.network.
Dataset FAQ
- What format?
- Each row is a prompt, two AI answers, and a human's preferred choice, with topic, language, and nested per-row provenance (source dataset, license, source row id). We ship the same data as JSONL, JSON, and CSV.
- What languages?
- The public sample is English and proves the format and provenance discipline. The mission is human-feedback data for low-resource and underserved languages; per-language sets are prospective and built on commission, never claimed before they exist.
- Is the data license-clean?
- Yes. The sample is built only from public, openly licensed sources. The license is recorded per row (Apache-2.0 for oasst2 rows, MIT for hh-rlhf helpful-base rows), and the sample includes a NOTICE that must be retained on redistribution.
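Because every row carries its own provenance, a consumer can check license coverage mechanically before training or redistributing. The sketch below reads JSONL rows, verifies that each one has the fields the FAQ describes, and tallies the per-row licenses. The exact key names (`prompt`, `response_a`, `response_b`, `preferred`, `topic`, `language`, and `provenance` with `source_dataset`, `license`, `source_row_id`) are assumptions for illustration, not the published schema; adjust them to the full specification.

```python
import json
from collections import Counter
from typing import Iterable

# ASSUMPTION: hypothetical key names modeled on the FAQ description,
# not the official schema.
REQUIRED_FIELDS = ("prompt", "response_a", "response_b", "preferred",
                   "topic", "language", "provenance")
PROVENANCE_FIELDS = ("source_dataset", "license", "source_row_id")


def load_rows(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL lines into dicts, skipping blank lines and
    raising ValueError if a row lacks a required or provenance field."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        row = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        provenance = row.get("provenance", {})
        missing += [f"provenance.{f}" for f in PROVENANCE_FIELDS
                    if f not in provenance]
        if missing:
            raise ValueError(f"line {lineno}: missing {', '.join(missing)}")
        rows.append(row)
    return rows


def license_counts(rows: Iterable[dict]) -> Counter:
    """Tally the license recorded in each row's nested provenance."""
    return Counter(row["provenance"]["license"] for row in rows)
```

Usage might look like `license_counts(load_rows(open("sample.jsonl", encoding="utf-8")))`, where the filename is hypothetical. If the sample records its licenses as the strings `Apache-2.0` and `MIT`, the tally would be expected to show 7 rows of each.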