Human-preference data from real people
Tendril produces human-preference data: a prompt, two AI answers, and a real person's judgment of which is better, with full per-row provenance and a per-row open license. It is collected through opt-in browser compute and a mobile Tap to Train task. A real, license-clean English sample is available below, with no signup required. Request access for the full specification.
Schema
Each row: id, prompt, response_a, response_b, preferred (response_a or response_b), topic, language (ISO code), and nested provenance.source_dataset / provenance.license / provenance.source_row_id.
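As a rough illustration of the row shape described above, here is a minimal Python sketch of a validator. The field names come from the schema; everything else is an assumption made for illustration, including the function name validate_row, the use of plain dicts, and the placeholder values in example_row, which are hypothetical and not taken from the sample.

```python
from typing import Any

# Field names as listed in the schema description.
REQUIRED_FIELDS = (
    "id", "prompt", "response_a", "response_b",
    "preferred", "topic", "language", "provenance",
)
PROVENANCE_FIELDS = ("source_dataset", "license", "source_row_id")
PREFERRED_VALUES = {"response_a", "response_b"}


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the row has the expected shape."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in row:
            problems.append(f"missing field: {field}")
    # 'preferred' must name one of the two responses.
    if "preferred" in row and row["preferred"] not in PREFERRED_VALUES:
        problems.append("preferred must be 'response_a' or 'response_b'")
    # 'provenance' is nested and carries its own three fields.
    if "provenance" in row:
        provenance = row["provenance"]
        if not isinstance(provenance, dict):
            problems.append("provenance must be an object")
        else:
            for field in PROVENANCE_FIELDS:
                if field not in provenance:
                    problems.append(f"missing field: provenance.{field}")
    return problems


# Hypothetical row with placeholder values, for illustration only.
example_row = {
    "id": "row-0001",
    "prompt": "placeholder prompt",
    "response_a": "placeholder answer A",
    "response_b": "placeholder answer B",
    "preferred": "response_a",
    "topic": "science",
    "language": "en",
    "provenance": {
        "source_dataset": "oasst2",
        "license": "Apache-2.0",
        "source_row_id": "placeholder-source-id",
    },
}

print(validate_row(example_row))
```

The sketch checks only for presence and the allowed values of preferred. Field types, ID formats, and the exact spelling of dataset identifiers are not stated here, so it does not check them.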
Sample
The sample is real, license-clean, and needs no signup: 14 English human-preference rows, balanced across the science, general, coding, factual_qa, creative, and language topics. Seven rows come from oasst2 (Apache-2.0) and seven from hh-rlhf helpful-base (MIT). The thing we give away proves the thing we sell.
Provenance and licensing
Licensing is per row: oasst2 rows are Apache-2.0, hh-rlhf helpful-base rows are MIT. Provenance is recorded per row, and the NOTICE must be retained on redistribution. The body copy and NOTICE are the authoritative per-row license source (schema.org Dataset.license is single-valued, so the JSON-LD shows Apache-2.0 only).
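Because the license lives on each row rather than on the dataset as a whole, a consumer may want to check it row by row before redistributing. The following Python sketch shows one way to do that. The dataset-to-license pairing comes from the text above; the identifier strings "oasst2" and "hh-rlhf-helpful-base", the function names, and the example rows are assumptions for illustration and may not match the values used in the actual files.

```python
from collections import Counter

# Pairing stated in the text; the exact identifier strings are assumed.
EXPECTED_LICENSE = {
    "oasst2": "Apache-2.0",
    "hh-rlhf-helpful-base": "MIT",
}


def license_mismatches(rows):
    """Return ids of rows whose license does not match their source dataset."""
    mismatched = []
    for row in rows:
        provenance = row["provenance"]
        expected = EXPECTED_LICENSE.get(provenance["source_dataset"])
        # Unknown datasets are flagged too, since their license cannot be confirmed.
        if expected is None or provenance["license"] != expected:
            mismatched.append(row["id"])
    return mismatched


def license_counts(rows):
    """Count rows per license, e.g. to see which notices a redistribution needs."""
    return Counter(row["provenance"]["license"] for row in rows)


# Hypothetical rows carrying only the fields this check reads.
rows = [
    {"id": "a", "provenance": {"source_dataset": "oasst2", "license": "Apache-2.0"}},
    {"id": "b", "provenance": {"source_dataset": "hh-rlhf-helpful-base", "license": "MIT"}},
    {"id": "c", "provenance": {"source_dataset": "oasst2", "license": "MIT"}},
]

print(license_mismatches(rows))
print(license_counts(rows))
```

Reading the license from the row, rather than from the page-level JSON-LD, follows the statement above that the body copy and NOTICE are authoritative.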
Privacy
Public, consent-clean sources only. On the live network, capture is opt-in and granular, private prompts are never collected, and personal data is redacted before sharing.
Honest scope
This English sample demonstrates format and provenance discipline. It is not itself underserved-language data, and we do not claim it is.
Request access
Talk to us about pricing and access, or commission a preference or evaluation set in your language: hello@tendril.network.