Human-preference data as an open graph

Evaluation sets for open AI models

Tendril produces evaluation sets that test how well an AI model handles real prompts. Each set is scored against human judgments and carries full per-row provenance and a per-row open license. The same opt-in browser compute and mobile Tap to Train task that build our preference data also build these evaluation sets. A real, license-clean English sample is below, with no signup required. Request access for the full specification.

Schema

Each row contains id, prompt, response_a, response_b, preferred (either response_a or response_b), topic, language (an ISO language code), and a nested provenance object with provenance.source_dataset, provenance.license, and provenance.source_row_id.
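To make the schema concrete, here is a minimal Python sketch of a row type and a loader that checks the two constrained fields. It assumes rows arrive as decoded JSON objects (for example, one per line in a JSON Lines file) and that "ISO code" means a lowercase two- or three-letter ISO 639 code; both are assumptions, not part of the published specification. The names PreferenceRow, Provenance, and parse_row are hypothetical.

```python
import json
import re
from dataclasses import dataclass

# Assumption: language is a lowercase ISO 639-1 ("en") or ISO 639-3 ("eng") code.
ISO_LANG = re.compile(r"[a-z]{2,3}")


@dataclass(frozen=True)
class Provenance:
    source_dataset: str
    license: str
    source_row_id: str


@dataclass(frozen=True)
class PreferenceRow:
    id: str
    prompt: str
    response_a: str
    response_b: str
    preferred: str  # "response_a" or "response_b"
    topic: str
    language: str
    provenance: Provenance


def parse_row(raw: dict) -> PreferenceRow:
    """Hypothetical helper: build a PreferenceRow from one decoded JSON object.

    Raises ValueError if preferred or language break the documented schema;
    a missing field raises KeyError.
    """
    if raw["preferred"] not in ("response_a", "response_b"):
        raise ValueError(f"bad preferred value: {raw['preferred']!r}")
    if not ISO_LANG.fullmatch(raw["language"]):
        raise ValueError(f"not an ISO language code: {raw['language']!r}")
    prov = Provenance(**raw["provenance"])
    top_level = ("id", "prompt", "response_a", "response_b",
                 "preferred", "topic", "language")
    return PreferenceRow(**{k: raw[k] for k in top_level}, provenance=prov)


# Placeholder values for illustration only; this is not a row from the sample.
line = json.dumps({
    "id": "example-1",
    "prompt": "Name a prime number.",
    "response_a": "7",
    "response_b": "8",
    "preferred": "response_a",
    "topic": "factual_qa",
    "language": "en",
    "provenance": {"source_dataset": "oasst2", "license": "Apache-2.0",
                   "source_row_id": "placeholder-id"},
})
row = parse_row(json.loads(line))
print(row.preferred, row.provenance.license)  # response_a Apache-2.0
```

Keeping provenance as its own frozen type mirrors the nested provenance object, so license and source information stay attached to the row they describe.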

Sample

A real, license-clean sample with no signup required: 14 English human-preference rows, balanced across the science, general, coding, factual_qa, creative, and language topics. Seven rows come from oasst2 (Apache-2.0) and seven from hh-rlhf helpful-base (MIT). The thing we give away proves the thing we sell.

Provenance and licensing

Licensing is per row: oasst2 rows are Apache-2.0, and hh-rlhf helpful-base rows are MIT. Provenance is recorded per row, and the NOTICE must be retained on redistribution. The text on this page and the NOTICE are the authoritative source for per-row licensing (schema.org Dataset.license is single-valued, so the JSON-LD shows Apache-2.0 only).
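Because the license lives on each row rather than on the file, anyone redistributing a subset needs to know which licenses that subset actually contains. The Python sketch below groups row ids by provenance.license. It assumes rows are dicts shaped like the schema above and treats Apache-2.0 and MIT (the two licenses in this English sample) as the only expected values; the helper name rows_by_license and the row ids are hypothetical placeholders.

```python
from collections import defaultdict

# Licenses that appear in the English sample, per the text above.
SAMPLE_LICENSES = {"Apache-2.0", "MIT"}


def rows_by_license(rows):
    """Hypothetical helper: map each provenance.license value to the ids of
    the rows that carry it, rejecting a missing or unexpected license."""
    groups = defaultdict(list)
    for row in rows:
        lic = row.get("provenance", {}).get("license")
        if lic not in SAMPLE_LICENSES:
            raise ValueError(f"row {row.get('id')!r} has unexpected license {lic!r}")
        groups[lic].append(row["id"])
    return dict(groups)


# Placeholder ids and source_row_ids for illustration only.
subset = [
    {"id": "a1", "provenance": {"source_dataset": "oasst2",
                                "license": "Apache-2.0", "source_row_id": "x"}},
    {"id": "m1", "provenance": {"source_dataset": "hh-rlhf",
                                "license": "MIT", "source_row_id": "y"}},
    {"id": "a2", "provenance": {"source_dataset": "oasst2",
                                "license": "Apache-2.0", "source_row_id": "z"}},
]
print(rows_by_license(subset))  # {'Apache-2.0': ['a1', 'a2'], 'MIT': ['m1']}
```

Failing loudly on an unknown license, rather than silently bucketing it, keeps a row with broken provenance from slipping into a redistributed subset unnoticed.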

Privacy

The sample draws only on public, consent-clean sources. On the live network, capture is opt-in and granular, private prompts are never collected, and personal data is redacted before sharing.

Honest scope

This English sample demonstrates format and provenance discipline. It is not itself underserved-language data, and we do not claim it is.

Request access

Talk to us about pricing and access, or commission a preference or evaluation set in your language: hello@tendril.network.