Evaluation sets for open AI models

Name: Tendril Foundation Public Data Sample
Creator: Tendril
License: https://www.apache.org/licenses/LICENSE-2.0

Published 2026-06-20

Tendril produces evaluation sets that test how well an AI model handles real prompts, scored against human judgments, with full provenance and an open license recorded per row. The same opt-in browser compute and mobile Tap to Train task that build our preference data also build evaluation sets. A real, license-clean English sample is available below with no signup required. Request access for the full specification.

Schema

Each row: id, prompt, response_a, response_b, preferred (response_a or response_b), topic, language (ISO code), and nested provenance.source_dataset / provenance.license / provenance.source_row_id.
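As a rough illustration of the schema above, the sketch below checks one row for the listed fields. The function name validate_row, the assumption that provenance is a nested object, and every value in the example row are hypothetical placeholders, not taken from the actual sample.

```python
# Field names come from the schema description; everything else is illustrative.
REQUIRED_FIELDS = (
    "id", "prompt", "response_a", "response_b",
    "preferred", "topic", "language", "provenance",
)
PROVENANCE_FIELDS = ("source_dataset", "license", "source_row_id")


def validate_row(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row fits the schema."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in row]

    # 'preferred' names which of the two responses the human chose.
    if "preferred" in row and row["preferred"] not in ("response_a", "response_b"):
        problems.append("preferred must be 'response_a' or 'response_b'")

    # Provenance is assumed to be a nested object with three keys.
    prov = row.get("provenance")
    if isinstance(prov, dict):
        for name in PROVENANCE_FIELDS:
            if name not in prov:
                problems.append(f"missing field: provenance.{name}")
    elif "provenance" in row:
        problems.append("provenance must be an object")
    return problems


# Made-up row for demonstration only; not a row from the published sample.
example_row = {
    "id": "demo-0001",
    "prompt": "Why is the sky blue?",
    "response_a": "Because air scatters short wavelengths more strongly.",
    "response_b": "Because it reflects the ocean.",
    "preferred": "response_a",
    "topic": "science",
    "language": "en",
    "provenance": {
        "source_dataset": "oasst2",
        "license": "Apache-2.0",
        "source_row_id": "placeholder-id",
    },
}

print(validate_row(example_row))
```

A check like this could run over every row before the data is used for scoring, so that a malformed row is caught early rather than skewing results.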

Sample

The sample contains 14 human-preference rows in English, balanced across the science, general, coding, factual_qa, creative, and language topics: 7 rows from oasst2 (Apache-2.0) and 7 rows from hh-rlhf helpful-base (MIT). It is real, license-clean, and needs no signup. The thing we give away proves the thing we sell.

JSONL · JSON · CSV · build-sample.mjs (reproducible) · NOTICE
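For readers who want to inspect the JSONL format, here is a minimal sketch of parsing JSON Lines text and tallying rows by a field such as topic. The helper names parse_jsonl and tally are hypothetical, and the two demo rows are made up for illustration; they are not rows from the published sample.

```python
import json
from collections import Counter


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def tally(rows: list[dict], field: str) -> Counter:
    """Count rows by a top-level field such as 'topic' or 'language'."""
    return Counter(row.get(field) for row in rows)


# Two made-up rows standing in for the contents of a downloaded file.
demo_text = "\n".join([
    json.dumps({"id": "demo-1", "topic": "science", "language": "en"}),
    json.dumps({"id": "demo-2", "topic": "coding", "language": "en"}),
])

demo_rows = parse_jsonl(demo_text)
topic_counts = tally(demo_rows, "topic")
language_counts = tally(demo_rows, "language")
print(topic_counts, language_counts)
```

To run the same tally on the real sample, read the downloaded file's text (for example with pathlib.Path(...).read_text(encoding="utf-8"), using wherever you saved it) and pass it to parse_jsonl.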

Provenance and licensing

Licensing is per row: oasst2 rows are Apache-2.0, hh-rlhf helpful-base rows are MIT. Provenance is recorded per row, and the NOTICE must be retained on redistribution. The body copy and NOTICE are the authoritative per-row license source (schema.org Dataset.license is single-valued, so the JSON-LD shows Apache-2.0 only).
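Because the license lives on each row rather than on the dataset as a whole, a consumer may want to confirm that every source dataset maps to exactly one license before redistributing. The sketch below shows one way to do that. The function names are hypothetical, and the source_dataset strings in the demo rows ("oasst2", "hh-rlhf") are assumptions about how the sources are spelled in the file.

```python
from collections import defaultdict


def licenses_per_source(rows: list[dict]) -> dict:
    """Map each provenance.source_dataset to the set of licenses seen for it."""
    found = defaultdict(set)
    for row in rows:
        prov = row.get("provenance") or {}
        found[prov.get("source_dataset")].add(prov.get("license"))
    return dict(found)


def inconsistent_sources(rows: list[dict]) -> dict:
    """Return sources with a missing license or more than one license."""
    return {
        source: licenses
        for source, licenses in licenses_per_source(rows).items()
        if len(licenses) != 1 or None in licenses
    }


# Made-up rows for demonstration; only the provenance block matters here.
demo_rows = [
    {"id": "demo-1", "provenance": {"source_dataset": "oasst2", "license": "Apache-2.0"}},
    {"id": "demo-2", "provenance": {"source_dataset": "hh-rlhf", "license": "MIT"}},
]

print(licenses_per_source(demo_rows))
print(inconsistent_sources(demo_rows))
```

An empty result from inconsistent_sources means each source carries a single license, which matches the per-row licensing described above. It does not replace reading the NOTICE.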

Privacy

Public, consent-clean sources only. On the live network, capture is opt-in and granular, private prompts are never collected, and personal data is redacted before sharing.

Honest scope

This English sample demonstrates format and provenance discipline. It is not itself underserved-language data, and we do not claim it is.

Request access

Talk to us about pricing and access, or commission a preference or evaluation set in your language: hello@tendril.network.