A Benchmark Run is a single execution of a Benchmark. When you create a benchmark run, Verial provisions a Playground for each task, your agent drives the rollouts through sandbox endpoints, and the verification engine scores each task, producing Criterion Runs along with an aggregate score and verdict for the run.
There are two entry points:
- Internal (/benchmark-runs): Bearer API-key auth. Used by Verial tooling.
- Public v1 (/v1/benchmark-runs): Solver-key auth. Used by external developers running a published benchmark from their own organization’s Solver. This is the recommended path; see the Quick Start and Authentication.
Internal Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /benchmark-runs?benchmark_id={benchmark_id} | List benchmark runs |
| POST | /benchmark-runs | Create a benchmark run (body { "benchmark_id": "..." }) |
| GET | /benchmark-runs/{id} | Get run details (includes task runs and criterion runs) |
| POST | /benchmark-runs/{id}/complete | Mark complete |
| POST | /benchmark-runs/{id}/cancel | Cancel |
| POST | /benchmark-runs/{id}/publish | Publish to leaderboard |
| POST | /benchmark-runs/{id}/unpublish | Unpublish from leaderboard |
Public v1 Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/benchmark-runs | Create a run. Body: { "benchmark": "slug@version", "scored": boolean }. Returns bearer token, task-run URLs, and sandbox endpoint paths |
| GET | /v1/benchmark-runs/{id} | Get run summary (requires the run’s bearer token) |
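As a rough illustration of the create call in the table above, the sketch below only assembles the request and does not send it. The helper `buildCreateRunRequest`, the base URL `https://api.example.com`, and the shape of the Solver-key header are assumptions made for this example; the actual header format is described in Authentication.

```typescript
// Hypothetical request builder for POST /v1/benchmark-runs.
// Only the path and the body fields ("benchmark", "scored") come from the
// endpoint table; everything else here is an illustrative assumption.
interface CreateRunBody {
  benchmark: string; // "slug@version"
  scored: boolean;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildCreateRunRequest(
  baseUrl: string,
  slug: string,
  version: string,
  scored: boolean,
  authHeaders: Record<string, string>,
): PreparedRequest {
  const body: CreateRunBody = { benchmark: `${slug}@${version}`, scored };
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/benchmark-runs`,
    method: "POST",
    headers: { "Content-Type": "application/json", ...authHeaders },
    body: JSON.stringify(body),
  };
}

const createRunRequest = buildCreateRunRequest(
  "https://api.example.com/", // placeholder base URL (assumption)
  "fax-referral",
  "1",
  false,
  { Authorization: "Bearer <solver-key>" }, // assumed header shape
);
```

The prepared object could then be passed to an HTTP client of your choice, for example `fetch(createRunRequest.url, createRunRequest)`.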
Benchmark Run Object
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier |
| benchmark_id | string | Parent Benchmark |
| status | string | active, completed, cancelled, failed |
| phase | string | created, started, completed |
| scored | boolean | true if evidence is withheld from the agent during completion responses |
| verdict | string \| null | pass, partial, fail |
| score | number \| null | Aggregate score (mean of task scores) |
| agent | string \| null | Optional agent identifier |
| started_at | datetime \| null | |
| completed_at | datetime \| null | |
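The fields above could be modeled on the client side roughly as follows. This is a sketch, not an official type definition: it assumes datetimes arrive as ISO 8601 strings, and the helpers `meanScore` and `isTerminal` are hypothetical names introduced here to illustrate the "mean of task scores" aggregate and the status values.

```typescript
type RunStatus = "active" | "completed" | "cancelled" | "failed";
type RunPhase = "created" | "started" | "completed";
type Verdict = "pass" | "partial" | "fail";

interface BenchmarkRun {
  id: string;
  benchmark_id: string;
  status: RunStatus;
  phase: RunPhase;
  scored: boolean;
  verdict: Verdict | null;
  score: number | null;
  agent: string | null;
  started_at: string | null; // assumed to be an ISO 8601 string
  completed_at: string | null; // assumed to be an ISO 8601 string
}

// Hypothetical helper: arithmetic mean of task scores, or null when
// there are no task scores to aggregate.
function meanScore(taskScores: number[]): number | null {
  if (taskScores.length === 0) return null;
  let sum = 0;
  for (const s of taskScores) sum += s;
  return sum / taskScores.length;
}

// Hypothetical helper: treats every status other than "active" as final.
// That reading of the status values is an assumption.
function isTerminal(run: BenchmarkRun): boolean {
  return run.status !== "active";
}

const exampleRun: BenchmarkRun = {
  id: "br_abc123",
  benchmark_id: "bench_abc123",
  status: "completed",
  phase: "completed",
  scored: false,
  verdict: "partial",
  score: meanScore([1, 0.5, 0]), // (1 + 0.5 + 0) / 3 = 0.5
  agent: null,
  started_at: null,
  completed_at: null,
};
```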
v1 Create Response
POST /v1/benchmark-runs returns everything the agent needs to drive the rollouts:
{
  "benchmark_run_id": "br_abc123",
  "benchmark": { "slug": "fax-referral", "version": "1", "name": "Fax referral intake" },
  "scored": false,
  "phase": "created",
  "bearer_token": "vrl_run_...",
  "bearer_token_expires_at": "2026-04-21T17:00:00.000Z",
  "endpoints": {
    "files_inbox": "/v1/benchmark-runs/br_abc123/files/inbox",
    "hl7_outbound": "/v1/benchmark-runs/br_abc123/hl7/outbound"
  },
  "task_runs": [
    {
      "id": "tr_1",
      "task_id": "task_1",
      "name": "Process referral #1",
      "phase": "created",
      "start_url": "/v1/task-runs/tr_1/start",
      "complete_url": "/v1/task-runs/tr_1/complete"
    }
  ]
}
Use the bearer_token to authenticate calls to all /v1/task-runs/* and /v1/benchmark-runs/{id}/* protocol endpoints (FHIR proxy, HL7, files, portal).
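One way to turn the create response into authenticated calls is sketched below. It assumes the token is sent as a standard `Authorization: Bearer` header and that the start and complete URLs are invoked with POST; neither detail is stated in this page, and `planTaskRunCalls` and the base URL are hypothetical.

```typescript
interface TaskRunRef {
  id: string;
  start_url: string;
  complete_url: string;
}

interface CreateRunResponse {
  bearer_token: string;
  endpoints: Record<string, string>;
  task_runs: TaskRunRef[];
}

interface PlannedCall {
  url: string;
  method: "POST"; // assumed HTTP method for start/complete
  headers: Record<string, string>;
}

// Hypothetical helper: resolves each task run's relative start/complete
// paths against a base URL and attaches the run-scoped bearer token.
function planTaskRunCalls(baseUrl: string, res: CreateRunResponse): PlannedCall[] {
  const root = baseUrl.replace(/\/+$/, "");
  const headers = { Authorization: `Bearer ${res.bearer_token}` }; // assumed header shape
  const calls: PlannedCall[] = [];
  for (const task of res.task_runs) {
    calls.push({ url: root + task.start_url, method: "POST", headers });
    calls.push({ url: root + task.complete_url, method: "POST", headers });
  }
  return calls;
}

// Trimmed copy of the sample response shown above.
const sampleCreateResponse: CreateRunResponse = {
  bearer_token: "vrl_run_...",
  endpoints: {
    files_inbox: "/v1/benchmark-runs/br_abc123/files/inbox",
    hl7_outbound: "/v1/benchmark-runs/br_abc123/hl7/outbound",
  },
  task_runs: [
    { id: "tr_1", start_url: "/v1/task-runs/tr_1/start", complete_url: "/v1/task-runs/tr_1/complete" },
  ],
};

const plannedCalls = planTaskRunCalls("https://api.example.com", sampleCreateResponse);
```

The same header object would apply to the sandbox paths listed under `endpoints`, since they fall under /v1/benchmark-runs/{id}/*.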
Internal SDK Example
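// Assumes `verial` is an already-initialized SDK client authenticated with an API key.
const run = await verial.benchmarkRuns.create({ benchmarkId: "bench_abc123" });
// Fetch run details; score and verdict may be null until the run has been scored.
const detail = await verial.benchmarkRuns.get({ id: run.id });
console.log(detail.score, detail.verdict);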