A Task Run represents the outcome of a single Task within a Benchmark Run. Each task run executes in its own Playground. When it completes, the verification engine produces one Criterion Run per task Criterion. There are two completion paths, depending on how the benchmark run was created:
- Internal (POST /task-runs/{id}/complete): used by Verial tooling and workers.
- Public v1 (POST /v1/task-runs/{id}/complete): used by external agents driving a submission. This is the path an external developer uses; see the Quick Start.
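
As a rough illustration of the public v1 path, the sketch below builds (but does not send) the start and complete requests for a task run. The base URL, the bearer-token header, and the `build_v1_request` helper are assumptions made for this example; this page does not document the API host, the authentication scheme, or any request body.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical host: the real API base URL is not stated on this page.
BASE_URL = "https://api.example.com"


def build_v1_request(task_run_id: str, action: str, token: str) -> urllib.request.Request:
    """Build, without sending, a POST request for a public v1 task-run action."""
    if action not in ("start", "complete"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{BASE_URL}/v1/task-runs/{quote(task_run_id, safe='')}/{action}"
    # Bearer auth is an assumption; substitute the scheme the API actually uses.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="POST")


# An external agent would mark the run started, do its work, then complete it.
start_req = build_v1_request("tr_123", "start", "TOKEN")
complete_req = build_v1_request("tr_123", "complete", "TOKEN")
```

Passing either request to `urllib.request.urlopen` would send it; the example stops short of that so it can run without network access.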
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /task-runs?benchmark_run_id={benchmark_run_id} | List task runs for a benchmark run |
| GET | /task-runs/{id} | Get task run details (includes criterion_runs array) |
| POST | /task-runs/{id}/complete | Mark a task run complete (internal) |
| POST | /task-runs/{id}/cancel | Cancel a task run |
| POST | /v1/task-runs/{id}/start | (Public v1) Mark a task run started |
| POST | /v1/task-runs/{id}/complete | (Public v1) Mark complete, run verification, return checks |
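
The list endpoint takes the benchmark run ID as a query parameter, while the detail endpoint takes the task run ID in the path. A minimal sketch of building both paths with proper URL encoding follows; the helper names are hypothetical and not part of any Verial client.

```python
from urllib.parse import quote, urlencode


def task_runs_list_path(benchmark_run_id: str) -> str:
    """Path and query string for listing the task runs of one benchmark run."""
    return "/task-runs?" + urlencode({"benchmark_run_id": benchmark_run_id})


def task_run_detail_path(task_run_id: str) -> str:
    """Path for fetching a single task run, with the ID percent-encoded."""
    return "/task-runs/" + quote(task_run_id, safe="")


list_path = task_runs_list_path("br_42")
detail_path = task_run_detail_path("tr_123")
```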
Task Run Object
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier |
| benchmark_run_id | string | Parent Benchmark Run |
| task_id | string | Source Task |
| playground_id | string | Playground used for execution |
| status | string | active, completed, cancelled, failed, timed_out |
| phase | string | created, started, completed |
| verdict | string \| null | pass, partial, fail. Set on completion |
| score | number \| null | Weighted task score (0 to 1). Set on completion |
| snapshot | object \| null | Frozen copy of the task at execution time |
| started_at | datetime \| null | When execution started |
| completed_at | datetime \| null | When execution finished |
The GET /task-runs/{id} response also includes a criterion_runs array. See Criterion Runs.
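
One way to model the object on the client side is a small dataclass that mirrors the fields above. This is only a sketch: the `TaskRun` class and the sample payload are illustrative, timestamps are kept as raw strings because the datetime format is not specified here, and `criterion_runs` is left out since its shape is documented under Criterion Runs.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class TaskRun:
    id: str
    benchmark_run_id: str
    task_id: str
    playground_id: str
    status: str  # active, completed, cancelled, failed, timed_out
    phase: str   # created, started, completed
    verdict: Optional[str] = None        # pass, partial, fail; set on completion
    score: Optional[float] = None        # 0 to 1; set on completion
    snapshot: Optional[Dict[str, Any]] = None
    started_at: Optional[str] = None     # raw string; format not specified
    completed_at: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TaskRun":
        """Build a TaskRun from a decoded JSON object, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


# Made-up payload for illustration only; not real API output.
sample = {
    "id": "tr_123",
    "benchmark_run_id": "br_42",
    "task_id": "task_7",
    "playground_id": "pg_9",
    "status": "completed",
    "phase": "completed",
    "verdict": "partial",
    "score": 0.75,
    "criterion_runs": [],
}
run = TaskRun.from_dict(sample)
```

Dropping unknown keys in `from_dict` keeps the parser tolerant of fields the API may add later.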
v1 Completion Response
POST /v1/task-runs/{id}/complete runs verification synchronously and returns:
When scored: true, the details and field-level evidence are omitted from this response to avoid leaking the scoring rubric. You can still fetch the full evidence later via GET /criterion-runs/{id}.