Task Runs - Verial

A Task Run represents the outcome of a single Task within a Benchmark Run. Each task run executes in its own Playground. When a task run completes, the verification engine produces one Criterion Run per task Criterion. There are two completion paths, depending on how the benchmark run was created:

Internal (`POST /task-runs/{id}/complete`): used by Verial tooling and workers.
Public v1 (`POST /v1/task-runs/{id}/complete`): used by external agents driving a submission. External developers should use this path; see the Quick Start.

Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| `GET` | `/task-runs?benchmark_run_id={benchmark_run_id}` | List task runs for a benchmark run |
| `GET` | `/task-runs/{id}` | Get task run details (includes the `criterion_runs` array) |
| `POST` | `/task-runs/{id}/complete` | Mark a task run complete (internal) |
| `POST` | `/task-runs/{id}/cancel` | Cancel a task run |
| `POST` | `/v1/task-runs/{id}/start` | (Public v1) Mark a task run started |
| `POST` | `/v1/task-runs/{id}/complete` | (Public v1) Mark a task run complete, run verification, and return checks |
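
As a rough illustration of the public v1 path, the sketch below calls `start`, lets the agent do its work, then calls `complete`. The helper names (`taskRunActionUrl`, `driveTaskRun`), the base URL, and the bearer-token `Authorization` header are assumptions made for the example; only the two endpoint paths come from the table above.

```typescript
// Hypothetical helper, not part of the Verial SDK.
// Minimal fetch-like shape so the caller can inject a real or fake client.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

type TaskRunAction = "start" | "complete";

// Builds the public v1 URL for a task run action.
function taskRunActionUrl(baseUrl: string, taskRunId: string, action: TaskRunAction): string {
  return `${baseUrl}/v1/task-runs/${encodeURIComponent(taskRunId)}/${action}`;
}

async function driveTaskRun(
  fetchFn: FetchLike,
  baseUrl: string, // assumption: supplied by the caller, not a documented value
  apiKey: string, // assumption: bearer-token auth
  taskRunId: string,
  work: () => Promise<void>,
): Promise<unknown> {
  const post = async (action: TaskRunAction): Promise<unknown> => {
    const url = taskRunActionUrl(baseUrl, taskRunId, action);
    const res = await fetchFn(url, {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok) throw new Error(`POST ${url} failed with HTTP ${res.status}`);
    return res.json();
  };

  await post("start"); // mark the task run started
  await work(); // the agent acts against the Playground here
  return post("complete"); // runs verification and returns the checks
}
```

Injecting `fetchFn` keeps the flow testable without network access; in Node 18+ or a browser, the global `fetch` should fit this shape.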

Task Run Object

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique identifier |
| `benchmark_run_id` | string | Parent Benchmark Run |
| `task_id` | string | Source Task |
| `playground_id` | string | Playground used for execution |
| `status` | string | `active`, `completed`, `cancelled`, `failed`, `timed_out` |
| `phase` | string | `created`, `started`, `completed` |
| `verdict` | string \| null | `pass`, `partial`, or `fail`. Set on completion |
| `score` | number \| null | Weighted task score (0 to 1). Set on completion |
| `snapshot` | object \| null | Frozen copy of the task at execution time |
| `started_at` | datetime \| null | When execution started |
| `completed_at` | datetime \| null | When execution finished |

The `GET /task-runs/{id}` response includes a `criterion_runs` array. See Criterion Runs.
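
For clients that consume the raw JSON, the fields above could be typed roughly as follows. This is a sketch, not a published SDK type: the names `TaskRun`, `isFinished`, and `durationMs` are hypothetical, datetimes are assumed to arrive as ISO 8601 strings, and treating every status other than `active` as final is an assumption.

```typescript
type TaskRunStatus = "active" | "completed" | "cancelled" | "failed" | "timed_out";
type TaskRunPhase = "created" | "started" | "completed";
type Verdict = "pass" | "partial" | "fail";

// Mirrors the documented fields; the criterion_runs array is omitted here.
interface TaskRun {
  id: string;
  benchmark_run_id: string;
  task_id: string;
  playground_id: string;
  status: TaskRunStatus;
  phase: TaskRunPhase;
  verdict: Verdict | null; // set on completion
  score: number | null; // 0 to 1, set on completion
  snapshot: Record<string, unknown> | null;
  started_at: string | null; // assumption: ISO 8601 string on the wire
  completed_at: string | null; // assumption: ISO 8601 string on the wire
}

// Assumption: any status other than "active" means the run will not change again.
function isFinished(run: Pick<TaskRun, "status">): boolean {
  return run.status !== "active";
}

// Elapsed execution time in milliseconds, or null if either timestamp is missing.
function durationMs(run: Pick<TaskRun, "started_at" | "completed_at">): number | null {
  if (run.started_at === null || run.completed_at === null) return null;
  return Date.parse(run.completed_at) - Date.parse(run.started_at);
}
```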

v1 Completion Response

`POST /v1/task-runs/{id}/complete` runs verification synchronously and returns:

```json
{
  "task_run_id": "tr_abc123",
  "phase": "completed",
  "verdict": "partial",
  "score": 0.66,
  "axes": {
    "correctness": { "score": 1.0, "weight": 2 },
    "safety": { "score": 0.0, "weight": 1 }
  },
  "checks": [
    {
      "criterion_id": "crit_01H...",
      "label": "Appointment booked with correct provider",
      "result": "pass",
      "score": 1.0,
      "axis": "correctness",
      "details": "Appointment resource matched; all field assertions passed.",
      "field_results": [
        { "path": "participant.0.actor.display", "expected": "Dr. Rivera", "actual": "Dr. Rivera", "passed": true }
      ]
    }
  ]
}
```

When the benchmark run was created with `scored: true`, the `details` and field-level evidence are omitted from this response to avoid leaking the scoring rubric. You can still fetch full evidence later via `GET /criterion-runs/{id}`.
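
The sample's top-level `score` of 0.66 looks consistent with a weight-normalized mean of the axis scores: (1.0 × 2 + 0.0 × 1) / (2 + 1) = 2/3. The sketch below reproduces that arithmetic. The formula is inferred from the sample rather than documented, and `weightedScore` is a hypothetical helper name.

```typescript
interface AxisScore {
  score: number;
  weight: number;
}

// Assumption: task score = sum(score * weight) / sum(weight) across axes.
// Returns null when there are no axes or the weights sum to zero.
function weightedScore(axes: Record<string, AxisScore>): number | null {
  let weighted = 0;
  let totalWeight = 0;
  for (const name of Object.keys(axes)) {
    weighted += axes[name].score * axes[name].weight;
    totalWeight += axes[name].weight;
  }
  return totalWeight > 0 ? weighted / totalWeight : null;
}

// Axes copied from the sample response above.
const sampleAxes: Record<string, AxisScore> = {
  correctness: { score: 1.0, weight: 2 },
  safety: { score: 0.0, weight: 1 },
};

// (1.0 * 2 + 0.0 * 1) / 3 = 2/3
const sampleScore = weightedScore(sampleAxes);
```

If the server rounds or truncates before returning `score`, a client-side recomputation may differ in the last digits, so compare with a tolerance rather than strict equality.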

SDK Example

```typescript
// Assumes `verial` is an already-initialized Verial SDK client.

// List task runs for a benchmark run
const taskRuns = await verial.taskRuns.list({ benchmarkRunId: "br_abc123" });

// Get the first task run along with its criterion runs
const detail = await verial.taskRuns.get({ id: taskRuns.data[0].id });

// Print one PASS/FAIL line per criterion run
for (const run of detail.criterionRuns) {
  console.log(`${run.criterionId}: ${run.passed ? "PASS" : "FAIL"} (${run.score})`);
}
```
