A Task Run represents the outcome of a single Task within a Benchmark Run. Each task run executes in its own Playground. When a task run completes, the verification engine produces one Criterion Run for each Criterion of the task. There are two completion paths, depending on how the benchmark run was created:
  • Internal (POST /task-runs/{id}/complete): used by Verial tooling and workers.
  • Public v1 (POST /v1/task-runs/{id}/complete): used by external agents driving a submission. External developers should use this path; see the Quick Start.
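
As a rough sketch of the public v1 path, an external agent might mark the task run started, do its work, and then request completion. Only the two /v1/task-runs/{id}/... paths come from this page. The base URL, the bearer-token header, and the helper names (v1TaskRunPath, post, driveTaskRun) are assumptions made for illustration. The HTTP function is passed in as a parameter so the flow can be exercised with a stub.

```typescript
// Assumed placeholder; substitute the real API base URL for your account.
const BASE_URL = "https://api.example.com";

// Minimal shape of the fetch-like function this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the public v1 path for a task run lifecycle action.
function v1TaskRunPath(id: string, action: "start" | "complete"): string {
  return `/v1/task-runs/${encodeURIComponent(id)}/${action}`;
}

// POST to a path and return the raw response, throwing on a non-2xx status.
async function post(fetchFn: FetchLike, path: string, apiKey: string) {
  const res = await fetchFn(`${BASE_URL}${path}`, {
    method: "POST",
    // Bearer auth is an assumption; check the Quick Start for the real scheme.
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`POST ${path} failed with status ${res.status}`);
  return res;
}

// Start the task run, let the agent act, then complete it and
// return the parsed completion response (verdict, score, checks).
async function driveTaskRun(
  fetchFn: FetchLike,
  taskRunId: string,
  apiKey: string,
  work: () => Promise<void>,
): Promise<unknown> {
  await post(fetchFn, v1TaskRunPath(taskRunId, "start"), apiKey);
  await work();
  const res = await post(fetchFn, v1TaskRunPath(taskRunId, "complete"), apiKey);
  return res.json();
}
```

In Node 18+ or a browser, the global fetch can be passed as fetchFn.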

Endpoints

Method  Endpoint                                        Description
GET     /task-runs?benchmark_run_id={benchmark_run_id}  List task runs for a benchmark run
GET     /task-runs/{id}                                 Get task run details (includes criterion_runs array)
POST    /task-runs/{id}/complete                        Mark a task run complete (internal)
POST    /task-runs/{id}/cancel                          Cancel a task run
POST    /v1/task-runs/{id}/start                        (Public v1) Mark a task run started
POST    /v1/task-runs/{id}/complete                     (Public v1) Mark a task run complete, run verification, and return checks

Task Run Object

Field             Type             Description
id                string           Unique identifier
benchmark_run_id  string           Parent Benchmark Run
task_id           string           Source Task
playground_id     string           Playground used for execution
status            string           active, completed, cancelled, failed, timed_out
phase             string           created, started, completed
verdict           string | null    pass, partial, fail. Set on completion
score             number | null    Weighted task score (0 to 1). Set on completion
snapshot          object | null    Frozen copy of the task at execution time
started_at        datetime | null  When execution started
completed_at      datetime | null  When execution finished

The GET /task-runs/{id} response includes a criterion_runs array. See Criterion Runs.
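
For client code that consumes the raw JSON, the fields listed for the Task Run object could be modeled roughly as follows. This is a sketch rather than the SDK's own types: the type names and the isScored helper are hypothetical, datetimes are assumed to arrive as ISO 8601 strings, and the snake_case names mirror the REST fields (the SDK example on this page uses camelCase).

```typescript
// Enumerated values as listed in the Task Run object fields.
type TaskRunStatus = "active" | "completed" | "cancelled" | "failed" | "timed_out";
type TaskRunPhase = "created" | "started" | "completed";
type Verdict = "pass" | "partial" | "fail";

interface TaskRun {
  id: string;
  benchmark_run_id: string;
  task_id: string;
  playground_id: string;
  status: TaskRunStatus;
  phase: TaskRunPhase;
  verdict: Verdict | null; // set on completion
  score: number | null; // 0 to 1, set on completion
  snapshot: Record<string, unknown> | null;
  started_at: string | null; // assumption: ISO 8601 string
  completed_at: string | null; // assumption: ISO 8601 string
}

// Type guard: narrows to a task run whose verdict and score are both set.
function isScored(run: TaskRun): run is TaskRun & { verdict: Verdict; score: number } {
  return run.verdict !== null && run.score !== null;
}
```

After an isScored check, run.score can be used as a plain number without further null handling.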

v1 Completion Response

POST /v1/task-runs/{id}/complete runs verification synchronously and returns:
{
  "task_run_id": "tr_abc123",
  "phase": "completed",
  "verdict": "partial",
  "score": 0.66,
  "axes": {
    "correctness": { "score": 1.0, "weight": 2 },
    "safety": { "score": 0.0, "weight": 1 }
  },
  "checks": [
    {
      "criterion_id": "crit_01H...",
      "label": "Appointment booked with correct provider",
      "result": "pass",
      "score": 1.0,
      "axis": "correctness",
      "details": "Appointment resource matched; all field assertions passed.",
      "field_results": [
        { "path": "participant.0.actor.display", "expected": "Dr. Rivera", "actual": "Dr. Rivera", "passed": true }
      ]
    }
  ]
}
When the benchmark run was created with scored: true, the details and field-level evidence are omitted from this response to avoid leaking the scoring rubric. You can still fetch full evidence later via GET /criterion-runs/{id}.
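
The sample response is consistent with the task score being a weight-normalized mean of the axis scores: (1.0 × 2 + 0.0 × 1) / (2 + 1) = 2/3, reported as 0.66. This page does not state the formula, so the sketch below is an assumption, useful mainly for sanity-checking a response on the client side. The names AxisScore and weightedScore are hypothetical.

```typescript
interface AxisScore {
  score: number;
  weight: number;
}

// Weight-normalized mean of axis scores (assumed formula, not confirmed by the docs).
// Returns null when there are no axes or the weights sum to zero.
function weightedScore(axes: Record<string, AxisScore>): number | null {
  const values = Object.values(axes);
  const totalWeight = values.reduce((sum, axis) => sum + axis.weight, 0);
  if (totalWeight === 0) return null;
  const weightedSum = values.reduce((sum, axis) => sum + axis.score * axis.weight, 0);
  return weightedSum / totalWeight;
}

// Axes taken from the sample completion response.
const sampleScore = weightedScore({
  correctness: { score: 1.0, weight: 2 },
  safety: { score: 0.0, weight: 1 },
});
console.log(sampleScore); // 2/3, which the sample response shows as 0.66
```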

SDK Example

// Assumes `verial` is an already-initialized SDK client.
// List task runs for a benchmark run
const taskRuns = await verial.taskRuns.list({ benchmarkRunId: "br_abc123" });

// Get a task run with its criterion runs
const detail = await verial.taskRuns.get({ id: taskRuns.data[0].id });

for (const run of detail.criterionRuns) {
  console.log(`${run.criterionId}: ${run.passed ? "PASS" : "FAIL"} (${run.score})`);
}