Eval Runs contain the result of a single Eval assertion within a Task Run. The LLM judge evaluates the assertion against evidence collected from sandbox interactions and produces a score and detailed reasoning.
These endpoints are not yet included in the OpenAPI spec.
## Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/eval-runs?task_run_id={taskRunId}` | List eval runs for a task run |
| GET | `/eval-runs/{id}` | Get eval run details |
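If you are calling the REST API directly rather than through the SDK, the list endpoint takes the parent Task Run as a query parameter. A minimal sketch of building that URL, assuming a hypothetical base URL (substitute your deployment's host):

```typescript
// Hypothetical base URL -- replace with your actual API host.
const BASE_URL = "https://api.verial.example";

// Build the list URL for eval runs belonging to one task run.
function evalRunsListUrl(taskRunId: string): string {
  const url = new URL("/eval-runs", BASE_URL);
  url.searchParams.set("task_run_id", taskRunId);
  return url.toString();
}
```

Using `URLSearchParams` here (rather than string concatenation) ensures the task run ID is percent-encoded correctly.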
## Eval Run Object

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier |
| `task_run_id` | string | Parent Task Run |
| `eval_id` | string | Source Eval |
| `result` | string | Pass/fail result |
| `score` | number | Score for this assertion (0–1) |
| `details` | string \| null | LLM judge reasoning |
| `started_at` | datetime | When evaluation started |
| `completed_at` | datetime \| null | When evaluation finished |
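The object above can be modeled as a TypeScript interface. This is a sketch based on the field table, assuming datetimes arrive as ISO 8601 strings on the wire; the `isComplete` helper is illustrative, not part of the SDK:

```typescript
// Shape of the Eval Run object from the field table above.
// Datetime fields are assumed to be ISO 8601 strings.
interface EvalRun {
  id: string;
  task_run_id: string;
  eval_id: string;
  result: string;          // pass/fail result
  score: number;           // 0-1
  details: string | null;  // LLM judge reasoning
  started_at: string;
  completed_at: string | null;
}

// An eval run is finished once completed_at is set.
function isComplete(run: EvalRun): boolean {
  return run.completed_at !== null;
}
```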
## SDK Example

```typescript
// List eval runs for a task run
const evalRuns = await verial.evalRuns.list({ taskRunId: 'tr_abc123' })

// Get eval run details (includes LLM judge reasoning)
const details = await verial.evalRuns.get({ id: evalRuns.data[0].id })
console.log(`${details.result}: ${details.details}`)
```
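Since each eval run scores a single assertion, a listing for one task run is often summarized in aggregate. A sketch of that, assuming only the `score` and `completed_at` fields from the table above (the `summarize` helper is hypothetical, not an SDK method):

```typescript
// Minimal slice of the Eval Run object needed for a summary.
interface EvalRunScore {
  score: number;               // 0-1 per assertion
  completed_at: string | null; // null while still running
}

// Mean score across all runs, plus how many have finished.
function summarize(runs: EvalRunScore[]): { meanScore: number; completed: number } {
  const completed = runs.filter(r => r.completed_at !== null).length;
  const meanScore =
    runs.length === 0
      ? 0
      : runs.reduce((sum, r) => sum + r.score, 0) / runs.length;
  return { meanScore, completed };
}
```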