Eval Runs contain the result of a single Eval assertion within a Task Run. The LLM judge evaluates the assertion against evidence collected from sandbox interactions and produces a score and detailed reasoning.
These endpoints are not yet included in the OpenAPI spec.

## Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/eval-runs?task_run_id={taskRunId}` | List eval runs for a task run |
| GET | `/eval-runs/{id}` | Get eval run details |
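As a rough sketch, the list endpoint can be called directly over HTTP. The base URL and bearer-token auth here are assumptions for illustration, not part of the documented API:

```typescript
// Assumption: replace with the real API host; auth scheme is also assumed.
const BASE_URL = 'https://api.example.com'

function evalRunsUrl(taskRunId: string): string {
  // URLSearchParams percent-encodes the task_run_id value
  const params = new URLSearchParams({ task_run_id: taskRunId })
  return `${BASE_URL}/eval-runs?${params.toString()}`
}

async function listEvalRuns(taskRunId: string, apiKey: string) {
  const res = await fetch(evalRunsUrl(taskRunId), {
    headers: { Authorization: `Bearer ${apiKey}` },
  })
  if (!res.ok) throw new Error(`HTTP ${res.status}`)
  return res.json()
}
```

The SDK shown below wraps these calls, so building URLs by hand is only needed when working without it.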

## Eval Run Object

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique identifier |
| `task_run_id` | string | Parent Task Run |
| `eval_id` | string | Source Eval |
| `result` | string | Pass/fail result |
| `score` | number | Score for this assertion (0–1) |
| `details` | string \| null | LLM judge reasoning |
| `started_at` | datetime | When evaluation started |
| `completed_at` | datetime \| null | When evaluation finished |
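The field table above can be sketched as a TypeScript interface. The field names come from the table; the interface name, the exact `result` values, and the ISO 8601 datetime encoding are assumptions:

```typescript
// Sketch of the Eval Run object, assuming datetimes arrive as ISO 8601 strings.
interface EvalRun {
  id: string
  task_run_id: string
  eval_id: string
  result: string          // pass/fail result; exact values not specified above
  score: number           // 0-1
  details: string | null  // LLM judge reasoning
  started_at: string      // when evaluation started
  completed_at: string | null // null while the evaluation is still running
}

// Example use: wall-clock evaluation duration in ms, null if still running
function evalDurationMs(run: EvalRun): number | null {
  if (run.completed_at === null) return null
  return Date.parse(run.completed_at) - Date.parse(run.started_at)
}
```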

## SDK Example

```typescript
// List eval runs for a task run
const evalRuns = await verial.evalRuns.list({ taskRunId: 'tr_abc123' })

// Get eval run details (includes LLM judge reasoning)
const details = await verial.evalRuns.get({ id: evalRuns.data[0].id })

console.log(`${details.result}: ${details.details}`)
```