Evals define what success looks like for a Task. Each eval has a label, a natural language assertion, and a weight that determines its contribution to the overall score. During a Run, an LLM judge evaluates each assertion against the evidence collected from sandbox interactions.
These endpoints are not yet in the OpenAPI spec.
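The weighted-scoring rule described above can be sketched as a weight-normalized pass rate. This is a hypothetical illustration, not the documented scoring formula: the `EvalResult` shape and `overallScore` function are assumptions, and the judge's verdict is simplified to a boolean pass/fail.

```typescript
// Hypothetical result shape: one entry per eval, with the judge's verdict.
interface EvalResult {
  label: string
  weight: number
  passed: boolean
}

// Overall score as the weight-normalized pass rate (assumed, for illustration).
function overallScore(results: EvalResult[]): number {
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0)
  if (totalWeight === 0) return 0
  const earned = results
    .filter((r) => r.passed)
    .reduce((sum, r) => sum + r.weight, 0)
  return earned / totalWeight
}
```

Under this rule, an eval with weight 2.0 moves the score twice as much as one with weight 1.0.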

Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /evals?task_id={taskId} | List evals for a task |
| POST | /evals | Create an eval |
| GET | /evals/{id} | Get eval details |
| PATCH | /evals/{id} | Update an eval |
| DELETE | /evals/{id} | Delete an eval |
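The endpoints above can also be called directly over HTTP. The sketch below uses `fetch`; the base URL and bearer-token auth scheme are assumptions, not confirmed by this page, so substitute your actual API host and credentials.

```typescript
const BASE = 'https://api.verial.com/v1' // assumed base URL

// Build the list-evals URL with a task_id query parameter
function listEvalsUrl(taskId: string): string {
  return `${BASE}/evals?task_id=${encodeURIComponent(taskId)}`
}

// GET /evals?task_id={taskId} — list evals for a task
async function listEvals(taskId: string, apiKey: string) {
  const res = await fetch(listEvalsUrl(taskId), {
    headers: { Authorization: `Bearer ${apiKey}` },
  })
  if (!res.ok) throw new Error(`HTTP ${res.status}`)
  return res.json()
}
```

The other endpoints follow the same pattern, with `method: 'POST' | 'PATCH' | 'DELETE'` and a JSON body where applicable.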

Eval Object

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier |
| task_id | string | Parent Task |
| label | string | Short label describing the assertion |
| assert | string | Natural language assertion the LLM judge evaluates |
| weight | number | Weight for scoring (higher = more important) |
| organization_id | string | Parent organization |
| created_at | datetime | Creation timestamp |
| updated_at | datetime | Last modification timestamp |
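The field table above maps to a TypeScript shape like the following. The interface name, the narrowing helper, and the ISO 8601 string representation of the datetime fields are assumptions for illustration, not part of the SDK.

```typescript
// Shape of the Eval object per the field table (names assumed snake_case
// as in the table; datetimes assumed to arrive as ISO 8601 strings).
interface Eval {
  id: string
  task_id: string
  label: string
  assert: string
  weight: number
  organization_id: string
  created_at: string
  updated_at: string
}

// Narrowing helper for untyped API responses; checks the core fields only.
function isEval(v: unknown): v is Eval {
  if (typeof v !== 'object' || v === null) return false
  const o = v as Record<string, unknown>
  return (
    typeof o.id === 'string' &&
    typeof o.task_id === 'string' &&
    typeof o.label === 'string' &&
    typeof o.assert === 'string' &&
    typeof o.weight === 'number'
  )
}
```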

SDK Examples

// Create an eval
const eval_ = await verial.evals.create({
  taskId: 'task_abc123',
  label: 'Prior auth submitted',
  assert: 'The agent submitted a prior authorization request to the payer',
  weight: 1.0,
})

// List evals for a task
const evals = await verial.evals.list({ taskId: 'task_abc123' })

// Get a specific eval
const details = await verial.evals.get({ id: eval_.id })

// Update an eval's weight
await verial.evals.update({
  id: eval_.id,
  weight: 2.0,
})

// Delete an eval
await verial.evals.delete({ id: eval_.id })