Evals define what success looks like for a Task. Each eval has a label, a natural language assertion, and a weight that determines its contribution to the overall score. During a Run, an LLM judge evaluates each assertion against the evidence collected from sandbox interactions.
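The exact scoring formula is not documented here; a minimal sketch, assuming the overall score is a weight-normalized sum of the judge's pass/fail verdicts (the `EvalResult` shape and `overallScore` helper are illustrative, not part of the SDK):

```typescript
// Illustrative result of an LLM judge evaluating one eval's assertion.
interface EvalResult {
  label: string
  weight: number
  passed: boolean // the judge's verdict on the assertion
}

// Weighted score: sum of weights for passed assertions, divided by total weight.
function overallScore(results: EvalResult[]): number {
  const total = results.reduce((sum, r) => sum + r.weight, 0)
  if (total === 0) return 0
  const passedWeight = results.reduce(
    (sum, r) => sum + (r.passed ? r.weight : 0),
    0,
  )
  return passedWeight / total
}
```

Under this assumption, an eval with weight 2.0 contributes twice as much to the score as one with weight 1.0.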
These endpoints are not yet included in the OpenAPI spec.
## Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | `/evals?task_id={taskId}` | List evals for a task |
| POST | `/evals` | Create an eval |
| GET | `/evals/{id}` | Get eval details |
| PATCH | `/evals/{id}` | Update an eval |
| DELETE | `/evals/{id}` | Delete an eval |
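If you are calling the REST endpoints directly rather than through the SDK, a list request can be sketched as follows. The base URL and bearer-token auth header are assumptions; substitute your deployment's values:

```typescript
// Hypothetical base URL; replace with your deployment's API origin.
const BASE_URL = "https://api.verial.example/v1"

// Build the list-evals URL, escaping the task id for use in a query string.
function listEvalsUrl(taskId: string): string {
  return `${BASE_URL}/evals?task_id=${encodeURIComponent(taskId)}`
}

// Raw HTTP equivalent of verial.evals.list (auth scheme is an assumption).
async function listEvals(taskId: string, apiKey: string): Promise<unknown> {
  const res = await fetch(listEvalsUrl(taskId), {
    headers: { Authorization: `Bearer ${apiKey}` },
  })
  if (!res.ok) throw new Error(`HTTP ${res.status}`)
  return res.json()
}
```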
## Eval Object
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier |
| `task_id` | string | Parent Task |
| `label` | string | Short label describing the assertion |
| `assert` | string | Natural language assertion the LLM judge evaluates |
| `weight` | number | Weight for scoring (higher = more important) |
| `organization_id` | string | Parent organization |
| `created_at` | datetime | Creation timestamp |
| `updated_at` | datetime | Last modification timestamp |
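The fields above can be expressed as a TypeScript type. Field names come from the table; the concrete types are an approximation (timestamps are assumed to be ISO 8601 strings on the wire):

```typescript
// Sketch of the Eval object shape described in the table above.
interface Eval {
  id: string               // Unique identifier
  task_id: string          // Parent Task
  label: string            // Short label describing the assertion
  assert: string           // Natural language assertion the LLM judge evaluates
  weight: number           // Weight for scoring (higher = more important)
  organization_id: string  // Parent organization
  created_at: string       // Creation timestamp (ISO 8601 assumed)
  updated_at: string       // Last modification timestamp (ISO 8601 assumed)
}
```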
## SDK Example
```typescript
// Create an eval
const eval_ = await verial.evals.create({
  taskId: 'task_abc123',
  label: 'Prior auth submitted',
  assert: 'The agent submitted a prior authorization request to the payer',
  weight: 1.0,
})

// List evals for a task
const evals = await verial.evals.list({ taskId: 'task_abc123' })

// Get a specific eval
const details = await verial.evals.get({ id: eval_.id })

// Update an eval
await verial.evals.update({
  id: eval_.id,
  weight: 2.0,
})

// Delete an eval
await verial.evals.delete({ id: eval_.id })
```