Tasks are individual test cases within a Benchmark. Each task defines an instruction for the agent, optional trigger conditions, and a set of Evals that determine success or failure.
Note: these endpoints are not yet included in the OpenAPI spec.
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /tasks?benchmark_id={benchmarkId} | List tasks for a benchmark |
| POST | /tasks | Create a task |
| GET | /tasks/{id} | Get task details |
| PATCH | /tasks/{id} | Update a task |
| DELETE | /tasks/{id} | Delete a task |
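The endpoint table follows a conventional REST layout: listing is scoped by a `benchmark_id` query parameter, while get, update, and delete address a single task by ID in the path. The sketch below maps each operation to its method and path. `taskRequest`, `TaskOp`, and `TaskRequest` are hypothetical names used only for illustration, and the base URL and authentication scheme are deliberately left out because they are not documented here.

```typescript
// Hypothetical helper: maps each documented task operation to its HTTP
// method and path. Base URL and authentication are intentionally NOT covered.
type TaskOp =
  | { op: "list"; benchmarkId: string }
  | { op: "create" }
  | { op: "get" | "update" | "delete"; id: string };

interface TaskRequest {
  method: "GET" | "POST" | "PATCH" | "DELETE";
  path: string;
}

function taskRequest(req: TaskOp): TaskRequest {
  if (req.op === "list") {
    // The parent benchmark is passed as a query parameter.
    return {
      method: "GET",
      path: `/tasks?benchmark_id=${encodeURIComponent(req.benchmarkId)}`,
    };
  }
  if (req.op === "create") {
    // The new task's fields go in the request body, not the path.
    return { method: "POST", path: "/tasks" };
  }
  // Remaining operations address a single task by ID in the path.
  const method = req.op === "get" ? "GET" : req.op === "update" ? "PATCH" : "DELETE";
  return { method, path: `/tasks/${encodeURIComponent(req.id)}` };
}
```

Encoding the IDs with `encodeURIComponent` keeps the path valid even if an identifier ever contains reserved characters.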
Task Object
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier |
| benchmark_id | string | ID of the parent Benchmark |
| name | string | Task name |
| instruction | string \| null | Natural language instruction for the agent |
| timeout | number \| null | Task-level timeout override, in seconds |
| trigger | object \| null | Conditions that start the task |
| tags | string[] \| null | Tags for filtering and grouping |
| organization_id | string | ID of the parent organization |
| created_at | datetime | Creation timestamp |
| updated_at | datetime | Last modification timestamp |
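The Task object can be modeled as a TypeScript interface with a runtime check for untyped JSON responses. This is a minimal sketch, not an official SDK type: `Task` and `isTask` are hypothetical names, datetimes are assumed to arrive as ISO 8601 strings, and `trigger` is typed loosely because its shape is not documented on this page. Nullable fields are assumed to be present as `null` rather than omitted.

```typescript
// Hypothetical model of the Task object in its wire format (snake_case).
// Assumptions: datetimes are ISO 8601 strings; `trigger` shape is undocumented.
interface Task {
  id: string;
  benchmark_id: string;
  name: string;
  instruction: string | null;
  timeout: number | null; // task-level override, in seconds
  trigger: Record<string, unknown> | null;
  tags: string[] | null;
  organization_id: string;
  created_at: string;
  updated_at: string;
}

// Runtime type guard: checks every documented field against its type.
function isTask(value: unknown): value is Task {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const isStr = (x: unknown): x is string => typeof x === "string";
  // A nullable field is valid if it is null or passes the given check.
  const nullable = (x: unknown, check: (y: unknown) => boolean) =>
    x === null || check(x);
  return (
    isStr(v.id) &&
    isStr(v.benchmark_id) &&
    isStr(v.name) &&
    isStr(v.organization_id) &&
    isStr(v.created_at) &&
    isStr(v.updated_at) &&
    nullable(v.instruction, isStr) &&
    nullable(v.timeout, (x) => typeof x === "number" && Number.isFinite(x)) &&
    nullable(v.trigger, (x) => typeof x === "object" && !Array.isArray(x)) &&
    nullable(v.tags, (x) => Array.isArray(x) && x.every(isStr))
  );
}
```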
SDK Example
```typescript
// Assumes `verial` is an initialized Verial SDK client.

// Create a task
const task = await verial.tasks.create({
  benchmarkId: 'bench_abc123',
  name: 'Submit prior auth for MRI',
  instruction: 'Submit a prior authorization request for a knee MRI',
  tags: ['prior-auth', 'imaging'],
})

// List tasks for a benchmark
const tasks = await verial.tasks.list({ benchmarkId: 'bench_abc123' })

// Get a specific task
const details = await verial.tasks.get({ id: task.id })

// Update the task-level timeout (in seconds)
await verial.tasks.update({
  id: task.id,
  timeout: 120,
})

// Delete the task
await verial.tasks.delete({ id: task.id })
```
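Because tags exist for filtering and grouping, a common follow-up to listing is to bucket tasks by tag on the client. The helper below is a hypothetical sketch (`groupByTag` and `TaggedTask` are illustrative names, not part of the SDK); it reads only `id` and `tags`, so it works on any task-like object. A task with several tags appears under each of them, and tasks with no tags are collected under a configurable key.

```typescript
// Minimal structural type: only the fields this helper reads.
interface TaggedTask {
  id: string;
  tags: string[] | null;
}

// Hypothetical helper: groups tasks by tag, client-side.
// Tasks with null or empty tags go under the `untagged` key.
function groupByTag<T extends TaggedTask>(
  tasks: T[],
  untagged = "(untagged)",
): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  const add = (key: string, task: T) => {
    const bucket = groups.get(key);
    if (bucket) bucket.push(task);
    else groups.set(key, [task]);
  };
  for (const task of tasks) {
    const tags = task.tags ?? [];
    if (tags.length === 0) add(untagged, task);
    // Deduplicate so a repeated tag does not list the same task twice.
    for (const tag of Array.from(new Set(tags))) add(tag, task);
  }
  return groups;
}
```

For example, `groupByTag(tasks).get('imaging')` would return the tasks tagged `imaging`, assuming `verial.tasks.list` resolves to an array of task objects.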