The MCP server still exposes the legacy `eval-runs` tool. Its payload shape is equivalent (passed, score, details), except that a Criterion Run also includes structured evidence. A dedicated `criterionRuns` tool is planned.

Actions (planned)
| Action | Description |
|---|---|
| list | List all criterion runs for a task run |
| get | Get a criterion run by ID |
Parameters
list
| Parameter | Type | Required | Description |
|---|---|---|---|
| taskRunId | string | yes | Task run ID to list criterion runs for |
get
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Criterion run ID |
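
The parameter tables above can be sketched as typed argument shapes. This is a minimal illustration only, since the `criterionRuns` tool is still planned: the `action` discriminator field, the `CriterionRunsArgs` type, the `buildCriterionRunsArgs` helper, and the sample IDs are all assumptions made for the example, not part of any published API. Only the action names (`list`, `get`) and the parameter names (`taskRunId`, `id`) come from the tables.

```typescript
// Hypothetical argument shapes for the planned criterionRuns tool.
// The "action" field is an assumption; the tables only name the
// actions and their required parameters.
type CriterionRunsArgs =
  | { action: "list"; taskRunId: string }
  | { action: "get"; id: string };

// Hypothetical helper: builds an arguments object and rejects a call
// that omits the parameter marked as required for the chosen action.
function buildCriterionRunsArgs(
  action: "list" | "get",
  params: { taskRunId?: string; id?: string }
): CriterionRunsArgs {
  if (action === "list") {
    if (!params.taskRunId) {
      throw new Error("list requires taskRunId");
    }
    return { action: "list", taskRunId: params.taskRunId };
  }
  if (!params.id) {
    throw new Error("get requires id");
  }
  return { action: "get", id: params.id };
}

// Placeholder IDs, for illustration only.
const listArgs = buildCriterionRunsArgs("list", { taskRunId: "task-run-123" });
const getArgs = buildCriterionRunsArgs("get", { id: "criterion-run-456" });
```

Modeling the two actions as a discriminated union lets the compiler reject a `get` call that carries a `taskRunId` instead of an `id`. The real tool may pass the action differently once it ships.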