The verial task-runs commands read Task Runs: the per-task results produced during a benchmark run. Each task run holds a frozen snapshot of the task, plus per-criterion outcomes and evidence. The platform creates task runs when a benchmark run executes; they cannot be created directly. Use these commands to drill into failures after a run finishes. Authentication is required; see verial auth.

Subcommands

Command                 Description
verial task-runs list   List task runs for a specific benchmark run.
verial task-runs get    Get a single task run by ID.

verial task-runs list

List all task runs belonging to a benchmark run, in task order. Synopsis:
verial task-runs list --benchmark-run-id <id>
Options:
Flag                      Description                              Required
--benchmark-run-id <id>   Benchmark run whose task runs to list.   Yes
Example:
verial task-runs list --benchmark-run-id run_cm456001
ID                  STATUS      VERDICT   SCORE
tr_cm789001         completed   pass      1.000
tr_cm789002         completed   pass      0.750
tr_cm789003         completed   fail      0.333
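For a quick pass/fail tally straight from the table output, a short awk filter is enough. The sketch below is a hypothetical helper (summarize_task_runs is not part of the verial CLI) and assumes the table keeps the layout shown above: one header row, then whitespace-separated ID, STATUS, VERDICT, and SCORE columns. For scripts you intend to keep, parsing the --json output is more robust than scraping the table.

```shell
# summarize_task_runs: hypothetical helper, NOT part of the verial CLI.
# Assumes stdin is the table printed by `verial task-runs list`:
# a header row, then ID STATUS VERDICT SCORE columns.
summarize_task_runs() {
  awk '
    NR == 1 { next }                 # skip the header row
    NF >= 4 {
      total++
      verdicts[$3]++                 # column 3 is VERDICT
      sum += $4                      # column 4 is SCORE
    }
    END {
      if (total == 0) { print "no task runs"; exit }
      printf "%d task runs, %d pass, %d fail, mean score %.3f\n",
        total, verdicts["pass"] + 0, verdicts["fail"] + 0, sum / total
    }'
}

# Usage (requires an installed, authenticated verial CLI):
# verial task-runs list --benchmark-run-id run_cm456001 | summarize_task_runs
```

Fed the example output above, the helper reports three task runs, two passing and one failing, with a mean score of 0.694.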

verial task-runs get

Fetch a single task run. The response includes the task snapshot, per-criterion results, and any recorded interactions (voice transcripts, FHIR request logs, fax documents). Synopsis:
verial task-runs get --id <id>
Example:
verial task-runs get --id tr_cm789003
id            tr_cm789003
status        completed
verdict       fail
score         0.333
benchmarkRun  run_cm456001
Pair with --json to get the full task snapshot and criterion results:
verial task-runs get --id tr_cm789003 --json | jq '.criterionResults'
REST equivalent: Task Runs.
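To gate a CI step on a single task run, you can check the verdict line of the default key/value output. This is a minimal sketch, assuming the output keeps the "key value" layout shown above; require_pass is a hypothetical helper name, not a verial command.

```shell
# require_pass: hypothetical helper, NOT part of the verial CLI.
# Assumes stdin is the key/value output of `verial task-runs get`
# (e.g. "verdict       fail"). Returns 0 only when the verdict is "pass".
require_pass() {
  verdict=$(awk '$1 == "verdict" { print $2; exit }')
  if [ "$verdict" = "pass" ]; then
    return 0
  fi
  # Report the verdict (or its absence) on stderr and signal failure.
  echo "task run verdict: ${verdict:-unknown}" >&2
  return 1
}

# Usage (requires an installed, authenticated verial CLI):
# verial task-runs get --id tr_cm789003 | require_pass || exit 1
```

With the example task run above (verdict fail), the helper prints "task run verdict: fail" to stderr and returns a non-zero status.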

Drilling into a failed run

# List the task runs of the benchmark run and keep only the IDs of failed ones
verial task-runs list --benchmark-run-id run_cm456001 --json \
  | jq -r '.data[] | select(.verdict == "fail") | .id' \
  | while IFS= read -r tr; do
      # Fetch each failed task run and print its per-criterion results
      verial task-runs get --id "$tr" --json \
        | jq '{id, verdict, criteria: .criterionResults}'
    done

Next Steps

Task Runs

Task run model, criterion results, and interactions.

verial benchmark-runs

Start and wait on the benchmark run that produces these task runs.