For most simulation workflows, follow this progression:
  1. Setup: simulators to define interfaces, environments to compose them, datasets to prepare patient data
  2. Define: benchmarks to create test suites, tasks to add test cases, criteria to add assertions
  3. Execute: benchmark_runs to start runs, poll with get until status is Completed or Failed
  4. Analyze: task-runs to see per-task results, criterion-runs to see per-criterion reasoning and scores
If the user already has an environment or benchmark ID, skip the setup or definition steps and go straight to execution.

Tool Chaining Patterns

Simulators exist independently from environments. Create them first, then attach them.
// 1. Create simulator
{ "action": "create", "type": "FHIR", "name": "EHR" }
// Response: { "id": "sim_01", ... }

// 2. Create environment
{ "action": "create", "name": "Clinic" }
// Response: { "id": "env_01", ... }

// 3. Link
{ "action": "addSimulator", "id": "env_01", "simulatorId": "sim_01" }
This pattern keeps simulators reusable across multiple environments. A single FHIR simulator definition can be linked to different environment configurations.
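The create-then-link sequence above can be sketched as a small helper. This is a minimal illustration, not the product's client: `call_tool` is a hypothetical function that sends a payload to a named tool and returns the parsed response, and routing `addSimulator` through the `environments` tool is an assumption based on the `id` in that payload being an environment ID.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: call_tool(tool_name, payload) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def create_linked_environment(
    call_tool: ToolCaller, sim_type: str, sim_name: str, env_name: str
) -> Dict[str, str]:
    """Create a simulator, create an environment, then link the two by ID."""
    sim = call_tool("simulators", {"action": "create", "type": sim_type, "name": sim_name})
    env = call_tool("environments", {"action": "create", "name": env_name})
    # Assumption: addSimulator is an action on the environments tool.
    call_tool(
        "environments",
        {"action": "addSimulator", "id": env["id"], "simulatorId": sim["id"]},
    )
    return {"simulatorId": sim["id"], "environmentId": env["id"]}


# Stand-in client for a dry run: records each call and returns canned IDs.
calls: List[Tuple[str, Dict[str, Any]]] = []


def fake_call_tool(tool: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    calls.append((tool, payload))
    if payload["action"] == "create":
        return {"id": "sim_01" if tool == "simulators" else "env_01"}
    return {}


ids = create_linked_environment(fake_call_tool, "FHIR", "EHR", "Clinic")
```

Because the helper returns both IDs, the same `simulatorId` can be passed to a later `addSimulator` call for a different environment.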

Benchmark definition

Benchmarks, tasks, and criteria form a hierarchy. Create them top-down.
// 1. Benchmark
{ "action": "create", "name": "PA Tests", "environmentId": "env_01", "timeout": 300 }
// Response: { "id": "bm_01", ... }

// 2. Task
{ "action": "create", "benchmarkId": "bm_01", "name": "Submit PA", "instruction": "..." }
// Response: { "id": "task_01", ... }

// 3. Criterion
{ "action": "create", "taskId": "task_01", "name": "pa-submitted", "assertion": { ... }, "weight": 1.0 }
Each level references the parent by ID. Criteria are always attached to a specific task.
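A rough sketch of the top-down creation order, threading each parent ID into its children. The `call_tool` function and the nested input layout (a `criteria` list inside each task) are assumptions made for illustration; the `assertion` value is passed through untouched, since its shape is defined in the tool reference rather than here.

```python
from itertools import count
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: call_tool(tool_name, payload) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def define_benchmark(
    call_tool: ToolCaller, benchmark: Dict[str, Any], tasks: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Create a benchmark, then its tasks, then each task's criteria."""
    bm = call_tool("benchmarks", {"action": "create", **benchmark})
    created: Dict[str, Any] = {"benchmarkId": bm["id"], "taskIds": [], "criterionIds": []}
    for task in tasks:
        # "criteria" is a local convenience key, not sent to the tasks tool.
        task_fields = {k: v for k, v in task.items() if k != "criteria"}
        t = call_tool("tasks", {"action": "create", "benchmarkId": bm["id"], **task_fields})
        created["taskIds"].append(t["id"])
        for criterion in task.get("criteria", []):
            c = call_tool("criteria", {"action": "create", "taskId": t["id"], **criterion})
            created["criterionIds"].append(c["id"])
    return created


# Stand-in client: logs calls and hands out sequential placeholder IDs.
_ids = count(1)
log: List[Tuple[str, Dict[str, Any]]] = []


def fake_call_tool(tool: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    log.append((tool, payload))
    return {"id": f"{tool}_{next(_ids):02d}"}


result = define_benchmark(
    fake_call_tool,
    {"name": "PA Tests", "environmentId": "env_01", "timeout": 300},
    [
        {
            "name": "Submit PA",
            "instruction": "...",
            # Ellipsis stands in for the assertion body, which is not specified here.
            "criteria": [{"name": "pa-submitted", "assertion": ..., "weight": 1.0}],
        }
    ],
)
```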

Run and poll

Start a run, poll for completion, then drill into results.
// 1. Start
{ "action": "create", "benchmarkId": "bm_01" }
// Response: { "id": "run_01", "status": "Running" }

// 2. Poll (repeat until status is "Completed" or "Failed")
{ "action": "get", "id": "run_01" }
// Response: { "status": "Completed", "score": 0.92, "verdict": "Pass" }

// 3. Drill into task results
{ "action": "list", "runId": "run_01" }

// 4. Drill into criterion results
{ "action": "list", "taskRunId": "tr_01" }
The benchmark_runs get response includes the overall score and verdict. Use task-runs and criterion-runs to understand which specific checks passed or failed.
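The polling step can be sketched as a bounded loop that stops on either terminal status. The `call_tool` function, the 2-second interval, and the poll limit are all assumptions chosen for illustration; in practice the limit would be sized to the benchmark timeout.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical transport: call_tool(tool_name, payload) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

TERMINAL_STATUSES = {"Completed", "Failed"}


def wait_for_run(
    call_tool: ToolCaller,
    run_id: str,
    interval_s: float = 2.0,   # assumed polling interval
    max_polls: int = 150,      # assumed upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll benchmark_runs get until the run reaches a terminal status."""
    for attempt in range(max_polls):
        run = call_tool("benchmark_runs", {"action": "get", "id": run_id})
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"run {run_id} not finished after {max_polls} polls")


# Dry run against canned responses; sleep is replaced so nothing actually waits.
_responses = iter(
    [
        {"status": "Running"},
        {"status": "Running"},
        {"status": "Completed", "score": 0.92, "verdict": "Pass"},
    ]
)
naps: List[float] = []
final = wait_for_run(lambda tool, payload: next(_responses), "run_01", sleep=naps.append)
```

Injecting `sleep` keeps the loop testable without real delays, and the hard poll limit prevents an agent from looping indefinitely if a run never leaves the Running state.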

Writing Good Criteria

When writing criteria, describe observable outcomes the verification engine can check against sandbox state. Write them like test assertions: specific, observable, and unambiguous.
| Assertion | Quality | Why |
| --- | --- | --- |
| "The agent did a good job" | Bad | Subjective, no observable criteria |
| "A prior auth was submitted" | Okay | Observable but vague about what counts as "submitted" |
| "A prior authorization request was submitted through the payer portal with CPT code 72148" | Good | Specific action, specific channel, specific data point |
| "The 271 eligibility response shows active coverage with plan type PPO" | Good | Specific transaction type, specific fields to check |
| "The agent called the FHIR endpoint GET /Patient and received a 200 response" | Good | Verifiable against interaction logs |
| "The agent handled the error gracefully" | Bad | "Gracefully" is subjective |
Use weights to distinguish critical checks from nice-to-haves. A weight of 1.0 means the criterion is essential to the task. A weight of 0.5 or lower signals a secondary validation that improves the score but does not determine the verdict on its own.
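To build intuition for how weights shift a score, the sketch below uses a plain weighted average. This formula is an assumption for illustration only; the actual scoring and verdict rules of the verification engine are not specified here and may differ.

```python
from typing import Iterable, Tuple


def weighted_score(results: Iterable[Tuple[float, float]]) -> float:
    """Weighted average of (weight, score) pairs, with each score in [0, 1].

    Assumed formula for illustration, not the engine's documented behavior.
    """
    pairs = list(results)
    total_weight = sum(weight for weight, _ in pairs)
    if total_weight <= 0:
        raise ValueError("at least one criterion needs a positive weight")
    return sum(weight * score for weight, score in pairs) / total_weight


# Two essential criteria (weight 1.0) and one secondary criterion (weight 0.5).
secondary_fails = weighted_score([(1.0, 1.0), (1.0, 1.0), (0.5, 0.0)])  # 2.0 / 2.5
essential_fails = weighted_score([(1.0, 0.0), (1.0, 1.0), (0.5, 1.0)])  # 1.5 / 2.5
```

Under this assumed formula, failing the secondary check costs 0.2 of the score while failing an essential check costs 0.4, which is the kind of separation the weights are meant to express.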

One assertion per criterion

Split compound checks into separate criteria rather than combining them. This gives you granular scoring and clearer failure messages.
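As a sketch of the split, the helper below turns one compound check ("submitted through the payer portal and carries CPT code 72148") into two criterion payloads. The second criterion name is hypothetical, and the assertion bodies are left as placeholders because their structure is not described in this section.

```python
from typing import Any, Dict, List, Tuple


def criteria_payloads(
    task_id: str, checks: List[Tuple[str, Any, float]]
) -> List[Dict[str, Any]]:
    """Build one criteria create payload per (name, assertion, weight) check."""
    return [
        {
            "action": "create",
            "taskId": task_id,
            "name": name,
            "assertion": assertion,
            "weight": weight,
        }
        for name, assertion, weight in checks
    ]


# One compound check expressed as two single-assertion criteria.
# Ellipsis is a placeholder for each assertion body; "pa-cpt-72148" is a made-up name.
split = criteria_payloads(
    "task_01",
    [
        ("pa-submitted", ..., 1.0),
        ("pa-cpt-72148", ..., 1.0),
    ],
)
```

If the submission succeeds but the CPT code is wrong, only the second criterion fails, so the failure message points at the exact problem.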

Error Handling

All tools return errors in a consistent shape:
{
  "error": "not_found",
  "message": "Environment env_xyz not found"
}
| Error Code | Meaning | Recommended Action |
| --- | --- | --- |
| not_found | Entity does not exist or belongs to a different organization | Verify the ID; use a list action to find valid IDs |
| validation_error | Invalid parameters (missing required field, wrong type) | Check the parameter types and required fields in the tool reference |
| conflict | Duplicate or conflicting state (e.g., simulator already linked) | Use get to check current state before retrying |
| timeout | Run exceeded the benchmark timeout | Increase the benchmark timeout value or simplify the task |
When a not_found error occurs, do not retry with the same ID. Use the list action on the parent resource to discover valid IDs. For example, if a tasks get returns not_found, call tasks list with the benchmarkId to see available tasks.
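The not_found recovery described above could look roughly like this. `call_tool` is a hypothetical transport function, and the shape of the list response in the stub (`items`) is invented for the dry run; only the error shape comes from this page.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: call_tool(tool_name, payload) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def get_or_list(
    call_tool: ToolCaller, tool: str, entity_id: str, parent_filter: Dict[str, Any]
) -> Dict[str, Any]:
    """Try get once; on not_found, list by parent instead of retrying the same ID."""
    resp = call_tool(tool, {"action": "get", "id": entity_id})
    if resp.get("error") == "not_found":
        listing = call_tool(tool, {"action": "list", **parent_filter})
        return {"entity": None, "candidates": listing}
    if "error" in resp:
        # Other error codes need a different fix, so surface them to the caller.
        raise RuntimeError(f'{resp["error"]}: {resp.get("message", "")}')
    return {"entity": resp, "candidates": None}


# Stand-in client: every get fails with not_found, list returns a made-up shape.
seen: List[Tuple[str, Dict[str, Any]]] = []


def fake_call_tool(tool: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    seen.append((tool, payload))
    if payload["action"] == "get":
        return {"error": "not_found", "message": "Task task_xyz not found"}
    return {"items": [{"id": "task_01"}]}  # hypothetical list response shape


outcome = get_or_list(fake_call_tool, "tasks", "task_xyz", {"benchmarkId": "bm_01"})
```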

Next Steps

Tools Reference

Full parameter documentation for each tool.

Workflow Examples

Step-by-step tool call sequences for common simulation tasks.