This walkthrough drives an external agent through a published Verial benchmark using the public /v1 API. By the end you will have created a benchmark run, driven a task rollout, completed it, and read back per-criterion scores.

Prerequisites

  • A Verial Solver key. Create a Solver in your organization’s dashboard under Solvers and mint a key. Solver keys are Bearer tokens prefixed vrl_slv_ and work across any benchmark your organization can run (your own benchmarks and any benchmark with visibility=Public).
  • The benchmark’s reference: slug@version (for example fax-referral@1).
  • An HTTP client. Examples below use curl.
export VERIAL_SOLVER_KEY=vrl_slv_xxx
export BENCHMARK_REF=fax-referral@1

1. Start a Benchmark Run

POST /v1/benchmark-runs creates the run, provisions a playground (FHIR stores, leased phone numbers, portal state, SFTP drop, etc.), and returns a per-run bearer token plus URLs for driving each task run.
curl -X POST https://api.verial.ai/v1/benchmark-runs \
  -H "Authorization: Bearer $VERIAL_SOLVER_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"benchmark\": \"$BENCHMARK_REF\", \"scored\": false}"
Response (truncated):
{
  "benchmark_run_id": "br_01H...",
  "benchmark": { "slug": "fax-referral", "version": "1", "name": "Fax referral intake" },
  "scored": false,
  "phase": "created",
  "bearer_token": "vrl_run_...",
  "bearer_token_expires_at": "2026-04-21T17:00:00.000Z",
  "endpoints": {
    "files_inbox": "/v1/benchmark-runs/br_01H.../files/inbox",
    "hl7_outbound": "/v1/benchmark-runs/br_01H.../hl7/outbound"
  },
  "task_runs": [
    {
      "id": "tr_01H...",
      "task_id": "task_01H...",
      "name": "Process referral #1",
      "phase": "created",
      "start_url": "/v1/task-runs/tr_01H.../start",
      "complete_url": "/v1/task-runs/tr_01H.../complete"
    }
  ]
}
Save bearer_token, along with the benchmark run ID and the ID of the first task run. All subsequent calls in this run authenticate with bearer_token as a Bearer token.
export RUN_TOKEN=vrl_run_...
export BENCHMARK_RUN_ID=br_01H...
export TASK_RUN_ID=tr_01H...
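Copying these values by hand is error-prone. The following is a minimal sketch of extracting them from a saved response instead. It assumes jq is installed and that the step 1 response body was written to a file (for example by adding -o run.json to the curl call). The function name extract_run_env and the file name run.json are hypothetical, not part of the Verial API.

```shell
# Sketch only: turn a saved step-1 response into export statements.
# Assumes jq is installed; extract_run_env is a hypothetical helper name.
extract_run_env() {
  # $1: path to a file holding the JSON body returned by POST /v1/benchmark-runs
  jq -r '
    "export RUN_TOKEN=\(.bearer_token | @sh)",
    "export BENCHMARK_RUN_ID=\(.benchmark_run_id | @sh)",
    "export TASK_RUN_ID=\(.task_runs[0].id | @sh)"
  ' "$1"
}

# Usage (run.json is an assumed file name):
#   eval "$(extract_run_env run.json)"
```

The @sh filter shell-quotes each value, so the eval does not interpret anything inside the token or IDs. TASK_RUN_ID is set to the first entry of task_runs only.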

2. Start the First Task Run

Only one task run per benchmark run can be in the started phase at a time. Start the first task run explicitly:
curl -X POST "https://api.verial.ai/v1/task-runs/$TASK_RUN_ID/start" \
  -H "Authorization: Bearer $RUN_TOKEN"
Starting a task run may execute its scenario (for example seeding an inbound fax into the SFTP inbox that your agent is expected to process).

3. Drive the Rollout

Your agent now reads inputs from sandbox endpoints and writes its outputs back. All endpoints live under /v1/benchmark-runs/{benchmark_run_id}/ and authenticate with the run bearer token.
| Simulator | Endpoint pattern |
| --- | --- |
| FHIR | ALL /v1/benchmark-runs/{id}/fhir/* (transparent proxy to the sandbox FHIR store) |
| HL7 | GET /v1/benchmark-runs/{id}/hl7/inbox, POST /v1/benchmark-runs/{id}/hl7/outbound |
| Files (SFTP) | GET /v1/benchmark-runs/{id}/files/inbox, GET /v1/benchmark-runs/{id}/files/* |
| Portal | ALL /v1/benchmark-runs/{id}/portal/* |
Example: pull files from the sandbox inbox.
curl "https://api.verial.ai/v1/benchmark-runs/$BENCHMARK_RUN_ID/files/inbox" \
  -H "Authorization: Bearer $RUN_TOKEN"
Example: send an HL7v2 ADT message as the agent’s output.
curl -X POST "https://api.verial.ai/v1/benchmark-runs/$BENCHMARK_RUN_ID/hl7/outbound" \
  -H "Authorization: Bearer $RUN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "MSH|^~\\&|...|ADT^A01|..."}'
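The doubled backslash in the body above is JSON escaping: the HL7v2 encoding characters are ^~\& with a single backslash, and JSON requires that backslash to be written as \\. When the message comes from a variable or a file, hand-escaping is easy to get wrong. Here is a small sketch that lets jq build the body. It assumes jq is installed, and hl7_body is a hypothetical helper name. The {"message": ...} shape is taken from the example above.

```shell
# Sketch only: build the JSON request body for the HL7 outbound endpoint.
# Assumes jq is installed; hl7_body is a hypothetical helper name.
hl7_body() {
  # $1: raw HL7v2 message text, with backslashes and carriage returns unescaped
  jq -cn --arg message "$1" '{message: $message}'
}

# Usage:
#   curl -X POST "https://api.verial.ai/v1/benchmark-runs/$BENCHMARK_RUN_ID/hl7/outbound" \
#     -H "Authorization: Bearer $RUN_TOKEN" \
#     -H "Content-Type: application/json" \
#     -d "$(hl7_body "$RAW_HL7")"
```

jq escapes backslashes, quotes, and control characters such as the carriage returns that terminate HL7v2 segments, so the raw message can be passed through unchanged.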
Example: query the FHIR store.
curl "https://api.verial.ai/v1/benchmark-runs/$BENCHMARK_RUN_ID/fhir/Patient?identifier=BCB123456789" \
  -H "Authorization: Bearer $RUN_TOKEN"
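Because every sandbox call shares the same URL prefix and Authorization header, a pair of wrapper functions can cut down on repetition. This is only a sketch: sandbox_url, sandbox_curl, and the VERIAL_API_BASE override are hypothetical names introduced for illustration, and the functions rely on the BENCHMARK_RUN_ID and RUN_TOKEN variables exported in step 1.

```shell
# Sketch only: hypothetical wrappers around the sandbox endpoints.
# VERIAL_API_BASE is an assumed override, not a documented variable.
sandbox_url() {
  # $1: path below the run prefix, e.g. "files/inbox" or "fhir/Patient?identifier=X"
  printf '%s/v1/benchmark-runs/%s/%s\n' \
    "${VERIAL_API_BASE:-https://api.verial.ai}" "$BENCHMARK_RUN_ID" "${1#/}"
}

sandbox_curl() {
  # $1: sandbox path; any remaining arguments are passed to curl unchanged.
  sandbox_path=$1
  shift
  curl -sS -H "Authorization: Bearer $RUN_TOKEN" "$@" "$(sandbox_url "$sandbox_path")"
}

# Usage:
#   sandbox_curl files/inbox
#   sandbox_curl "fhir/Patient?identifier=BCB123456789"
```

A leading slash on the path is stripped, so sandbox_url /files/inbox and sandbox_url files/inbox produce the same URL.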

4. Complete the Task Run

When your agent has finished, tell Verial. The verification engine runs every criterion attached to the task against the final sandbox state and returns scores.
curl -X POST "https://api.verial.ai/v1/task-runs/$TASK_RUN_ID/complete" \
  -H "Authorization: Bearer $RUN_TOKEN"
Response:
{
  "task_run_id": "tr_01H...",
  "phase": "completed",
  "verdict": "partial",
  "score": 0.66,
  "axes": {
    "correctness": { "score": 1.0, "weight": 2 },
    "safety": { "score": 0.0, "weight": 1 }
  },
  "checks": [
    {
      "criterion_id": "crit_01H...",
      "label": "Referral recorded with correct payer",
      "result": "pass",
      "score": 1.0,
      "axis": "correctness",
      "details": "portal_state_match passed on 1 assertion",
      "field_results": [
        { "path": "payer", "expected": "BlueCross", "actual": "BlueCross", "passed": true }
      ]
    },
    {
      "criterion_id": "crit_01H...",
      "label": "No PHI in outbound fax cover sheet",
      "result": "fail",
      "score": 0.0,
      "axis": "safety"
    }
  ]
}
If you created the run with scored: true, details and field_results are omitted so the agent cannot learn the rubric from the response. Fetch the full evidence later via the internal GET /criterion-runs/{id} endpoint.
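To triage a result quickly, it can help to list only the failing checks and to sanity-check the aggregate score. The sketch below assumes jq is installed and that the completion response was saved to a file. Both function names are hypothetical. The weight-normalized mean in weighted_axis_score is an assumption about how axes combine, not a documented formula. For the example above it gives (2 × 1.0 + 1 × 0.0) / 3 ≈ 0.667, which is close to the reported 0.66 but does not confirm the method.

```shell
# Sketch only: inspect a saved response from POST /v1/task-runs/{id}/complete.
# Assumes jq is installed; both helper names are hypothetical.
failed_checks() {
  # $1: file with the completion response; prints "axis<TAB>label" per failing check
  jq -r '.checks[] | select(.result == "fail") | "\(.axis)\t\(.label)"' "$1"
}

weighted_axis_score() {
  # ASSUMPTION: aggregate = sum(score * weight) / sum(weight) over .axes.
  # This formula is a guess, not taken from the Verial documentation.
  jq '([.axes[] | .score * .weight] | add) / ([.axes[].weight] | add)' "$1"
}
```

Both helpers read only fields that appear in the example response (checks, result, axis, label, axes, score, weight), so they should behave the same whether or not details and field_results are present.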

5. Run the Remaining Tasks

Repeat steps 2 through 4 for each remaining entry in task_runs from the step 1 response, one task run at a time. The benchmark run finalizes automatically when the last task run completes.
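The loop over task runs can be sketched as a shell function that walks task_runs in order, starting each one, handing control to your agent, and then completing it. It assumes jq is installed, that the step 1 response was saved to a file, and that RUN_TOKEN is exported. The names run_all_tasks and VERIAL_API_BASE are hypothetical, and the agent is represented by whatever command or function name you pass in.

```shell
# Sketch only: start, drive, and complete every task run sequentially.
# Assumes jq is installed; run_all_tasks and VERIAL_API_BASE are hypothetical names.
run_all_tasks() {
  # $1: file with the step-1 response; $2: command invoked as: <command> <task_run_id>
  run_json=$1
  agent_cmd=$2
  base=${VERIAL_API_BASE:-https://api.verial.ai}
  tab=$(printf '\t')
  rows=$(jq -r '.task_runs[] | [.id, .start_url, .complete_url] | @tsv' "$run_json") || return 1
  while IFS=$tab read -r id start_url complete_url; do
    [ -n "$id" ] || continue
    # Only one task run may be started at a time, so finish each before the next.
    curl -fsS -X POST "$base$start_url" -H "Authorization: Bearer $RUN_TOKEN" >/dev/null || return 1
    "$agent_cmd" "$id" </dev/null || return 1
    curl -fsS -X POST "$base$complete_url" -H "Authorization: Bearer $RUN_TOKEN" || return 1
  done <<EOF
$rows
EOF
}

# Usage (my_agent is your own function or script):
#   run_all_tasks run.json my_agent
```

The function stops at the first failed call rather than continuing, on the assumption that a task run left in the started phase would block the next one from starting.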

6. Read the Final Benchmark Run

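Fetch the benchmark run to read its aggregate verdict and score alongside the per-task results:
curl "https://api.verial.ai/v1/benchmark-runs/$BENCHMARK_RUN_ID" \
  -H "Authorization: Bearer $RUN_TOKEN"
Response: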
{
  "benchmark_run_id": "br_01H...",
  "benchmark": { "slug": "fax-referral", "version": "1", "name": "Fax referral intake" },
  "scored": false,
  "phase": "completed",
  "status": "completed",
  "verdict": "partial",
  "score": 0.72,
  "task_runs": [
    { "id": "tr_01H...", "phase": "completed", "status": "completed", "verdict": "pass", "score": 0.9 }
  ]
}
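For logs or CI output, the final response can be condensed to one line per run and per task. This is a minimal sketch that assumes jq is installed and that the response was saved to a file. The helper name summarize_run is hypothetical, and it reads only fields shown in the example response.

```shell
# Sketch only: print a compact summary of a saved GET /v1/benchmark-runs/{id} response.
# Assumes jq is installed; summarize_run is a hypothetical helper name.
summarize_run() {
  # $1: file with the benchmark run JSON
  jq -r '
    "run \(.benchmark_run_id): phase=\(.phase) verdict=\(.verdict) score=\(.score)",
    (.task_runs[] | "  task \(.id): verdict=\(.verdict) score=\(.score)")
  ' "$1"
}

# Usage (final.json is an assumed file name):
#   summarize_run final.json
```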

Authoring a Benchmark

If you are authoring a benchmark (rather than running one), use the internal API with an organization API key to:
  1. Create an Environment and attach Simulators.
  2. Create a Benchmark referencing the environment.
  3. Create Tasks and attach Criteria for each.
  4. POST /benchmarks/{id}/publish to publish a slug@version.
  5. Set visibility=Public if you want other organizations’ Solvers to run it, or keep it Private and share with your own Solvers.
See the Criteria concept page for the six supported check types and the assertion specs they accept.

Next Steps

  • Criteria: The full list of typed assertions the verification engine runs.
  • Benchmark Runs API: Full endpoint reference for internal and v1 flows.
  • Environments: Compose simulators into a reusable simulated environment.
  • Verification Engine: How task runs are scored after the rollout.