This walkthrough drives an external agent through a published Verial benchmark using the public /v1 API. By the end you will have created a benchmark run, driven a task rollout, completed it, and read back per-criterion scores.
Prerequisites
- A Verial Solver key. Create a Solver in your organization’s dashboard under Solvers and mint a key. Solver keys are Bearer tokens prefixed vrl_slv_ and work across any benchmark your organization can run (your own benchmarks and any benchmark with visibility=Public).
- The benchmark’s reference: slug@version (for example fax-referral@1).
- An HTTP client. Examples below use curl.
1. Start a Benchmark Run
POST /v1/benchmark-runs creates the run, provisions a playground (FHIR stores, leased phone numbers, portal state, SFTP drop, etc.), and returns a per-run bearer token plus URLs for driving each task run.
The response includes a bearer_token. All subsequent calls in this run authenticate with it as a Bearer token.
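As a rough sketch, the call to POST /v1/benchmark-runs could be wrapped in a small shell function. The base URL, the VERIAL_API and VERIAL_SOLVER_KEY variable names, and the benchmark field in the request body are assumptions made for illustration. The request schema is not shown here, so check the Benchmark Runs API reference for the real shape.

```shell
# Placeholder base URL (assumption); replace it with your real API host.
VERIAL_API="${VERIAL_API:-https://api.verial.example}"

# Create a benchmark run for a slug@version reference passed as $1.
# Authenticates with the Solver key (vrl_slv_...) as a Bearer token.
# NOTE: the "benchmark" body field is a hypothetical name, not a confirmed one.
start_benchmark_run() {
  curl -sS -X POST "$VERIAL_API/v1/benchmark-runs" \
    -H "Authorization: Bearer $VERIAL_SOLVER_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"benchmark\": \"$1\"}"
}
```

With VERIAL_SOLVER_KEY exported, start_benchmark_run fax-referral@1 prints the JSON response. Copy the bearer_token value into a variable of your choice (later sketches assume RUN_TOKEN) before moving on.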
2. Start the First Task Run
Only one task run can be in phase started at a time per benchmark run. Start it explicitly. Starting a task run sets up the task’s scenario (for example, seeding an inbound fax into the SFTP inbox that your agent is expected to process).
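A minimal sketch of starting a task run follows. It takes the start URL for that task run from the step 1 response, which returns URLs for driving each task run. The RUN_TOKEN variable name and the use of a body-less POST are assumptions, since neither the response field names nor the HTTP method is shown here.

```shell
# Start one task run.
# $1 is the start URL for that task run, copied from the step 1 response.
# Assumes RUN_TOKEN holds the per-run bearer_token and that starting a
# task run is a POST without a body (both are illustrative assumptions).
start_task_run() {
  curl -sS -X POST "$1" \
    -H "Authorization: Bearer $RUN_TOKEN"
}
```

Because only one task run can be started at a time, call start_task_run for the next task only after the current one has been completed.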
3. Drive the Rollout
Your agent now reads inputs from sandbox endpoints and writes its outputs back. All endpoints live under /v1/benchmark-runs/{benchmark_run_id}/ and authenticate with the run bearer token.
| Simulator | Endpoint pattern |
|---|---|
| FHIR | ALL /v1/benchmark-runs/{id}/fhir/* (transparent proxy to the sandbox FHIR store) |
| HL7 | GET /v1/benchmark-runs/{id}/hl7/inbox, POST /v1/benchmark-runs/{id}/hl7/outbound |
| Files (SFTP) | GET /v1/benchmark-runs/{id}/files/inbox, GET /v1/benchmark-runs/{id}/files/* |
| Portal | ALL /v1/benchmark-runs/{id}/portal/* |
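The endpoint patterns in the table can be exercised with a few shell helpers, sketched below. The base URL and the RUN_ID and RUN_TOKEN variable names are assumptions. The Patient read is only an example of a standard FHIR read through the proxy, and the HL7 request body (a raw message read from a file) is a guess at the payload format rather than a documented one.

```shell
# Placeholder base URL (assumption); replace it with your real API host.
VERIAL_API="${VERIAL_API:-https://api.verial.example}"

# Authenticated request against a sandbox endpoint of the current run.
# $1 = HTTP method, $2 = path below /v1/benchmark-runs/{id}/,
# remaining arguments are passed straight to curl.
sandbox() {
  method="$1"; rel="$2"; shift 2
  curl -sS -X "$method" "$VERIAL_API/v1/benchmark-runs/$RUN_ID/$rel" \
    -H "Authorization: Bearer $RUN_TOKEN" "$@"
}

# Read inputs seeded for the task.
list_inbox_files() { sandbox GET files/inbox; }
list_hl7_inbox()   { sandbox GET hl7/inbox; }

# Standard FHIR read (GET [base]/Patient/{id}) through the proxy.
read_patient() { sandbox GET "fhir/Patient/$1"; }

# Send an outbound HL7 message stored in file $1.
# The raw-body payload format is an assumption.
send_hl7() { sandbox POST hl7/outbound --data-binary "@$1"; }
```

With RUN_ID and RUN_TOKEN set from the step 1 response, list_inbox_files shows what the scenario seeded, and the generic sandbox helper covers any other path in the table, for example sandbox GET portal/ followed by whatever route the portal exposes.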
4. Complete the Task Run
When your agent has finished, tell Verial. The verification engine runs every criterion attached to the task against the final sandbox state and returns scores.
If you created the run with scored: true, details and field_results are omitted so the agent cannot learn the rubric from the response. Fetch the full evidence later via the internal GET /criterion-runs/{id} endpoint.
5. Run the Remaining Tasks
Repeat steps 2 through 4 for each task_run in the response from step 1. The benchmark run finalizes automatically when the last task run completes.
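The start, drive, complete cycle can be sketched as a sequential loop. It assumes that each task run has a start URL and a complete URL in the step 1 response, that both are called with a body-less POST, and that RUN_TOKEN holds the run bearer token. The my_agent command is a hypothetical stand-in for your agent's rollout.

```shell
# POST to a per-task-run URL with the run bearer token.
post_run_url() {
  curl -sS -X POST "$1" -H "Authorization: Bearer $RUN_TOKEN"
}

# Run every task in order. Arguments are "start_url complete_url" pairs,
# one pair per task_run, copied from the step 1 response.
# Tasks run one after another because only one can be started at a time.
run_all_tasks() {
  while [ "$#" -ge 2 ]; do
    post_run_url "$1" || return 1   # step 2: start the task run
    my_agent || return 1            # step 3: drive the rollout (hypothetical command)
    post_run_url "$2" || return 1   # step 4: complete it; the response carries the scores
    shift 2
  done
}
```

Stopping at the first failure keeps a broken rollout from being completed and scored by accident; relax that if you would rather record a low score and continue.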
6. Read the Final Benchmark Run
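One way to read the run back is sketched below. It assumes the run resource is readable with GET /v1/benchmark-runs/{id} using the run bearer token. That route and the VERIAL_API and RUN_TOKEN names are assumptions, so confirm them against the Benchmark Runs API reference.

```shell
# Placeholder base URL (assumption); replace it with your real API host.
VERIAL_API="${VERIAL_API:-https://api.verial.example}"

# Fetch a benchmark run by id ($1).
# Assumes GET on the run resource and Bearer auth with the run token.
get_benchmark_run() {
  curl -sS "$VERIAL_API/v1/benchmark-runs/$1" \
    -H "Authorization: Bearer $RUN_TOKEN"
}
```

Calling get_benchmark_run with the run id from step 1 prints the run as JSON once the last task run has completed.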
Authoring a Benchmark
If you are authoring a benchmark (rather than running one), use the internal API with an organization API key to:
- Create an Environment and attach Simulators.
- Create a Benchmark referencing the environment.
- Create Tasks and attach Criteria for each.
- POST /benchmarks/{id}/publish to publish a slug@version.
- Set visibility=Public if you want other organizations’ Solvers to run it, or keep it Private and share with your own Solvers.
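The publish step from the list above might look like the following sketch. The internal API host, the VERIAL_INTERNAL_API and VERIAL_ORG_KEY variable names, and sending the organization API key as a Bearer token are all assumptions; only the POST /benchmarks/{id}/publish route comes from the text.

```shell
# Placeholder host for the internal API (assumption); replace with the real one.
VERIAL_INTERNAL_API="${VERIAL_INTERNAL_API:-https://api.verial.example}"

# Publish the benchmark with id $1 as a slug@version.
# Assumes the organization API key is sent as a Bearer token.
publish_benchmark() {
  curl -sS -X POST "$VERIAL_INTERNAL_API/benchmarks/$1/publish" \
    -H "Authorization: Bearer $VERIAL_ORG_KEY"
}
```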
Next Steps
Criteria
The full list of typed assertions the verification engine runs.
Benchmark Runs API
Full endpoint reference for internal and v1 flows.
Environments
Compose simulators into a reusable simulated environment.
Verification Engine
How task runs are scored after the rollout.