Prerequisites
- A Verial Solver key stored as a repo secret named VERIAL_SOLVER_KEY. Create the Solver in your organization's dashboard under Solvers, mint a key, and paste it into Settings > Secrets and variables > Actions in GitHub. See Solver Keys.
- A published benchmark reference (slug@version), for example fax-referral@1. Browse public benchmarks in the dashboard or via the benchmarks API.
- A way to start your agent in CI. The example below assumes an agent server you can launch with npm run start:agent. Replace that step with whatever is right for your agent.
Complete Workflow
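A minimal sketch of the curl-based workflow, under stated assumptions: the API base URL (https://api.verial.example), the POST /benchmark-runs path, the Authorization header shape, and the benchmark and agent_url payload fields are all hypothetical placeholders; only VERIAL_SOLVER_KEY, the scored flag, the fax-referral@1 reference, and npm run start:agent come from this guide.

```yaml
# Sketch only: endpoint path, base URL, header, and payload field names
# (other than "scored") are assumptions; check the Verial API reference.
name: benchmark
on: [pull_request]
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - name: Start agent
        run: npm run start:agent &   # replace with your agent's start command
      - name: Create scored benchmark run
        env:
          VERIAL_SOLVER_KEY: ${{ secrets.VERIAL_SOLVER_KEY }}
        run: |
          curl -sS -X POST "https://api.verial.example/benchmark-runs" \
            -H "Authorization: Bearer $VERIAL_SOLVER_KEY" \
            -H "Content-Type: application/json" \
            -d '{"benchmark": "fax-referral@1", "scored": true, "agent_url": "http://localhost:3000"}'
```

The agent is started in the background so the runner stays free to create the run; in a real job you would also wait for the agent's port to be ready before calling the API.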
The run is created with scored: true, which withholds evidence (details and field_results) from the agent at completion time, so your agent cannot learn the rubric between CI runs. You can still fetch the full evidence after the fact from GET /criterion-runs/{id} using an organization API key.
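For example, a plain curl call can pull the withheld evidence after the run finishes. The GET /criterion-runs/{id} path comes from the text above; the base URL, the Authorization header shape, and the run id are illustrative assumptions.

```shell
# Assumptions: base URL, header shape, and the id value are placeholders;
# GET /criterion-runs/{id} is the endpoint named in this guide.
CRITERION_RUN_ID="cr_123"   # hypothetical criterion-run id
curl -sS "https://api.verial.example/criterion-runs/$CRITERION_RUN_ID" \
  -H "Authorization: Bearer $VERIAL_ORG_API_KEY"   # organization API key, not the Solver key
```

Note the key swap: the Solver key used in CI cannot read back withheld evidence; that requires an organization API key.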
See Benchmark Runs.

Using the SDK / CLI
The same flow works with the TypeScript SDK if your CI already installs Node dependencies. The Verial CLI ships with the SDK (npx @verial-ai/sdk <command>); see the CLI reference for available commands. The curl-based workflow above is the lowest-dependency path and works without any Node setup in the CI job where your agent runs.

Scheduled Regression Runs
GitHub Actions’ schedule trigger lets you re-run the same benchmark on a cadence, independent of pull requests:
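A minimal trigger block for a weekly re-run; the cron cadence is just an example, and the rest of the job can be identical to the pull-request workflow.

```yaml
on:
  schedule:
    - cron: "0 6 * * 1"   # every Monday at 06:00 UTC; choose your own cadence
  workflow_dispatch: {}    # optional: also allow manual re-runs from the Actions tab
```

Scheduled runs use the same VERIAL_SOLVER_KEY secret, so nothing else in the job needs to change.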
Next Steps
- Webhooks: Get notified asynchronously when a long run finishes, rather than polling.
- Running a Benchmark: The deeper guide to browsing benchmarks and versions and comparing runs.
- Solver Keys: Create, rotate, and scope the key your CI uses.
- Run Results: Read back a failing run top-down with evidence.