Verial is a platform for simulated healthcare environments. You define environments (simulated EHRs, phone lines, fax, clearinghouses, portals) and group tasks into benchmarks. Each task has criteria: typed assertions that the verification engine runs after a rollout to score the task.

A criterion is a single typed assertion. It lives on a Task and, after the task run completes, produces one Criterion Run with passed, score, details, and evidence.
How Criteria Work
Unlike the legacy eval approach (a single natural language assert string judged by an LLM), a criterion has a structured assertion object. The verification engine dispatches to a dedicated check implementation keyed by assertion.assert.
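The dispatch described above can be pictured as a registry lookup. The following is a minimal sketch, not Verial's implementation: the `CHECKS` registry, the `register` decorator, the `path` assertion field, and the `sftp_files` rollout key are all hypothetical names chosen for illustration. Only the `assert` discriminator, the check name, and the four result fields come from this page.

```python
from typing import Any, Callable, Dict

Assertion = Dict[str, Any]
Rollout = Dict[str, Any]
CheckResult = Dict[str, Any]
CheckFn = Callable[[Assertion, Rollout], CheckResult]

# Hypothetical registry: maps an assertion.assert value to its check function.
CHECKS: Dict[str, CheckFn] = {}


def register(kind: str) -> Callable[[CheckFn], CheckFn]:
    """Register a check implementation under its assertion.assert value."""
    def decorator(fn: CheckFn) -> CheckFn:
        CHECKS[kind] = fn
        return fn
    return decorator


@register("sftp-file-present")
def check_sftp_file_present(assertion: Assertion, rollout: Rollout) -> CheckResult:
    # Assumed shapes: assertion["path"] and rollout["sftp_files"] are illustrative.
    uploaded = rollout.get("sftp_files", [])
    found = assertion["path"] in uploaded
    return {
        "passed": found,
        "score": 1.0 if found else 0.0,
        "details": f"{assertion['path']} {'found' if found else 'not found'}",
        "evidence": {"uploaded": list(uploaded)},
    }


def run_criterion(criterion: Dict[str, Any], rollout: Rollout) -> CheckResult:
    """Dispatch on the discriminator and return one Criterion Run-like result."""
    assertion = criterion["assertion"]
    kind = assertion["assert"]
    check = CHECKS.get(kind)
    if check is None:
        raise ValueError(f"unsupported assertion type: {kind!r}")
    return check(assertion, rollout)
```

Keying on a single discriminator keeps each check small and independently testable, which is the main contrast with a single free-form string judged by an LLM.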
Anatomy of a Criterion
| Field | Description |
|---|---|
| label | Short human-readable description |
| weight | Relative contribution to the task score. The task score is a weighted mean of per-criterion scores |
| axis | Optional scoring axis. Criteria sharing an axis contribute to a per-axis score (for example correctness, safety, efficiency) |
| input_entity_id | Optional DatasetEntity the criterion is scoped to (e.g. “the referral the agent should have processed”) |
| assertion | Typed assertion spec, discriminated on assert |
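To make the weight and axis fields concrete, here is a small sketch of a weighted mean over per-criterion scores, overall and per axis. The function names and the dictionary shape of a criterion run are assumptions for illustration. How Verial treats criteria without an axis, or a total weight of zero, is not stated on this page, so the choices below (skip them per axis, return 0.0) are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def weighted_mean(pairs: Iterable[Tuple[float, float]]) -> float:
    """Weighted mean of (weight, score) pairs; 0.0 if the total weight is zero."""
    pairs = list(pairs)
    total = sum(w for w, _ in pairs)
    if total == 0:
        return 0.0
    return sum(w * s for w, s in pairs) / total


def task_score(runs: List[dict]) -> float:
    """Overall task score across all criterion runs."""
    return weighted_mean((r["weight"], r["score"]) for r in runs)


def axis_scores(runs: List[dict]) -> Dict[str, float]:
    """Per-axis weighted means; runs without an axis are skipped (assumption)."""
    by_axis = defaultdict(list)
    for r in runs:
        if r.get("axis") is not None:
            by_axis[r["axis"]].append((r["weight"], r["score"]))
    return {axis: weighted_mean(p) for axis, p in by_axis.items()}


runs = [
    {"label": "appointment booked", "weight": 2, "axis": "correctness", "score": 1.0},
    {"label": "no SSN requested", "weight": 1, "axis": "safety", "score": 0.0},
    {"label": "disclosure read", "weight": 1, "axis": "safety", "score": 1.0},
]
print(task_score(runs))   # 0.75
print(axis_scores(runs))  # {'correctness': 1.0, 'safety': 0.5}
```

The weight-2 criterion accounts for half of the total weight of 4, twice the share of either weight-1 criterion.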
Supported Checks
Each check is documented in full on the Criteria API reference. A quick tour:
fhir-resource-state
Assert that a FHIR search returns a resource with the expected field values after the rollout.
hl7-structural
Assert field values on HL7v2 outbound messages (ADT, ORU, ORM, SIU).
portal-state-match
Assert that a row in simulated portal state has the expected values after submission.
sftp-file-present
Assert that a file was uploaded to the SFTP endpoint, optionally checking parsed JSON contents.
voice-transcript
Assert that required phrases appear (and forbidden phrases do not) in the call transcript. Phrase matching is LLM-assisted.
x12-response
Assert field values on an X12 EDI response (270/271/276/277/278).
Annotated Examples
FHIR: Appointment booked
Runs the FHIR search Appointment?patient=Patient/john-smith&status=booked, then asserts the first result has the expected participant display name.
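The original example for this check is not available here, so the following is only a sketch of the comparison step under stated assumptions: the search response is a standard FHIR searchset Bundle, and expected values are addressed with a dotted path such as `participant.0.actor.display`. The path syntax and the helper names `get_path` and `check_first_result` are hypothetical and do not describe Verial's assertion spec.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel for "path not present in the resource"


def get_path(resource: Any, path: str) -> Any:
    """Walk a dotted path through nested dicts and lists (numeric parts index lists)."""
    current = resource
    for part in path.split("."):
        if isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        elif isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return _MISSING
    return current


def check_first_result(bundle: Dict[str, Any], expected: Dict[str, Any]) -> Dict[str, Any]:
    """Compare expected field values against the first resource in a search Bundle."""
    entries = bundle.get("entry", [])
    if not entries:
        return {"passed": False, "details": "search returned no resources"}
    resource = entries[0]["resource"]
    mismatches = {}
    for path, want in expected.items():
        got = get_path(resource, path)
        if got != want:
            mismatches[path] = None if got is _MISSING else got
    return {"passed": not mismatches, "details": mismatches or "all fields matched"}


# Illustrative search response for Appointment?patient=Patient/john-smith&status=booked
bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [{
        "resource": {
            "resourceType": "Appointment",
            "status": "booked",
            "participant": [
                {"actor": {"reference": "Patient/john-smith", "display": "John Smith"}}
            ],
        }
    }],
}
result = check_first_result(bundle, {"participant.0.actor.display": "John Smith"})
```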
Voice: required disclosures
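The original example for this check is not available here. As a rough illustration of required and forbidden phrases, the sketch below uses case-insensitive substring matching as a stand-in. This page says phrase matching is LLM-assisted, so the real check is more lenient than this. The `contains` parameter name and the `check_transcript` helper are assumptions; only `not_contains` appears on this page.

```python
from typing import Any, Dict, Sequence


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not split phrases."""
    return " ".join(text.lower().split())


def check_transcript(
    transcript: str,
    contains: Sequence[str] = (),
    not_contains: Sequence[str] = (),
) -> Dict[str, Any]:
    """Pass only if every required phrase appears and no forbidden phrase does."""
    text = _normalize(transcript)
    missing = [p for p in contains if _normalize(p) not in text]
    forbidden = [p for p in not_contains if _normalize(p) in text]
    return {
        "passed": not missing and not forbidden,
        "details": {"missing": missing, "forbidden_found": forbidden},
    }


transcript = (
    "Agent: This call may be recorded for quality purposes.\n"
    "Agent: Can you confirm your date of birth?"
)
result = check_transcript(
    transcript,
    contains=["this call may be recorded"],
    not_contains=["social security number"],
)
```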
SFTP: claim file uploaded
Portal: prior auth submitted
HL7: ADT sent
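The original example for this check is not available here. The sketch below shows one way field values on an outbound HL7v2 message could be compared, using references such as `MSH-9` and `PID-5`. The helper names, the reference syntax, and the sample message are illustrative assumptions, not Verial's assertion spec. The parsing follows the standard HL7v2 layout: segments separated by carriage returns, fields by `|`, with MSH-1 being the field separator itself.

```python
from typing import Dict, List, Optional


def parse_segments(message: str) -> Dict[str, List[List[str]]]:
    """Split an HL7v2 message into segments, grouped by segment ID."""
    segments: Dict[str, List[List[str]]] = {}
    for line in message.replace("\n", "\r").split("\r"):
        if line.strip():
            parts = line.split("|")
            segments.setdefault(parts[0], []).append(parts)
    return segments


def get_field(segments: Dict[str, List[List[str]]], ref: str) -> Optional[str]:
    """Return a field such as 'PID-5' from the first matching segment, or None."""
    seg_id, _, number = ref.partition("-")
    rows = segments.get(seg_id)
    if not rows or not number.isdigit():
        return None
    index = int(number)
    if seg_id == "MSH":
        if index == 1:
            return "|"  # MSH-1 is the field separator character
        index -= 1      # so MSH-n sits one position earlier after splitting
    parts = rows[0]
    return parts[index] if 0 < index < len(parts) else None


def check_hl7(message: str, expected: Dict[str, str]) -> Dict[str, object]:
    """Compare expected field values against the parsed message."""
    segments = parse_segments(message)
    mismatches = {
        ref: get_field(segments, ref)
        for ref, want in expected.items()
        if get_field(segments, ref) != want
    }
    return {"passed": not mismatches, "details": mismatches}


# Illustrative ADT^A04 message; all values are made up.
message = (
    "MSH|^~\\&|AGENT|CLINIC|EHR|HOSP|20240101120000||ADT^A04|MSG0001|P|2.5\r"
    "PID|1||12345^^^CLINIC^MR||SMITH^JOHN\r"
)
result = check_hl7(message, {"MSH-9": "ADT^A04", "PID-5": "SMITH^JOHN"})
```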
X12: 271 eligibility response
Writing Good Criteria
- Prefer precise field assertions over free-form natural language.
- Group related criteria under an axis so you can see a per-axis score breakdown in the task score.
- Weight critical outcomes higher. Verial uses weighted means, so weight: 2 doubles a criterion’s contribution relative to weight: 1.
- Test negative behaviors too. For example, a voice-transcript criterion with not_contains: ["social security number"].
- Start narrow. One criterion per observable outcome is better than one compound criterion.
Next Steps
Verification
How the verification engine scores criteria into a task score.
Criteria API
REST endpoints and full assertion spec reference.