Verial is a platform for simulated healthcare environments. You define environments (simulated EHRs, phone lines, fax, clearinghouses, portals) and group tasks into benchmarks. Each task has criteria: typed assertions that the verification engine runs after a rollout to score the task. A criterion is a single typed assertion that lives on a Task. After the task run completes, each criterion produces one Criterion Run with passed, score, details, and evidence.

How Criteria Work

Unlike the legacy eval approach (a single natural language assert string judged by an LLM), a criterion has a structured assertion object. The verification engine dispatches to a dedicated check implementation keyed by assertion.assert.
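
The dispatch step can be pictured as a registry lookup. The sketch below is illustrative only: the CHECKS registry, register, run_criterion, and the stub check are hypothetical names, not Verial's implementation. Only the assert discriminator and the result fields (passed, score, details, evidence) come from this page, and treating an unknown check as a failure is an assumption.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping an assertion's "assert" value to a check function.
CheckFn = Callable[[Dict[str, Any]], Dict[str, Any]]
CHECKS: Dict[str, CheckFn] = {}


def register(name: str) -> Callable[[CheckFn], CheckFn]:
    """Decorator that registers a check implementation under `name`."""
    def wrap(fn: CheckFn) -> CheckFn:
        CHECKS[name] = fn
        return fn
    return wrap


@register("sftp-file-present")
def check_sftp_file_present(assertion: Dict[str, Any]) -> Dict[str, Any]:
    # Stub: a real check would inspect the simulated SFTP endpoint.
    return {"passed": True, "score": 1.0, "details": "stub", "evidence": []}


def run_criterion(criterion: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch on assertion.assert and return a criterion-run-shaped dict."""
    assertion = criterion["assertion"]
    kind = assertion["assert"]
    check = CHECKS.get(kind)
    if check is None:
        # Assumption: an unknown check type is reported as a failed run.
        return {"passed": False, "score": 0.0,
                "details": f"unsupported check: {kind}", "evidence": []}
    return check(assertion)
```

Keying on a single discriminator field keeps each check's logic isolated, so adding a new check type means adding one registered function rather than editing a central branch.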

Anatomy of a Criterion

Field            Description
label            Short human-readable description
weight           Relative contribution to the task score. The task score is a weighted mean of per-criterion scores
axis             Optional scoring axis. Criteria sharing an axis contribute to a per-axis score (for example, correctness, safety, efficiency)
input_entity_id  Optional DatasetEntity the criterion is scoped to (e.g. “the referral the agent should have processed”)
assertion        Typed assertion spec. Discriminated on assert
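
To make the weight and axis fields concrete, here is a minimal sketch of a weighted mean over per-criterion scores. The function names and the sample numbers are made up for illustration. That the per-axis score is also a weighted mean, and that a zero total weight yields 0.0, are assumptions; this page only states that the task score is a weighted mean.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# One entry per criterion: (weight, score, axis). Scores assumed to lie in [0, 1].
CriterionResult = Tuple[float, float, Optional[str]]


def weighted_mean(pairs: List[Tuple[float, float]]) -> float:
    """Weighted mean of (weight, score) pairs."""
    total_weight = sum(w for w, _ in pairs)
    if total_weight == 0:
        return 0.0  # assumption: no weight means no score
    return sum(w * s for w, s in pairs) / total_weight


def task_score(results: List[CriterionResult]) -> float:
    """Task score across all criteria, ignoring axes."""
    return weighted_mean([(w, s) for w, s, _ in results])


def axis_scores(results: List[CriterionResult]) -> Dict[str, float]:
    """Per-axis scores; criteria without an axis are skipped (assumption)."""
    by_axis: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for w, s, axis in results:
        if axis is not None:
            by_axis[axis].append((w, s))
    return {axis: weighted_mean(pairs) for axis, pairs in by_axis.items()}


results = [
    (2.0, 1.0, "correctness"),  # critical outcome, passed
    (1.0, 0.0, "correctness"),  # minor outcome, failed
    (1.0, 1.0, "safety"),       # passed
]
print(task_score(results))   # (2*1 + 1*0 + 1*1) / 4 = 0.75
print(axis_scores(results))
```

In this example the failed criterion costs a quarter of the task score because it carries one of the four total weight units.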

Supported Checks

Each check is documented in full in the Criteria API reference. A quick tour:

fhir-resource-state

Assert that a FHIR search returns a resource with the expected field values after the rollout.

hl7-structural

Assert field values on HL7v2 outbound messages (ADT, ORU, ORM, SIU).

portal-state-match

Assert that a row in simulated portal state has the expected values after submission.

sftp-file-present

Assert that a file was uploaded to the SFTP endpoint, optionally checking parsed JSON contents.

voice-transcript

Assert that required phrases appear (and forbidden phrases do not) in the call transcript. Phrase matching is LLM-assisted.

x12-response

Assert field values on an X12 EDI response (270/271/276/277/278).

Annotated Examples

FHIR: Appointment booked

{
  "label": "Follow-up appointment booked with Dr. Rivera",
  "weight": 1.0,
  "axis": "correctness",
  "assertion": {
    "assert": "fhir-resource-state",
    "resource_type": "Appointment",
    "search": { "patient": "Patient/john-smith", "status": "booked" },
    "fields": [
      { "path": "participant.0.actor.display", "expected": "Dr. Rivera" }
    ]
  }
}
The engine runs a FHIR search for Appointment?patient=Patient/john-smith&status=booked, then asserts the first result has the expected participant display name.
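
A path such as participant.0.actor.display reads naturally as a dotted walk through the resource, with numeric segments indexing into arrays. The sketch below shows one way that resolution could work on an already-fetched resource. The helper names and the sample Appointment (including the Practitioner/rivera reference) are made up, and the exact path semantics the engine uses are an assumption.

```python
from typing import Any, Dict, List


def resolve_path(resource: Any, path: str) -> Any:
    """Walk a dotted path; numeric segments index into lists."""
    current = resource
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current


def check_fields(resource: Dict[str, Any], fields: List[Dict[str, Any]]) -> bool:
    """Return True only if every field resolves and equals its expected value."""
    for field in fields:
        try:
            actual = resolve_path(resource, field["path"])
        except (KeyError, IndexError, ValueError, TypeError):
            return False  # a missing path counts as a mismatch
        if actual != field["expected"]:
            return False
    return True


# Made-up first search result for illustration.
appointment = {
    "resourceType": "Appointment",
    "status": "booked",
    "participant": [
        {"actor": {"reference": "Practitioner/rivera", "display": "Dr. Rivera"}}
    ],
}
fields = [{"path": "participant.0.actor.display", "expected": "Dr. Rivera"}]
print(check_fields(appointment, fields))
```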

Voice: required disclosures

{
  "label": "Agent collected member ID and DOB",
  "weight": 0.5,
  "axis": "correctness",
  "assertion": {
    "assert": "voice-transcript",
    "speaker": "agent",
    "contains": ["member ID", "date of birth"],
    "not_contains": ["social security number"]
  }
}
The engine fetches the voice transcript for the task run’s playground and checks the agent’s turns for required and forbidden phrases.
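
The speaker scoping is worth seeing in isolation. The sketch below swaps in plain case-insensitive substring matching as a stand-in for the LLM-assisted phrase matching, so it is stricter than the real check and is not how Verial matches phrases. The transcript shape (a list of turns with speaker and text) and the function name are hypothetical.

```python
from typing import Dict, List


def check_transcript(
    turns: List[Dict[str, str]],
    speaker: str,
    contains: List[str],
    not_contains: List[str],
) -> Dict[str, object]:
    """Check one speaker's turns for required and forbidden phrases."""
    # Only the selected speaker's turns are considered.
    text = " ".join(t["text"] for t in turns if t["speaker"] == speaker).lower()
    missing = [p for p in contains if p.lower() not in text]
    forbidden = [p for p in not_contains if p.lower() in text]
    return {
        "passed": not missing and not forbidden,
        "missing": missing,
        "forbidden": forbidden,
    }


# Made-up transcript for illustration.
turns = [
    {"speaker": "agent", "text": "Can I get your member ID and date of birth?"},
    {"speaker": "caller", "text": "Do you need my social security number?"},
]
result = check_transcript(
    turns,
    speaker="agent",
    contains=["member ID", "date of birth"],
    not_contains=["social security number"],
)
print(result["passed"])
```

Because the check is scoped to the agent, the caller mentioning a forbidden phrase does not fail the criterion in this sketch.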

SFTP: claim file uploaded

{
  "label": "Claim file written to outbound/",
  "weight": 1.0,
  "assertion": {
    "assert": "sftp-file-present",
    "path_pattern": "outbound/claims/*.json",
    "parse_json": true,
    "fields": [
      { "path": "claim.patient.member_id", "expected": "BCB123456789" },
      { "path": "claim.total_charges", "expected": 420.00 }
    ]
  }
}
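
As a rough illustration of how path_pattern, parse_json, and fields could combine, the sketch below matches file paths with Python's fnmatch and compares parsed JSON values. The in-memory files mapping stands in for the SFTP endpoint, the helper names are hypothetical, and two behaviors are assumptions: glob-style matching (note that fnmatch lets * cross directory separators) and passing when any one matching file satisfies every field.

```python
import json
from fnmatch import fnmatch
from functools import reduce
from typing import Any, Dict, List


def dig(doc: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts."""
    return reduce(lambda node, key: node[key], path.split("."), doc)


def check_sftp(files: Dict[str, str], path_pattern: str,
               fields: List[Dict[str, Any]]) -> bool:
    """True if some file matching the pattern parses and satisfies all fields."""
    for path in sorted(files):
        if not fnmatch(path, path_pattern):
            continue
        try:
            doc = json.loads(files[path])
            if all(dig(doc, f["path"]) == f["expected"] for f in fields):
                return True
        except (ValueError, KeyError, TypeError):
            continue  # unparseable file or missing path: try the next file
    return False


# Made-up upload listing: path -> file contents.
files = {
    "outbound/claims/claim-001.json": json.dumps(
        {"claim": {"patient": {"member_id": "BCB123456789"}, "total_charges": 420.00}}
    ),
    "outbound/notes.txt": "n/a",
}
fields = [
    {"path": "claim.patient.member_id", "expected": "BCB123456789"},
    {"path": "claim.total_charges", "expected": 420.00},
]
print(check_sftp(files, "outbound/claims/*.json", fields))
```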

Portal: prior auth submitted

{
  "label": "Prior auth submitted with correct CPT",
  "weight": 1.0,
  "axis": "correctness",
  "assertion": {
    "assert": "portal-state-match",
    "correlate_by": { "resource": "prior_auth_requests", "key": "request_id" },
    "assertions": [
      { "path": "status", "expected": "submitted" },
      { "path": "cpt_code", "expected": "72148" }
    ]
  }
}

HL7: ADT sent

{
  "label": "ADT^A01 admission message sent for correct patient",
  "weight": 1.0,
  "assertion": {
    "assert": "hl7-structural",
    "correlate_by": { "MSH.9.1": "ADT", "MSH.9.2": "A01" },
    "fields": [
      { "path": "PID.5.1", "expected": "Smith" },
      { "path": "PV1.2", "expected": "I" }
    ]
  }
}
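
Paths such as MSH.9.1 and PID.5.1 follow the usual HL7v2 segment.field.component notation. The sketch below shows how such paths could be resolved against a raw message. It is a simplified reading, not a full parser: it assumes the default | and ^ delimiters, looks only at the first occurrence of each segment, and ignores repetitions and subcomponents. The helper names and the sample message are made up.

```python
from typing import Dict, List, Optional


def parse_segments(message: str) -> Dict[str, List[str]]:
    """Map each segment ID to its '|'-split fields (first occurrence only)."""
    segments: Dict[str, List[str]] = {}
    for line in message.replace("\n", "\r").split("\r"):
        if line:
            fields = line.split("|")
            segments.setdefault(fields[0], fields)
    return segments


def get_value(segments: Dict[str, List[str]], path: str) -> Optional[str]:
    """Resolve SEG.field or SEG.field.component, 1-indexed as in HL7."""
    parts = path.split(".")
    fields = segments.get(parts[0])
    if fields is None:
        return None
    # MSH-1 is the field separator itself, so MSH fields are offset by one.
    index = int(parts[1]) - 1 if parts[0] == "MSH" else int(parts[1])
    if index >= len(fields):
        return None
    value = fields[index]
    if len(parts) > 2:
        components = value.split("^")
        comp_no = int(parts[2])
        return components[comp_no - 1] if comp_no <= len(components) else None
    return value


def check_hl7(message: str, correlate_by: Dict[str, str],
              fields: List[Dict[str, str]]) -> bool:
    """True if the message matches correlate_by and every expected field."""
    segs = parse_segments(message)
    if any(get_value(segs, p) != v for p, v in correlate_by.items()):
        return False
    return all(get_value(segs, f["path"]) == f["expected"] for f in fields)


# Made-up ADT^A01 message for illustration.
message = "\r".join([
    r"MSH|^~\&|AGENT|HOSP|EHR|HOSP|20240101120000||ADT^A01|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||Smith^John",
    "PV1|1|I",
])
print(check_hl7(
    message,
    {"MSH.9.1": "ADT", "MSH.9.2": "A01"},
    [{"path": "PID.5.1", "expected": "Smith"}, {"path": "PV1.2", "expected": "I"}],
))
```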

X12: 271 eligibility response

{
  "label": "Eligibility 271 returned active coverage",
  "weight": 1.0,
  "assertion": {
    "assert": "x12-response",
    "transaction": "271",
    "fields": [
      { "path": "EB.1", "expected": "1" },
      { "path": "EB.3", "expected": "30" }
    ]
  }
}
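
In this example EB.1 and EB.3 read as element positions within the EB segment. The sketch below shows how those paths could be evaluated against a raw response. It assumes the common ~ segment terminator and * element separator rather than reading them from the ISA header, and it passes when any single occurrence of a segment satisfies all fields for that segment, which is an assumption about how repeated segments are handled. The helper names are hypothetical, and the sample is a trimmed fragment, not a complete interchange.

```python
from typing import Dict, List, Optional


def parse_x12(raw: str, seg_term: str = "~", elem_sep: str = "*") -> List[List[str]]:
    """Split a raw X12 string into segments, each a list of elements."""
    return [seg.strip().split(elem_sep) for seg in raw.split(seg_term) if seg.strip()]


def element(segment: List[str], position: int) -> Optional[str]:
    """Element at a 1-indexed position (index 0 holds the segment ID)."""
    return segment[position] if position < len(segment) else None


def check_x12(raw: str, transaction: str, fields: List[Dict[str, str]]) -> bool:
    segments = parse_x12(raw)
    # ST01 carries the transaction set identifier, e.g. 271.
    st = next((s for s in segments if s[0] == "ST"), None)
    if st is None or element(st, 1) != transaction:
        return False
    # Group expectations by segment ID: {"EB": {1: "1", 3: "30"}}.
    wanted: Dict[str, Dict[int, str]] = {}
    for f in fields:
        seg_id, pos = f["path"].split(".")
        wanted.setdefault(seg_id, {})[int(pos)] = f["expected"]
    return all(
        any(
            s[0] == seg_id and all(element(s, p) == v for p, v in expected.items())
            for s in segments
        )
        for seg_id, expected in wanted.items()
    )


# Trimmed, made-up 271 fragment for illustration.
sample = "ST*271*0001~EB*1**30~SE*3*0001~"
fields = [
    {"path": "EB.1", "expected": "1"},
    {"path": "EB.3", "expected": "30"},
]
print(check_x12(sample, "271", fields))
```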

Writing Good Criteria

  • Prefer precise field assertions over free-form natural language.
  • Group related criteria under an axis so you can see a per-axis score breakdown in the task score.
  • Weight critical outcomes higher. Verial uses weighted means, so weight: 2 doubles a criterion’s contribution relative to weight: 1.
  • Test negative behaviors too. For example, add a voice-transcript criterion with not_contains: ["social security number"] to check that the agent never asks for one.
  • Start narrow. One criterion per observable outcome is better than one compound criterion.

Next Steps

Verification

How the verification engine scores criteria into a task score.

Criteria API

REST endpoints and full assertion spec reference.