Criteria

Verial is a healthcare simulated environment platform. You define environments (simulated EHRs, phone lines, fax, clearinghouses, portals) and group tasks into benchmarks. Each task has criteria, which the verification engine runs after a rollout to score the task. A criterion is a single typed assertion. It lives on a Task and, after the task run completes, produces one Criterion Run with passed, score, details, and evidence.

How Criteria Work

Unlike the legacy eval approach, in which an LLM judged a single natural-language assert string, a criterion has a structured `assertion` object. The verification engine dispatches each criterion to a dedicated check implementation keyed by `assertion.assert`.
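The dispatch pattern described above can be sketched in a few lines of Python. This is an illustration only, not Verial's implementation: the `CHECKS` registry, the `register` decorator, the `CheckResult` type, and the stub check are all hypothetical names, and the field types are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class CheckResult:
    # Mirrors the Criterion Run fields named in the text; types are assumed.
    passed: bool
    score: float
    details: str = ""
    evidence: Dict[str, Any] = field(default_factory=dict)


# Hypothetical registry: one check implementation per `assert` value.
CHECKS: Dict[str, Callable[[Dict[str, Any]], CheckResult]] = {}


def register(kind: str):
    """Decorator that files a check function under its `assert` key."""
    def decorator(fn):
        CHECKS[kind] = fn
        return fn
    return decorator


@register("sftp-file-present")
def check_sftp_stub(assertion: Dict[str, Any]) -> CheckResult:
    # Stub: a real check would inspect the simulated SFTP endpoint.
    return CheckResult(True, 1.0, details=f"matched {assertion['path_pattern']}")


def run_criterion(criterion: Dict[str, Any]) -> CheckResult:
    """Look up the check by `assertion.assert` and run it."""
    assertion = criterion["assertion"]
    kind = assertion["assert"]
    check = CHECKS.get(kind)
    if check is None:
        return CheckResult(False, 0.0, details=f"unsupported check: {kind}")
    return check(assertion)


result = run_criterion({
    "label": "Claim file written",
    "assertion": {"assert": "sftp-file-present",
                  "path_pattern": "outbound/claims/*.json"},
})
unknown = run_criterion({"assertion": {"assert": "no-such-check"}})
```

Keying on a discriminator field means adding a new check type only requires registering one more function. The dispatcher itself does not change.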

Anatomy of a Criterion

Field	Description
`label`	Short human-readable description
`weight`	Relative contribution to the task score. The task score is a weighted mean of per-criterion scores
`axis`	Optional scoring axis. Criteria sharing an axis contribute to a per-axis score (for example `correctness`, `safety`, `efficiency`)
`input_entity_id`	Optional DatasetEntity the criterion is scoped to (e.g. “the referral the agent should have processed”)
`assertion`	Typed assertion spec. Discriminated on `assert`
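To make the `weight` and `axis` fields concrete, here is a minimal Python sketch of a weighted-mean task score. The overall formula follows the table above. Two things are assumptions made for illustration: that the per-axis score is also a weighted mean, and that a zero total weight yields 0.0. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_mean(pairs: List[Tuple[float, float]]) -> float:
    """Weighted mean of (weight, score) pairs; 0.0 if total weight is zero."""
    total_weight = sum(w for w, _ in pairs)
    if total_weight == 0:
        return 0.0
    return sum(w * s for w, s in pairs) / total_weight


def score_task(runs: List[dict]) -> Tuple[float, Dict[str, float]]:
    """Return (overall score, per-axis scores) for a list of criterion runs."""
    overall = weighted_mean([(r["weight"], r["score"]) for r in runs])
    by_axis = defaultdict(list)
    for r in runs:
        if r.get("axis"):  # criteria without an axis only affect the overall score
            by_axis[r["axis"]].append((r["weight"], r["score"]))
    return overall, {axis: weighted_mean(p) for axis, p in by_axis.items()}


runs = [
    {"weight": 2.0, "score": 1.0, "axis": "correctness"},
    {"weight": 1.0, "score": 0.0, "axis": "safety"},
    {"weight": 1.0, "score": 1.0, "axis": "correctness"},
]
overall, per_axis = score_task(runs)
print(overall)   # 0.75
print(per_axis)  # {'correctness': 1.0, 'safety': 0.0}
```

Here the failed safety criterion carries one quarter of the total weight, so the task score drops to 0.75 while the correctness axis stays at 1.0.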

Supported Checks

Each check is documented in full in the Criteria API reference. A quick tour:

fhir-resource-state

Assert that a FHIR search returns a resource with the expected field values after the rollout.

hl7-structural

Assert field values on HL7v2 outbound messages (ADT, ORU, ORM, SIU).

portal-state-match

Assert that a row in simulated portal state has the expected values after submission.

sftp-file-present

Assert that a file was uploaded to the SFTP endpoint, optionally checking parsed JSON contents.

voice-transcript

Assert that required phrases appear (and forbidden phrases do not) in the call transcript. Phrase matching is LLM-assisted.

x12-response

Assert field values on an X12 EDI response (270/271/276/277/278).

Annotated Examples

FHIR: Appointment booked

{
  "label": "Follow-up appointment booked with Dr. Rivera",
  "weight": 1.0,
  "axis": "correctness",
  "assertion": {
    "assert": "fhir-resource-state",
    "resource_type": "Appointment",
    "search": { "patient": "Patient/john-smith", "status": "booked" },
    "fields": [
      { "path": "participant.0.actor.display", "expected": "Dr. Rivera" }
    ]
  }
}

The engine runs a FHIR search for Appointment?patient=Patient/john-smith&status=booked, then asserts the first result has the expected participant display name.
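The two steps above (build the search, then resolve a dotted path on the first result) can be sketched as follows. This is a simplified illustration, not the engine's code: `build_search_url` and `resolve_path` are hypothetical helpers, the sample Appointment is made up, and the path syntax is assumed to be dot-separated keys with numeric segments used as list indices.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def build_search_url(resource_type: str, search: Dict[str, str]) -> str:
    """Build a relative FHIR search URL; safe='/' keeps 'Patient/...' readable."""
    return f"{resource_type}?{urlencode(search, safe='/')}"


def resolve_path(resource: Any, path: str) -> Any:
    """Walk a dotted path; numeric segments index into lists. None if missing."""
    current = resource
    for part in path.split("."):
        if isinstance(current, list):
            if not part.isdigit() or int(part) >= len(current):
                return None
            current = current[int(part)]
        elif isinstance(current, dict):
            current = current.get(part)
        else:
            return None
        if current is None:
            return None
    return current


url = build_search_url(
    "Appointment", {"patient": "Patient/john-smith", "status": "booked"}
)

# Illustrative first search result (not real data).
appointment = {
    "resourceType": "Appointment",
    "status": "booked",
    "participant": [{"actor": {"display": "Dr. Rivera"}}],
}
actual = resolve_path(appointment, "participant.0.actor.display")
passed = actual == "Dr. Rivera"
```

Because only the first result is checked, a search that is too broad can match an unrelated resource, so it is worth keeping the `search` parameters as specific as the scenario allows.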

Voice: required disclosures

{
  "label": "Agent collected member ID and DOB",
  "weight": 0.5,
  "axis": "correctness",
  "assertion": {
    "assert": "voice-transcript",
    "speaker": "agent",
    "contains": ["member ID", "date of birth"],
    "not_contains": ["social security number"]
  }
}

The engine fetches the voice transcript for the task run’s playground and checks the agent’s turns for required and forbidden phrases.
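As a rough illustration of the speaker scoping, the sketch below filters a transcript to one speaker and then checks phrases. It deliberately substitutes case-insensitive substring matching for the LLM-assisted matching that Verial uses, so it is stricter about wording than the real check. The transcript shape (a list of turns with `speaker` and `text`) and the function name are assumptions.

```python
from typing import Dict, List


def check_transcript(
    turns: List[Dict[str, str]],
    speaker: str,
    contains: List[str],
    not_contains: List[str],
) -> Dict[str, object]:
    """Check required and forbidden phrases in one speaker's turns only."""
    text = " ".join(t["text"] for t in turns if t["speaker"] == speaker).lower()
    missing = [p for p in contains if p.lower() not in text]
    forbidden_found = [p for p in not_contains if p.lower() in text]
    return {
        "passed": not missing and not forbidden_found,
        "missing": missing,
        "forbidden_found": forbidden_found,
    }


# Illustrative transcript (not real data).
turns = [
    {"speaker": "agent", "text": "Can I get your member ID?"},
    {"speaker": "caller", "text": "Do you need my social security number?"},
    {"speaker": "agent", "text": "No, just your date of birth, please."},
]
result = check_transcript(
    turns,
    speaker="agent",
    contains=["member ID", "date of birth"],
    not_contains=["social security number"],
)
```

The caller mentions the forbidden phrase, but the criterion still passes because only the agent's turns are examined.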

SFTP: claim file uploaded

{
  "label": "Claim file written to outbound/",
  "weight": 1.0,
  "assertion": {
    "assert": "sftp-file-present",
    "path_pattern": "outbound/claims/*.json",
    "parse_json": true,
    "fields": [
      { "path": "claim.patient.member_id", "expected": "BCB123456789" },
      { "path": "claim.total_charges", "expected": 420.00 }
    ]
  }
}
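The SFTP criterion above could be evaluated along these lines. This is a sketch under stated assumptions: uploaded files are modeled as an in-memory dict of path to contents, glob matching uses Python's `fnmatch` (whose `*` also matches `/`, which may differ from Verial's matcher), and the criterion passes if any matching file satisfies every field. The helper names and the sample file name are hypothetical.

```python
import fnmatch
import json
from typing import Any, Dict, List, Optional


def get_path(doc: Any, path: str) -> Any:
    """Walk dot-separated keys through nested dicts; None if any key is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def find_matching_file(
    files: Dict[str, str], pattern: str, fields: List[Dict[str, Any]]
) -> Optional[str]:
    """Return the first file that matches the pattern and all expected fields."""
    for name in sorted(files):
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        try:
            doc = json.loads(files[name])
        except json.JSONDecodeError:
            continue  # unparseable files cannot satisfy field assertions
        if all(get_path(doc, f["path"]) == f["expected"] for f in fields):
            return name
    return None


# Illustrative upload listing (not real data).
files = {
    "outbound/claims/claim-001.json": json.dumps(
        {"claim": {"patient": {"member_id": "BCB123456789"}, "total_charges": 420}}
    ),
    "outbound/readme.txt": "not a claim",
}
fields = [
    {"path": "claim.patient.member_id", "expected": "BCB123456789"},
    {"path": "claim.total_charges", "expected": 420.00},
]
matched = find_matching_file(files, "outbound/claims/*.json", fields)
```

In this sketch the file stores `420` and the criterion expects `420.00`. They compare equal because Python compares the parsed numbers by value, not by their textual form.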

Portal: prior auth submitted

{
  "label": "Prior auth submitted with correct CPT",
  "weight": 1.0,
  "axis": "correctness",
  "assertion": {
    "assert": "portal-state-match",
    "correlate_by": { "resource": "prior_auth_requests", "key": "request_id" },
    "assertions": [
      { "path": "status", "expected": "submitted" },
      { "path": "cpt_code", "expected": "72148" }
    ]
  }
}

HL7: ADT sent

{
  "label": "ADT^A01 admission message sent for correct patient",
  "weight": 1.0,
  "assertion": {
    "assert": "hl7-structural",
    "correlate_by": { "MSH.9.1": "ADT", "MSH.9.2": "A01" },
    "fields": [
      { "path": "PID.5.1", "expected": "Smith" },
      { "path": "PV1.2", "expected": "I" }
    ]
  }
}
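To show how paths such as `MSH.9.1` and `PID.5.1` map onto a raw message, here is a simplified HL7v2 path reader in Python. It is a sketch, not a full parser: it assumes the default `|` and `^` delimiters, reads only the first occurrence of a segment, and ignores repetitions and escape sequences. It also assumes `correlate_by` selects the message to check, which the example implies but does not state. The function name and sample message are hypothetical.

```python
from typing import Optional


def hl7_get(message: str, path: str) -> Optional[str]:
    """Read SEG.field[.component] from an HL7v2 message (1-based indices)."""
    seg_id, *rest = path.split(".")
    field_no = int(rest[0])
    comp_no = int(rest[1]) if len(rest) > 1 else None
    for line in message.replace("\n", "\r").split("\r"):
        parts = line.split("|")
        if parts[0] != seg_id:
            continue
        # MSH.1 is the field separator itself, so MSH fields are shifted by one.
        idx = field_no - 1 if seg_id == "MSH" else field_no
        if idx < 1 or idx >= len(parts):
            return None  # MSH.1 and out-of-range fields are not handled here
        value = parts[idx]
        if comp_no is None:
            return value
        comps = value.split("^")
        return comps[comp_no - 1] if 1 <= comp_no <= len(comps) else None
    return None


# Illustrative ADT^A01 message (not real data).
message = "\r".join([
    "MSH|^~\\&|AGENT|HOSP|EHR|HOSP|20240101120000||ADT^A01|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||Smith^John",
    "PV1|1|I",
])

correlate_by = {"MSH.9.1": "ADT", "MSH.9.2": "A01"}
fields = [{"path": "PID.5.1", "expected": "Smith"},
          {"path": "PV1.2", "expected": "I"}]

is_target = all(hl7_get(message, p) == v for p, v in correlate_by.items())
passed = is_target and all(
    hl7_get(message, f["path"]) == f["expected"] for f in fields
)
```

The off-by-one handling for `MSH` is the most common source of wrong paths: `MSH.9` is the ninth field by the standard's count, but the eighth item after splitting on `|`.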

X12: 271 eligibility response

{
  "label": "Eligibility 271 returned active coverage",
  "weight": 1.0,
  "assertion": {
    "assert": "x12-response",
    "transaction": "271",
    "fields": [
      { "path": "EB.1", "expected": "1" },
      { "path": "EB.3", "expected": "30" }
    ]
  }
}
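In a 271, `EB01` value `1` means active coverage and `EB03` value `30` is the service type code for health benefit plan coverage, which is what the two fields above assert. The Python sketch below shows how `EB.1` and `EB.3` could be read from raw X12. It assumes the common `~` segment terminator and `*` element separator (real interchanges declare their delimiters in the ISA segment) and reads only the first matching segment. The helper names and the abbreviated fragment are hypothetical.

```python
from typing import List, Optional


def x12_segments(raw: str, seg_term: str = "~", elem_sep: str = "*") -> List[List[str]]:
    """Split raw X12 into segments, each a list of elements (index 0 is the ID)."""
    return [s.strip().split(elem_sep) for s in raw.split(seg_term) if s.strip()]


def x12_get(segments: List[List[str]], path: str) -> Optional[str]:
    """Read SEG.position from the first segment with that ID (1-based position)."""
    seg_id, pos = path.split(".")
    index = int(pos)
    for seg in segments:
        if seg[0] == seg_id:
            return seg[index] if index < len(seg) else None
    return None


# Abbreviated, illustrative fragment; not a complete or valid interchange.
raw = "ST*271*0001~NM1*IL*1*SMITH*JOHN~EB*1**30~"
segments = x12_segments(raw)

fields = [{"path": "EB.1", "expected": "1"},
          {"path": "EB.3", "expected": "30"}]

is_271 = x12_get(segments, "ST.1") == "271"
passed = is_271 and all(
    x12_get(segments, f["path"]) == f["expected"] for f in fields
)
```

Note that expected values are strings: X12 elements are text, so `"1"` rather than `1`, as in the criterion above.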

Writing Good Criteria

Prefer precise field assertions over free-form natural language.
Group related criteria under an axis so you can see a per-axis score breakdown in the task score.
Weight critical outcomes higher. Verial uses weighted means, so weight: 2 doubles a criterion’s contribution relative to weight: 1.
Test negative behaviors too, for example with a voice-transcript criterion that sets not_contains: ["social security number"].
Start narrow. One criterion per observable outcome is better than one compound criterion.

Next Steps

Verification

Criteria API