An interaction is the evidence a Sandbox records while the agent drives a rollout. Every request to a FHIR store, every outbound HL7 message, every portal form submission, every voice turn, every uploaded file, every X12 response: Verial records it. The verification engine reads these interactions when it runs each Criterion after the task run completes.

Evidence by Simulator

Each simulator type produces its own evidence shape. The verification engine dispatches to a check implementation keyed by assertion.assert, and each check pulls evidence from the matching source.
| Simulator | Evidence |
| --- | --- |
| FHIR | HTTP request/response log: method, path, request body, status code, response body. Verification runs FHIR searches against the final store state. |
| HL7 | Outbound HL7v2 messages recorded as hl7_outbound sandbox events, with the full message payload (MSH, PID, PV1, OBX, etc.). |
| Voice | Call turns with speaker (agent / caller) and transcribed text, plus the full recording. |
| Fax | Inbound or outbound fax document, with OCR text extracted for assertion. |
| Portal | Sandbox events per action: form submits, patient searches, auth submissions, with the submitted payload and the resulting state row. |
| Files / SFTP | Uploaded file metadata (path, size) plus the raw content in object storage. |
| X12 | Submitted and response records per transaction (270/271/276/277/278). |
| CDS Hooks | Hook invocations and the cards returned by the agent. |
| Message | Outbound SMS/text messages with the rendered body. |
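
When inspecting HL7 evidence by hand, it can help to break a recorded payload into its segments. The sketch below relies only on standard HL7v2 framing (segments separated by carriage returns, fields by `|`). The helper name `split_segments` and the sample message are illustrative and not part of Verial.

```python
from collections import defaultdict


def split_segments(message: str) -> dict:
    """Group the segments of an HL7v2 message by segment ID (MSH, PID, ...).

    Each segment becomes a list of fields split on '|'. In MSH the separator
    itself counts as MSH-1, so MSH list indexes are one lower than the
    HL7 field numbers.
    """
    segments = defaultdict(list)
    # Tolerate payloads that were stored with \n or \r\n instead of bare \r.
    normalized = message.replace("\r\n", "\r").replace("\n", "\r")
    for raw in normalized.split("\r"):
        if raw:
            fields = raw.split("|")
            segments[fields[0]].append(fields)
    return dict(segments)


# Illustrative payload, not taken from a real sandbox.
sample = (
    "MSH|^~\\&|AGENT|FAC|LAB|FAC|202401011200||ORU^R01|MSG1|P|2.5\r"
    "PID|1||12345^^^MRN||DOE^JANE\r"
    "OBX|1|NM|718-7^Hemoglobin^LN||13.2|g/dL"
)
parsed = split_segments(sample)
patient_id = parsed["PID"][0][3]    # PID-3, patient identifier
result_value = parsed["OBX"][0][5]  # OBX-5, observation value
```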

How the Verification Engine Reads Interactions

For each criterion on the task, the engine:
  1. Reads assertion.assert to pick a check implementation.
  2. Pulls the relevant evidence from the sandbox (a FHIR search against the store, HL7 outbound rows, portal state rows, voice turns, SFTP objects, X12 responses).
  3. Runs the typed assertion against that evidence.
  4. Writes a Criterion Run with passed, score, details, and the evidence it considered.
See Verification for the full dispatch table and scoring rules.
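
The four steps above can be sketched as a small registry keyed by `assertion.assert`. This is a toy model under stated assumptions, not Verial's implementation: the assertion name `hl7-segment-present`, the `segment` parameter, the event field names `type` and `payload`, and the binary scoring are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CriterionRun:
    passed: bool
    score: float
    details: str
    evidence: list = field(default_factory=list)


def check_hl7_segment_present(assertion: dict, sandbox: dict) -> CriterionRun:
    # Step 2: pull the relevant evidence (here, outbound HL7 events).
    evidence = [e for e in sandbox["events"] if e["type"] == "hl7_outbound"]
    # Step 3: run the typed assertion against that evidence.
    prefix = assertion["segment"] + "|"
    hits = [
        e for e in evidence
        if any(seg.startswith(prefix) for seg in e["payload"].split("\r"))
    ]
    passed = bool(hits)
    details = f"{len(hits)} of {len(evidence)} messages contain {assertion['segment']}"
    # Step 4: record the outcome together with the evidence considered.
    return CriterionRun(passed, 1.0 if passed else 0.0, details, evidence)


# Hypothetical registry: assertion name -> check implementation.
CHECKS = {"hl7-segment-present": check_hl7_segment_present}


def run_criterion(criterion: dict, sandbox: dict) -> CriterionRun:
    assertion = criterion["assertion"]
    # Step 1: read assertion.assert to pick a check implementation.
    check = CHECKS.get(assertion["assert"])
    if check is None:
        return CriterionRun(False, 0.0, f"no check registered for {assertion['assert']!r}")
    return check(assertion, sandbox)


sandbox = {"events": [{"type": "hl7_outbound", "payload": "MSH|^~\\&|A\rOBX|1|NM|x"}]}
criterion = {"assertion": {"assert": "hl7-segment-present", "segment": "OBX"}}
run = run_criterion(criterion, sandbox)
```

Keeping the registry as a plain dictionary means a new assertion type only needs a new function and one new entry; the dispatch code itself does not change.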

Reading Interactions

Interactions surface in two places:
  • Per sandbox: GET /sandboxes/{id}/events returns the raw event log for a sandbox. Useful for debugging a rollout or authoring new criteria from real traces.
  • Per criterion run: GET /criterion-runs/{id} returns the specific evidence the check considered for that criterion, with field-level diffs where applicable.
During a scored benchmark run, per-field evidence is omitted from completion responses so the agent cannot learn the rubric. Fetch the full evidence later from GET /criterion-runs/{id}.
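
A minimal sketch of reading both endpoints with Python's standard library follows. Only the two paths come from this page; the base URL, the bearer-token header, and the JSON response format are assumptions to replace with the values from your own account and the API reference.

```python
import json
import urllib.request

# Assumption: placeholder host, not the real API base URL.
BASE_URL = "https://api.example.com"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    # Assumption: bearer-token authentication and JSON responses.
    headers = {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}
    return urllib.request.Request(BASE_URL + path, headers=headers)


def get_json(path: str, api_key: str):
    with urllib.request.urlopen(build_request(path, api_key)) as resp:
        return json.load(resp)


def sandbox_events(sandbox_id: str, api_key: str):
    """Raw event log for one sandbox."""
    return get_json(f"/sandboxes/{sandbox_id}/events", api_key)


def criterion_run(run_id: str, api_key: str):
    """Evidence the check considered for one criterion run."""
    return get_json(f"/criterion-runs/{run_id}", api_key)
```

Calling `sandbox_events("<sandbox id>", api_key)` while debugging a rollout and `criterion_run("<criterion run id>", api_key)` after a scored run mirrors the two access paths described above.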

Retention

Interactions persist after a playground is torn down. Teardown releases the live resources (phone numbers, FHIR stores, portal users) but keeps every recorded event so you can review evidence, debug failed criteria, and compare rollouts across benchmark runs.

Next Steps

Verification

How interactions feed the scoring engine.

Sandboxes API

Read the raw event log for any sandbox.