Two Formats
Datasets come in one of two formats, set via the `type` field.
| Format | Storage | Typical use |
|---|---|---|
| FHIR | Bundle JSON stored in the `data` / `config` column | Patients, Conditions, Encounters, Appointments, Coverage, Observations, etc. |
| Files | Manifest in `config`, actual file bytes in GCS under `datasets/{datasetId}/files/{ulid}.{ext}` | Inbound/outbound fax documents, referral PDFs, SFTP drops, X12 payloads |
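For Files datasets, the object layout in GCS is fully determined by the dataset ID, the file's ULID, and its extension. The sketch below builds that key from the documented pattern. The `manifest_entry` helper and its field names (`id`, `filename`, `object`) are hypothetical and only illustrate the split between metadata in `config` and bytes in GCS; the real manifest schema may differ.

```python
from pathlib import PurePosixPath


def file_object_key(dataset_id: str, ulid: str, ext: str) -> str:
    """Build the GCS object key datasets/{datasetId}/files/{ulid}.{ext}."""
    ext = ext.lstrip(".")  # accept both "pdf" and ".pdf"
    return str(PurePosixPath("datasets", dataset_id, "files", f"{ulid}.{ext}"))


def manifest_entry(dataset_id: str, ulid: str, ext: str, filename: str) -> dict:
    """Hypothetical manifest entry: metadata lives in config, bytes live in GCS."""
    return {
        "id": ulid,
        "filename": filename,
        "object": file_object_key(dataset_id, ulid, ext),
    }


entry = manifest_entry("ds_123", "01ARZ3NDEKTSV4RRFFQ69G5FAV", "pdf", "referral.pdf")
print(entry["object"])  # datasets/ds_123/files/01ARZ3NDEKTSV4RRFFQ69G5FAV.pdf
```

Using `PurePosixPath` keeps the separator as `/` regardless of the host OS, which matches how GCS object names are written.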
How Datasets Attach to Sandboxes
Datasets do not live inside a sandbox directly. Instead, they are linked to a sandbox, and the link drives provisioning. When a dataset is linked, Verial creates a child dataset (its `parentId` points back to the original) with a copy of the original's config and copies of its GCS files. The sandbox operates on the child, so anything the agent does (adding a Patient, dropping a fax, updating a prior auth record) stays isolated to that sandbox. The original dataset is left unchanged and can be reused across many runs.
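The branching step can be modeled in memory to show why the original stays untouched. This is a minimal sketch, not Verial's implementation: the `Dataset` class is hypothetical, `parent_id` stands in for `parentId`, and re-keying copied files under the child's own `datasets/{childId}/` prefix is an assumption based on the documented file layout.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Dataset:
    """Hypothetical in-memory stand-in for a dataset record."""
    id: str
    config: Dict[str, Any]
    files: Dict[str, bytes] = field(default_factory=dict)  # object key -> bytes
    parent_id: Optional[str] = None


def branch(parent: Dataset, child_id: str) -> Dataset:
    """Create an isolated child with a deep-copied config and copied files."""
    old_prefix = f"datasets/{parent.id}/"
    new_prefix = f"datasets/{child_id}/"
    files = {}
    for key, data in parent.files.items():
        # Assumption: each copy is stored under the child's own prefix.
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        files[key] = data  # bytes are immutable, so sharing them here is safe
    return Dataset(
        id=child_id,
        config=copy.deepcopy(parent.config),  # edits to the child never reach the parent
        files=files,
        parent_id=parent.id,
    )


original = Dataset(
    "ds_1",
    {"patients": [{"id": "p1"}]},
    {"datasets/ds_1/files/01ARZ3NDEKTSV4RRFFQ69G5FAV.pdf": b"%PDF-1.7"},
)
child = branch(original, "ds_2")
child.config["patients"].append({"id": "p2"})  # simulate an agent adding a Patient
print(len(original.config["patients"]), len(child.config["patients"]))  # 1 2
```

The deep copy is the important design choice: a shallow copy would share the nested `patients` list, and the agent's change would leak back into the original.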
At branch time Verial also writes a baseline checkpoint, a snapshot of the child’s config at that moment. Checkpoints give you a known-good state to reset to.
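A checkpoint can be thought of as a deep copy of the child's config, and a reset as restoring that copy. The sketch below models that idea with a hypothetical `SandboxDataset` class; the class and method names are assumptions for illustration. Per the description above, the snapshot covers the config only, so files are not modeled here.

```python
import copy
from typing import Any, Dict, List


class SandboxDataset:
    """Hypothetical model of a child dataset with config checkpoints."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.checkpoints: List[Dict[str, Any]] = []
        self.checkpoint()  # baseline checkpoint, written at branch time

    def checkpoint(self) -> int:
        """Snapshot the current config and return the checkpoint's index."""
        self.checkpoints.append(copy.deepcopy(self.config))
        return len(self.checkpoints) - 1

    def reset(self, index: int = 0) -> None:
        """Restore the config captured by checkpoint `index` (0 is the baseline)."""
        # Copy again so later edits cannot corrupt the stored snapshot.
        self.config = copy.deepcopy(self.checkpoints[index])


ds = SandboxDataset({"prior_auths": [{"id": "pa-1", "status": "pending"}]})
ds.config["prior_auths"][0]["status"] = "approved"  # the agent updates a record
ds.reset()
print(ds.config["prior_auths"][0]["status"])  # pending
```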
Creating a Dataset
You can create a dataset either from hand-authored contents, by supplying the data directly, or by asking Verial to generate the contents from a prompt.
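For a hand-authored FHIR dataset, the contents are a FHIR Bundle. The sketch below builds a minimal `collection` Bundle with one Patient and one Condition that references it through the Patient entry's `fullUrl`, following the FHIR Bundle structure. The patient details are synthetic placeholders, and how the Bundle is submitted to Verial (endpoint and field name) is not shown here; see the Datasets API reference for that.

```python
import json
import uuid


def make_bundle() -> dict:
    """Build a minimal FHIR collection Bundle: one Patient plus one Condition."""
    patient_url = f"urn:uuid:{uuid.uuid4()}"
    condition_url = f"urn:uuid:{uuid.uuid4()}"
    patient = {
        "resourceType": "Patient",
        "name": [{"family": "Doe", "given": ["Jane"]}],
        "gender": "female",
        "birthDate": "1980-04-12",
    }
    condition = {
        "resourceType": "Condition",
        "subject": {"reference": patient_url},  # resolves to the Patient entry
        "code": {"text": "Type 2 diabetes mellitus"},
    }
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [
            {"fullUrl": patient_url, "resource": patient},
            {"fullUrl": condition_url, "resource": condition},
        ],
    }


bundle = make_bundle()
print(json.dumps(bundle, indent=2))
```

Referencing by `urn:uuid:` keeps the Bundle self-contained: the Condition points at the Patient entry without depending on server-assigned IDs.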
Linking a Dataset to a Sandbox
Datasets are attached to a sandbox via the sandbox API, and linking one triggers the branch-and-checkpoint flow described above. To detach a linked dataset, call `DELETE /sandboxes/{id}/datasets/{dataset_id}`. Inside a benchmark run, datasets are linked automatically based on the environment’s configuration, so you rarely need to call these endpoints directly during a rollout.
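A minimal sketch of preparing the `DELETE /sandboxes/{id}/datasets/{dataset_id}` request with only the Python standard library. The base URL, the token, and the Bearer authorization scheme are assumptions; substitute the values your deployment actually uses. The function builds the request without sending it.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL; replace with your actual API host.
API_BASE = "https://api.example.com"


def unlink_request(sandbox_id: str, dataset_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /sandboxes/{id}/datasets/{dataset_id}."""
    path = "/sandboxes/{}/datasets/{}".format(
        urllib.parse.quote(sandbox_id, safe=""),  # escape IDs as single path segments
        urllib.parse.quote(dataset_id, safe=""),
    )
    return urllib.request.Request(
        API_BASE + path,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )


req = unlink_request("sbx_123", "ds_456", "TOKEN")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```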
Scoping Criteria to Specific Entities
A dataset can contain many entities (patients, referrals, auth requests). A Criterion can scope itself to one of them via `input_entity_id`, so the same task can be reused across many dataset rows without rewriting assertions.
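The reuse pattern is that the assertion stays fixed while `input_entity_id` selects which row it runs against. The sketch below shows this with a hypothetical `Criterion` dataclass and `evaluate` helper; apart from `input_entity_id`, the field names, the entity shape, and the evaluation logic are illustrative assumptions, not Verial's object model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Criterion:
    """Hypothetical criterion: one assertion scoped to one entity."""
    description: str
    input_entity_id: str
    check: Callable[[Dict[str, Any]], bool]


def evaluate(criterion: Criterion, entities: Dict[str, Dict[str, Any]]) -> bool:
    """Run the check against only the entity the criterion is scoped to."""
    return criterion.check(entities[criterion.input_entity_id])


def auth_approved(entity: Dict[str, Any]) -> bool:
    return entity.get("status") == "approved"


entities = {
    "auth-001": {"type": "auth_request", "status": "approved"},
    "auth-002": {"type": "auth_request", "status": "pending"},
}

# Same assertion, two rows: only input_entity_id changes.
for entity_id in ("auth-001", "auth-002"):
    c = Criterion("Prior auth is approved", entity_id, auth_approved)
    print(entity_id, evaluate(c, entities))
```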
Next Steps
- Sandboxes: how datasets are loaded into running simulator instances.
- Datasets API: REST endpoints and full object reference.