A dataset is the synthetic data that populates a Sandbox when a Playground is provisioned: FHIR patient panels, SFTP file manifests, payer rosters, or anything else an Environment needs in place before an agent can drive a rollout. You can either author dataset contents by hand or ask Verial to generate them from a prompt.

Two Formats

Datasets come in one of two formats, set via the type field.
| Format | Storage | Typical use |
|---|---|---|
| FHIR | Bundle JSON stored in the data / config column | Patients, Conditions, Encounters, Appointments, Coverage, Observations, etc. |
| Files | Manifest in config; actual file bytes in GCS under datasets/{datasetId}/files/{ulid}.{ext} | Inbound/outbound fax documents, referral PDFs, SFTP drops, X12 payloads |
FHIR datasets are synthesized into a FHIR R4 Bundle at provisioning time and loaded into the sandbox’s FHIR store. Files datasets are copied into the sandbox’s own object storage so the agent can read from and write to the SFTP or fax endpoints.
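The Files storage path from the table can be captured in a pair of helpers, for example when checking which objects a manifest refers to. This is a minimal sketch: fileObjectKey and parseFileObjectKey are hypothetical names, not Verial SDK functions, and only the datasets/{datasetId}/files/{ulid}.{ext} layout comes from this page.

```typescript
// Hypothetical helpers (not part of the Verial SDK) for the documented
// Files layout: datasets/{datasetId}/files/{ulid}.{ext}

// Builds the object key for one file; accepts the extension as "pdf" or ".pdf".
function fileObjectKey(datasetId: string, ulid: string, ext: string): string {
  const cleanExt = ext.startsWith('.') ? ext.slice(1) : ext;
  if (!datasetId || !ulid || !cleanExt) {
    throw new Error('datasetId, ulid, and ext are all required');
  }
  return `datasets/${datasetId}/files/${ulid}.${cleanExt}`;
}

// Inverse of fileObjectKey: splits a key into its parts, or returns null
// when the key does not follow the layout above.
function parseFileObjectKey(
  key: string,
): { datasetId: string; ulid: string; ext: string } | null {
  const m = /^datasets\/([^\/]+)\/files\/([^\/.]+)\.([^\/]+)$/.exec(key);
  return m ? { datasetId: m[1], ulid: m[2], ext: m[3] } : null;
}
```

Keeping both directions in one place means code that writes files and code that lists them cannot drift apart on the path format.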

How Datasets Attach to Sandboxes

Datasets do not live inside a sandbox directly. Instead, they are linked to a sandbox, and the link drives provisioning.

When a dataset is linked to a sandbox, Verial branches it: it creates a child dataset (whose parentId points back to the original) with a copy of the original's config and GCS files. The sandbox operates on the child, so anything the agent does (adding a Patient, dropping a fax, updating a prior auth record) stays isolated to that sandbox. The original dataset is left untouched and can be reused across many runs.

At branch time, Verial also writes a baseline checkpoint: a snapshot of the child's config at that moment. Checkpoints give you a known-good state to reset to.
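To make the isolation guarantees concrete, the branch-and-checkpoint flow can be modeled in memory. This is a sketch under stated assumptions: the Dataset and Checkpoint shapes, branch, and reset are hypothetical and do not mirror Verial's internal schema or SDK, and config is deep-copied with a JSON round-trip, which assumes it is plain JSON.

```typescript
// Illustrative in-memory model only; these shapes are assumptions,
// not Verial's actual dataset or checkpoint schema.
type Config = Record<string, unknown>;

interface Dataset {
  id: string;
  parentId: string | null;
  config: Config;
}

interface Checkpoint {
  datasetId: string;
  label: string;
  config: Config; // snapshot of the child's config when the checkpoint was taken
}

// Deep copy via JSON round-trip; fine for plain JSON config.
const deepCopy = (c: Config): Config => JSON.parse(JSON.stringify(c));

let counter = 0;
const newId = (prefix: string): string => `${prefix}_${++counter}`;

// Branching: a child that points back to the original, with its own copy
// of the config, plus a baseline checkpoint of that copy.
function branch(original: Dataset): { child: Dataset; baseline: Checkpoint } {
  const child: Dataset = {
    id: newId('ds'),
    parentId: original.id,
    config: deepCopy(original.config),
  };
  const baseline: Checkpoint = {
    datasetId: child.id,
    label: 'baseline',
    config: deepCopy(child.config),
  };
  return { child, baseline };
}

// Resetting restores the child from a checkpoint; the original is never touched.
function reset(child: Dataset, checkpoint: Checkpoint): void {
  if (checkpoint.datasetId !== child.id) {
    throw new Error('checkpoint belongs to a different dataset');
  }
  child.config = deepCopy(checkpoint.config);
}
```

Because every step copies rather than shares the config, mutations made through the child can never reach the original or the baseline snapshot.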

Creating a Dataset

```typescript
// Assumes `verial` is an already-initialized Verial SDK client.
const dataset = await verial.datasets.create({
  name: 'Diabetes patient panel',
  description: '50 patients with Type 2 diabetes and pending auths',
})
```
At this point the dataset is an empty container. Populate it either by setting data directly or by asking Verial to generate contents from a prompt.

Hand-authored contents

```typescript
await verial.datasets.update({
  id: dataset.id,
  data: {
    resourceType: 'Bundle',
    type: 'collection',
    entry: [
      { resource: { resourceType: 'Patient', id: 'pt-1', name: [{ family: 'Smith', given: ['John'] }] } },
      // ...
    ],
  },
})
```
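Writing bundle entries by hand gets repetitive for larger panels. A small builder like the one below produces the same collection-Bundle shape used in the update call from simple rows. PatientRow and toPatientBundle are hypothetical names for illustration, not part of the Verial SDK.

```typescript
// Hypothetical row shape for a hand-authored patient panel.
interface PatientRow {
  id: string;
  family: string;
  given: string[];
  birthDate?: string; // FHIR date, e.g. "1961-04-12"
}

interface FhirBundle {
  resourceType: 'Bundle';
  type: 'collection';
  entry: { resource: Record<string, unknown> }[];
}

// Converts rows into a FHIR R4 collection Bundle of Patient resources.
function toPatientBundle(rows: PatientRow[]): FhirBundle {
  return {
    resourceType: 'Bundle',
    type: 'collection',
    entry: rows.map((row) => ({
      resource: {
        resourceType: 'Patient',
        id: row.id,
        name: [{ family: row.family, given: row.given }],
        // Only emit birthDate when the row provides one.
        ...(row.birthDate ? { birthDate: row.birthDate } : {}),
      },
    })),
  };
}
```

The returned object can be passed as data to verial.datasets.update in the same way as the literal Bundle in the example above.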

Generated from a prompt

```typescript
await verial.datasets.generate({
  id: dataset.id,
  prompt: '10 patients with pending prior auth requests for lumbar MRI',
})
```

Linking a Dataset to a Sandbox

Datasets are attached to a sandbox via the sandbox API. The link triggers the branch-and-checkpoint flow described above:
```shell
curl -X POST "https://api.verial.ai/sandboxes/$SANDBOX_ID/datasets/$DATASET_ID" \
  -H "Authorization: Bearer $VERIAL_API_KEY"
```
You can remove the link later with DELETE /sandboxes/{id}/datasets/{dataset_id}. Inside a benchmark run, datasets are linked automatically based on the environment's configuration, so you rarely need to call these endpoints directly during a rollout.
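For scripted setup outside a benchmark run, the two calls can be wrapped in TypeScript. This is a sketch under assumptions: it needs Node 18+ for the global fetch, assumes the DELETE route uses the same https://api.verial.ai base URL as the curl example, and treats any 2xx response as success. setDatasetLink, linkDataset, and unlinkDataset are hypothetical helper names. The fetch implementation is injectable so the wrapper can be exercised without network access.

```typescript
// Hypothetical wrappers around the link endpoints on this page.
// Assumes Node 18+ (global fetch) and the same base URL for POST and DELETE.
const BASE_URL = 'https://api.verial.ai';

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

async function setDatasetLink(
  method: 'POST' | 'DELETE',
  sandboxId: string,
  datasetId: string,
  apiKey: string,
  fetchImpl: FetchLike = fetch,
): Promise<void> {
  const url = `${BASE_URL}/sandboxes/${encodeURIComponent(sandboxId)}/datasets/${encodeURIComponent(datasetId)}`;
  const res = await fetchImpl(url, {
    method,
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  // Treat any non-2xx status as a failure.
  if (!res.ok) {
    throw new Error(`${method} ${url} failed with HTTP ${res.status}`);
  }
}

// POST links the dataset (triggering branch + baseline checkpoint).
const linkDataset = (sandboxId: string, datasetId: string, apiKey: string, fetchImpl?: FetchLike) =>
  setDatasetLink('POST', sandboxId, datasetId, apiKey, fetchImpl);

// DELETE removes the link.
const unlinkDataset = (sandboxId: string, datasetId: string, apiKey: string, fetchImpl?: FetchLike) =>
  setDatasetLink('DELETE', sandboxId, datasetId, apiKey, fetchImpl);
```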

Scoping Criteria to Specific Entities

A dataset can contain many entities (patients, referrals, auth requests). A Criterion can scope itself to one of them via input_entity_id, so the same task can be reused across many dataset rows without rewriting assertions.
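As an illustration of why this helps, the sketch below writes one assertion and scopes it to different rows. Everything here is hypothetical except the idea of input_entity_id itself: the Criterion shape, the check callback, scopeCriterion, and evaluateCriterion are assumptions, not Verial's criterion format.

```typescript
// Hypothetical shapes; only input_entity_id is taken from the docs.
interface Resource {
  resourceType: string;
  id: string;
  [field: string]: unknown;
}

interface Criterion {
  description: string;
  input_entity_id: string;
  check: (entity: Resource) => boolean; // assertion written once, reused per entity
}

// Binds a reusable assertion template to one dataset entity.
function scopeCriterion(template: Omit<Criterion, 'input_entity_id'>, entityId: string): Criterion {
  return { ...template, input_entity_id: entityId };
}

// Looks up the scoped entity and runs the assertion against it.
function evaluateCriterion(criterion: Criterion, resources: Resource[]): boolean {
  const entity = resources.find((r) => r.id === criterion.input_entity_id);
  if (!entity) {
    throw new Error(`entity ${criterion.input_entity_id} not found in dataset`);
  }
  return criterion.check(entity);
}
```

The assertion logic lives in one template; only the entity id changes between rows.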

Next Steps

Sandboxes: how datasets are loaded into running simulator instances.

Datasets API: REST endpoints and full object reference.