Datasets

A dataset is the synthetic data that populates a Sandbox when a Playground is provisioned. Examples include FHIR patient panels, SFTP file manifests, and payer rosters: anything an Environment needs in place before an agent can drive a rollout. You either author dataset contents by hand or ask Verial to generate them from a prompt.

Two Formats

Datasets come in one of two formats, set via the `type` field.

Format	Storage	Typical use
FHIR	Bundle JSON stored in the `data` / `config` column	Patients, Conditions, Encounters, Appointments, Coverage, Observations, etc.
Files	Manifest in `config`, actual file bytes in GCS under `datasets/{datasetId}/files/{ulid}.{ext}`	Inbound/outbound fax documents, referral PDFs, SFTP drops, X12 payloads

FHIR datasets are synthesized into a FHIR R4 Bundle at provisioning time and loaded into the sandbox’s FHIR store. Files datasets are copied into the sandbox’s own object storage so the agent can read from and write to the SFTP or fax endpoints.
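
As a rough illustration of the table above, the sketch below models the two formats and builds a file object path from the quoted template. The literal values `'fhir'` and `'files'`, the `FileEntry` shape, and both function names are assumptions made for this example; the source only names the formats and the path template.

```typescript
// Assumed literal values for the `type` field; the text names the formats
// "FHIR" and "Files" but does not show the exact strings.
type DatasetType = 'fhir' | 'files';

// Hypothetical manifest entry for a Files dataset.
interface FileEntry {
  ulid: string; // unique id used in the object name
  ext: string;  // file extension without the dot, e.g. 'pdf'
}

// Builds the object path from the template quoted in the table:
// datasets/{datasetId}/files/{ulid}.{ext}
function fileObjectPath(datasetId: string, file: FileEntry): string {
  return `datasets/${datasetId}/files/${file.ulid}.${file.ext}`;
}

// Summarizes where each format keeps its contents, per the table.
function storageFor(type: DatasetType): string {
  return type === 'fhir'
    ? 'Bundle JSON in the data / config column'
    : 'manifest in config, file bytes in GCS';
}
```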

How Datasets Attach to Sandboxes

Datasets do not live inside a sandbox directly. They are linked to a sandbox, and the link drives provisioning. When a dataset is linked to a sandbox, Verial branches it: it creates a child dataset (`parentId` points back to the original) with copied config and copied GCS files. The sandbox operates on the child, so anything the agent does (adding a Patient, dropping a fax, updating a prior auth record) stays isolated to that sandbox. The original dataset is preserved and can be reused across many runs.

At branch time Verial also writes a baseline checkpoint, a snapshot of the child’s config at that moment. Checkpoints give you a known-good state to reset to.
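
The branch-and-checkpoint behavior can be pictured with a small in-memory model. This is only a sketch of the idea, not Verial's implementation: the `Dataset` and `Checkpoint` shapes and the functions `branchForSandbox` and `resetToCheckpoint` are hypothetical, and the copying of GCS files is not modeled.

```typescript
type Config = Record<string, unknown>;

// Hypothetical record shapes; only `parentId` and `config` are named in the text.
interface Dataset {
  id: string;
  parentId: string | null;
  config: Config;
}

interface Checkpoint {
  datasetId: string;
  label: string;
  config: Config;
}

// JSON round-trip deep copy; sufficient for plain JSON config.
const deepCopy = <T>(value: T): T => JSON.parse(JSON.stringify(value)) as T;

let nextChildId = 0;

// Linking: create a child that points back to the parent and owns a copy of
// its config, then snapshot the child's config as the baseline checkpoint.
function branchForSandbox(parent: Dataset): { child: Dataset; baseline: Checkpoint } {
  nextChildId += 1;
  const child: Dataset = {
    id: `${parent.id}-child-${nextChildId}`,
    parentId: parent.id,
    config: deepCopy(parent.config),
  };
  const baseline: Checkpoint = {
    datasetId: child.id,
    label: 'baseline',
    config: deepCopy(child.config),
  };
  return { child, baseline };
}

// Resetting: restore the child's config from a checkpoint snapshot.
function resetToCheckpoint(dataset: Dataset, checkpoint: Checkpoint): void {
  dataset.config = deepCopy(checkpoint.config);
}
```

Because the child holds its own copy, edits made during a run never reach the parent, and resetting simply swaps the snapshot back in.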

Creating a Dataset

const dataset = await verial.datasets.create({
  name: 'Diabetes patient panel',
  description: '50 patients with Type 2 diabetes and pending auths',
})

At this point the dataset is a container. Populate it by setting `data` directly, or by asking Verial to generate contents from a prompt.

Hand-authored contents

await verial.datasets.update({
  id: dataset.id,
  data: {
    resourceType: 'Bundle',
    type: 'collection',
    entry: [
      { resource: { resourceType: 'Patient', id: 'pt-1', name: [{ family: 'Smith', given: ['John'] }] } },
      // ...
    ],
  },
})
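
For larger hand-authored panels, it may be easier to build the Bundle from a plain list than to write every entry inline. The helper below is a sketch under that assumption: `PatientInput` and `buildPatientBundle` are hypothetical names, and the output mirrors only the fields shown in the example above (a `collection` Bundle of Patient resources).

```typescript
// Hypothetical input row for one patient.
interface PatientInput {
  id: string;
  family: string;
  given: string[];
}

interface PatientResource {
  resourceType: 'Patient';
  id: string;
  name: { family: string; given: string[] }[];
}

interface CollectionBundle {
  resourceType: 'Bundle';
  type: 'collection';
  entry: { resource: PatientResource }[];
}

// Wraps each input row in a Bundle entry and rejects duplicate ids, since
// two Patient resources with the same id would be ambiguous.
function buildPatientBundle(patients: PatientInput[]): CollectionBundle {
  const seenIds: string[] = [];
  for (const patient of patients) {
    if (seenIds.indexOf(patient.id) !== -1) {
      throw new Error(`Duplicate Patient id: ${patient.id}`);
    }
    seenIds.push(patient.id);
  }
  return {
    resourceType: 'Bundle',
    type: 'collection',
    entry: patients.map((patient) => ({
      resource: {
        resourceType: 'Patient',
        id: patient.id,
        name: [{ family: patient.family, given: patient.given }],
      },
    })),
  };
}
```

The returned object could then be passed as `data` in the `verial.datasets.update` call shown above.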

Generated from a prompt

await verial.datasets.generate({
  id: dataset.id,
  prompt: '10 patients with pending prior auth requests for lumbar MRI',
})

Linking a Dataset to a Sandbox

Datasets are attached to a sandbox via the sandbox API. The link triggers the branch-and-checkpoint flow described above:

curl -X POST "https://api.verial.ai/sandboxes/$SANDBOX_ID/datasets/$DATASET_ID" \
  -H "Authorization: Bearer $VERIAL_API_KEY"

You can remove the link later with `DELETE /sandboxes/{id}/datasets/{dataset_id}`. Inside a benchmark run, datasets are linked automatically based on the environment’s configuration, so you rarely call these endpoints directly during a rollout.
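
If you do call these endpoints from TypeScript rather than curl, one option is to build the request description in a small helper. This is a sketch, not part of the Verial SDK: `RequestSpec` and `datasetLinkRequest` are hypothetical names, and it assumes the `DELETE` route lives on the same `https://api.verial.ai` host and uses the same bearer header as the `POST` shown above.

```typescript
type LinkAction = 'link' | 'unlink';

// Plain description of the HTTP request, independent of any HTTP client.
interface RequestSpec {
  url: string;
  method: 'POST' | 'DELETE';
  headers: { Authorization: string };
}

// Maps 'link' to POST and 'unlink' to DELETE on the same resource path.
function datasetLinkRequest(
  action: LinkAction,
  sandboxId: string,
  datasetId: string,
  apiKey: string,
): RequestSpec {
  const path =
    `/sandboxes/${encodeURIComponent(sandboxId)}` +
    `/datasets/${encodeURIComponent(datasetId)}`;
  return {
    url: `https://api.verial.ai${path}`,
    method: action === 'link' ? 'POST' : 'DELETE',
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}
```

The resulting spec can be handed to any HTTP client, for example `fetch(spec.url, { method: spec.method, headers: spec.headers })`.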

Scoping Criteria to Specific Entities

A dataset can contain many entities (patients, referrals, auth requests). A Criterion can scope itself to one of them via `input_entity_id`, so the same task can be reused across many entities in a dataset without rewriting assertions.
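
To make the reuse concrete, the sketch below stamps one assertion description across several entities and resolves each criterion back to its entity. Only the field name `input_entity_id` comes from the text; the `Entity` and `Criterion` shapes and both function names are hypothetical, and real Criteria likely carry more fields.

```typescript
// Hypothetical minimal shapes for illustration.
interface Entity {
  id: string;
  kind: string; // e.g. 'Patient', 'Referral'
}

interface Criterion {
  description: string;
  input_entity_id: string;
}

// Reuses one assertion description for every entity by varying only the scope.
function criteriaForEntities(description: string, entities: Entity[]): Criterion[] {
  return entities.map((entity) => ({
    description,
    input_entity_id: entity.id,
  }));
}

// Finds the entity a criterion is scoped to, or undefined if it is missing.
function entityForCriterion(criterion: Criterion, entities: Entity[]): Entity | undefined {
  for (const entity of entities) {
    if (entity.id === criterion.input_entity_id) {
      return entity;
    }
  }
  return undefined;
}
```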

Next Steps

Sandboxes

Datasets API