Anatomy of a Task
| Field | Type | Description |
|---|---|---|
| name | string | Short human-readable title |
| task_item | object \| null | Structured payload the agent receives: instruction, trigger, expected inputs |
| scenario | object \| null | Optional pre-rollout steps run by the scenario runner before the agent starts |
| entities | DatasetEntity[] | Bindings that scope this task to specific synthetic records (e.g. “the patient with DOB 1965-03-15”) |
| tags | string[] | Free-form labels for filtering and reporting |
| timeout | number \| null | Optional per-task timeout override in seconds |
| criteria | Criterion[] | Typed assertions scored after the rollout |
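The field table can be read as a type definition. The following is a minimal sketch, not the official schema: the field names and types come from the table, while the inner structure of `DatasetEntity` and `Criterion` is not described on this page, so both are left as opaque records here. Treating the nullable fields as required-but-nullable (rather than optional) is also an assumption.

```typescript
// Assumption: DatasetEntity and Criterion are opaque here; their real shapes
// are documented elsewhere.
type DatasetEntity = Record<string, unknown>;
type Criterion = Record<string, unknown>;

interface Task {
  name: string;
  task_item: Record<string, unknown> | null;
  scenario: Record<string, unknown> | null;
  entities: DatasetEntity[];
  tags: string[];
  timeout: number | null; // per-task override, in seconds
  criteria: Criterion[];
}

// A minimal task: nullable fields are null, list fields are empty or short.
// The name and tag values are made up for illustration.
const minimalTask: Task = {
  name: "Process inbound referral",
  task_item: null,
  scenario: null,
  entities: [],
  tags: ["referral"],
  timeout: null,
  criteria: [],
};

console.log(Object.keys(minimalTask).join(", "));
```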
task_item
The task_item object is what the agent receives at the start of the task run. It is intentionally loose. Common fields:
- instruction: the natural language direction for the agent.
- trigger: what starts the work (for example “an inbound referral fax”).
- expected_inputs: optional hints about what data the agent needs to pull from the sandboxes.
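As an illustration of the common fields, a task_item might look like the sketch below. Only the three field names come from this page. Their value types (string, string, list of strings), the choice to make `instruction` required, and the example values are assumptions. The index signature stands in for the object being intentionally loose.

```typescript
// Assumed shape: field names are from the docs, value types are guesses.
interface TaskItem {
  instruction: string;
  trigger?: string;
  expected_inputs?: string[];
  [extra: string]: unknown; // loose schema: extra fields are allowed
}

// Example values are invented for illustration.
const taskItem: TaskItem = {
  instruction: "Review the inbound referral and schedule the patient.",
  trigger: "an inbound referral fax",
  expected_inputs: ["patient demographics", "referring provider"],
};

console.log(taskItem.instruction);
```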
scenario
A scenario is a short program run by the scenario runner before the rollout starts. Typical scenarios seed inbound events the agent is meant to react to: dropping a fax into the SFTP inbox, posting an HL7 ORU message, or leaving a voicemail on the IVR line. Starting a task run executes its scenario, then hands control to the agent.
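The ordering described above (scenario first, then the agent) can be sketched as follows. The real scenario format and runner interface are not specified on this page, so `ScenarioStep`, `startTaskRun`, and the step fields are all hypothetical names used only to show the sequencing.

```typescript
// Hypothetical step types, loosely modeled on the examples in the text.
type ScenarioStep =
  | { kind: "sftp_drop"; path: string }
  | { kind: "hl7_post"; messageType: string }
  | { kind: "voicemail"; line: string };

// Hypothetical runner: executes every scenario step in order, then starts the agent.
function startTaskRun(
  scenario: ScenarioStep[] | null,
  runStep: (step: ScenarioStep) => void,
  runAgent: () => void,
): void {
  // Seeded events must exist before the agent starts, so steps run first.
  for (const step of scenario ?? []) {
    runStep(step);
  }
  runAgent();
}

// Record the order of calls instead of touching any real sandbox.
const log: string[] = [];
startTaskRun(
  [
    { kind: "sftp_drop", path: "inbox/referral.pdf" },
    { kind: "hl7_post", messageType: "ORU" },
  ],
  (step) => log.push(`scenario:${step.kind}`),
  () => log.push("agent:start"),
);
console.log(log);
```

A task with a null scenario would skip straight to the agent in this sketch.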
entities
Entities bind a task to specific rows inside the linked dataset. The binding flows through to criteria: a criterion’s input_entity_id can reference one of the task’s entities so the assertion runs against the right record. This is how one task template can be reused across many patients without rewriting assertions.
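The binding can be pictured as a lookup from a criterion's input_entity_id to one of the task's entities. In the sketch below only `input_entity_id` is a name from this page. The `id`, `kind`, and `record_ref` fields, the `description` field, and the `resolveEntity` helper are assumptions made for illustration.

```typescript
// Hypothetical shapes: only input_entity_id is named in the docs.
interface DatasetEntity {
  id: string;
  kind: string;
  record_ref: string;
}

interface Criterion {
  description: string;
  input_entity_id?: string;
}

// Find the entity a criterion points at; fail loudly on a dangling reference.
function resolveEntity(
  entities: DatasetEntity[],
  criterion: Criterion,
): DatasetEntity | undefined {
  if (criterion.input_entity_id === undefined) return undefined;
  const match = entities.find((e) => e.id === criterion.input_entity_id);
  if (match === undefined) {
    throw new Error(`Unknown entity: ${criterion.input_entity_id}`);
  }
  return match;
}

const criterion: Criterion = {
  description: "An appointment exists for the patient",
  input_entity_id: "patient",
};

// The same criterion, evaluated against two different entity bindings.
const runA = resolveEntity(
  [{ id: "patient", kind: "Patient", record_ref: "patient-001" }],
  criterion,
);
const runB = resolveEntity(
  [{ id: "patient", kind: "Patient", record_ref: "patient-002" }],
  criterion,
);
console.log(runA?.record_ref, runB?.record_ref);
```

The criterion never names a concrete record. Swapping the entity list is enough to point the same assertion at a different patient.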
Where Tasks Fit
Each task produces one task run inside a benchmark run. The task run carries a frozen snapshot of the task at the moment the benchmark was published, so reruns are reproducible even if the task is later edited.
Multi-Interface Tasks
A single task can touch several simulators in one rollout. For example, a prior-auth task might:
- Read the patient’s chart from the FHIR sandbox.
- Call the payer’s IVR line on the Voice sandbox.
- Submit the auth form on the Payer portal sandbox.
- Fax supporting documentation via the Fax sandbox.
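One way to write the prior-auth example down as data is sketched below. This is illustrative only: the interface identifiers, the `interface:` tag convention, and the instruction text are invented for the sketch and are not part of any documented schema.

```typescript
// Hypothetical identifiers for the four sandboxes named in the list above.
const INTERFACES = ["fhir", "voice", "payer-portal", "fax"] as const;
type Interface = (typeof INTERFACES)[number];

interface PlannedStep {
  via: Interface;
  action: string;
}

const priorAuthSteps: PlannedStep[] = [
  { via: "fhir", action: "Read the patient's chart" },
  { via: "voice", action: "Call the payer's IVR line" },
  { via: "payer-portal", action: "Submit the auth form" },
  { via: "fax", action: "Fax supporting documentation" },
];

// Assumption: tags are used to record which interfaces a task touches,
// so reports can be filtered by sandbox.
const priorAuthTask = {
  name: "Prior authorization",
  task_item: {
    instruction: "Obtain prior authorization for the ordered procedure.",
  },
  tags: priorAuthSteps.map((step) => `interface:${step.via}`),
};

console.log(priorAuthTask.tags);
```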
Creating a Task
Next Steps
- Criteria: The typed assertions that score each task run.
- Tasks API: REST endpoints and full object reference.