Include runtime LLM output in dataset exports

Applies to:

Plan -
Deployment -

Summary

Dataset rows expose the core evaluation fields input, expected, metadata, and tags; there is no top-level output field. To include the model’s runtime output in a dataset export, write it to a namespaced metadata key, such as metadata.run_output, on the span you add to the dataset. For trace-level adds from Logs, put it on the root span.

What is happening

Datasets are designed for stable test cases, so each row exposes only input, expected, metadata, and tags. The UI action that adds a trace to a dataset maps span fields into dataset row fields, commonly span input -> dataset input and span output -> dataset expected, and copies the metadata of the span you add into the row’s metadata. When you add a trace row from Logs, the added span is typically the root span, so child-span outputs are not included unless you copy them onto the root span, add the child span itself, or create the row programmatically. To get the runtime generation into a dataset export, it must either be present in the added span’s metadata or be supplied when the dataset row is created.
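The mapping described above can be pictured with a small Python sketch. This is only an illustrative model of the commonly observed behavior, not Braintrust's implementation; the function name span_to_dataset_row and the plain-dict span shape are assumptions made for the example.

```python
from typing import Any


def span_to_dataset_row(span: dict[str, Any]) -> dict[str, Any]:
    """Illustrative model of the span -> dataset row mapping (hypothetical helper)."""
    return {
        "input": span.get("input"),                    # span input -> dataset input
        "expected": span.get("output"),                # span output -> dataset expected
        "metadata": dict(span.get("metadata") or {}),  # span metadata is copied over
        "tags": list(span.get("tags") or []),
    }


# A root span that already carries the runtime generation in metadata.run_output.
root_span = {
    "input": "What is the capital of France?",
    "output": "Paris",
    "metadata": {"run_output": "Paris"},
    "tags": ["geo"],
}
row = span_to_dataset_row(root_span)
```

Note that the resulting row has no output key. The generation survives only because it was also written to metadata.run_output before the span was added.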

Fix or suggestion

Option 1: Store runtime output in dataset metadata (recommended)

On the span you plan to add to the dataset, write the generation output to a namespaced metadata key, e.g. metadata.run_output. For trace-level adds from Logs, put it on the root span.
If you use traced functions, the return value can populate the span’s output; also log metadata.run_output if you need it in dataset exports.
With OpenTelemetry, set braintrust.output or gen_ai.completion for the span if you want Braintrust to populate span output. Also set braintrust.metadata.run_output for dataset exporting.
Use the UI Add to dataset action. The dataset export will include metadata.run_output.

Minimal examples:

TypeScript (OTel SDK):

// Assumes span is an active OpenTelemetry span and modelOutput holds the generated text.
span.setAttribute("braintrust.output", modelOutput);
span.setAttribute("braintrust.metadata.run_output", modelOutput);

Python (span.log):

# Assumes span is an active Braintrust span and model_output holds the generated text.
span.log(output=model_output, metadata={"run_output": model_output})

Python (OTel SDK):

# Assumes span is an active OpenTelemetry span and model_output holds the generated text.
span.set_attribute("gen_ai.completion", model_output)
span.set_attribute("braintrust.metadata.run_output", model_output)

Notes:

Ensure you write metadata.run_output on the span before clicking Add to dataset.
If the model output lives in a child span, copy it into the root span’s metadata for trace-level adds from Logs, or include span_id / root_span_id if you need linkage.
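As a rough sketch of the child-to-root copy mentioned in the notes above, the following Python works on plain dictionaries rather than on live spans. The helper name copy_child_output_to_root, the source_span_id key, and the rule that the root span is the one whose span_id equals its root_span_id are all assumptions for illustration. In a real application you would write the same values to the root span through your tracing SDK before clicking Add to dataset.

```python
from typing import Any


def copy_child_output_to_root(
    spans: list[dict[str, Any]], child_name: str
) -> dict[str, Any]:
    """Return a copy of the root span with the child's output in metadata.run_output."""
    # Assumption: the root span is the one whose span_id equals its root_span_id.
    root = next(s for s in spans if s["span_id"] == s["root_span_id"])
    child = next(s for s in spans if s.get("name") == child_name)
    metadata = dict(root.get("metadata") or {})
    metadata["run_output"] = child.get("output")
    # Hypothetical linkage key so the row can be traced back to the child span.
    metadata["source_span_id"] = child["span_id"]
    return {**root, "metadata": metadata}


trace = [
    {"span_id": "root5678", "root_span_id": "root5678", "name": "handle_request",
     "input": "original input", "output": None, "metadata": {"env": "prod"}},
    {"span_id": "abcd1234", "root_span_id": "root5678", "name": "llm_call",
     "input": "original input", "output": "model generated text", "metadata": {}},
]
root_for_dataset = copy_child_output_to_root(trace, "llm_call")
```

The helper returns a new dictionary and leaves the original spans unchanged, which keeps the example easy to reason about.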

Option 2: Programmatic insert or export logs and spans

Export Logs or Spans: use logs or spans export if you want raw runtime outputs preserved without touching datasets.
Dataset Insert API: create dataset rows programmatically and include run_output in metadata along with span linkage fields like span_id and root_span_id.

Example cURL (replace placeholders):

curl -X POST https://api.braintrust.dev/v1/dataset/{dataset_id}/insert \
  -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "events": [{
      "input": "original input",
      "expected": "gold label",
      "metadata": {
        "run_output": "model generated text",
        "span_id": "abcd1234",
        "root_span_id": "root5678"
      }
    }]
  }'
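The same request can be sketched in Python using only the standard library. The endpoint, headers, and payload shape are taken from the cURL example above; the helper names build_insert_payload and insert_rows are hypothetical, and the response is assumed to be JSON.

```python
import json
import os
import urllib.request


def build_insert_payload(input_text, expected, run_output, span_id, root_span_id):
    """Build the request body with run_output and span linkage in metadata."""
    return {
        "events": [{
            "input": input_text,
            "expected": expected,
            "metadata": {
                "run_output": run_output,
                "span_id": span_id,
                "root_span_id": root_span_id,
            },
        }]
    }


def insert_rows(dataset_id: str, payload: dict) -> dict:
    """POST the payload to the dataset insert endpoint (needs BRAINTRUST_API_KEY)."""
    request = urllib.request.Request(
        f"https://api.braintrust.dev/v1/dataset/{dataset_id}/insert",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['BRAINTRUST_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)  # assumption: the API responds with JSON


payload = build_insert_payload(
    "original input", "gold label", "model generated text", "abcd1234", "root5678"
)
# To send it, call insert_rows("<your dataset id>", payload) with a real dataset ID.
```

Building the payload separately from sending it makes it possible to inspect or log the rows before any network call is made.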

How to confirm it worked

Export the dataset JSON and verify each row’s metadata contains run_output with the expected generation.
In the UI, open the dataset row and confirm metadata.run_output appears.
Open the source trace and verify the relevant span shows the output value or metadata key.
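The first check can be automated with a short script. This is a sketch that assumes the export is a JSON array of row objects, each with a metadata object; adjust the loading step if your export uses a different layout. The helper name rows_missing_run_output is hypothetical.

```python
import json
from typing import Any


def rows_missing_run_output(rows: list[dict[str, Any]]) -> list[int]:
    """Return the indexes of rows whose metadata has no non-empty run_output."""
    missing = []
    for index, row in enumerate(rows):
        metadata = row.get("metadata") or {}
        if not metadata.get("run_output"):
            missing.append(index)
    return missing


# Inline sample standing in for an exported file; in practice, load it with
# json.load(open(path)) where path points to your export.
exported = json.loads("""
[
  {"input": "q1", "expected": "a1", "metadata": {"run_output": "model text 1"}},
  {"input": "q2", "expected": "a2", "metadata": {}},
  {"input": "q3", "expected": "a3", "metadata": {"run_output": "model text 3"}}
]
""")
missing = rows_missing_run_output(exported)
print(f"{len(missing)} of {len(exported)} rows are missing metadata.run_output: {missing}")
```

An empty list means every row carries the runtime generation. Any listed index points to a row that was added before metadata.run_output was written to its span.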

Notes

Use expected for the durable ground-truth label.
Use metadata.run_output for the model’s runtime generation to avoid overwriting expected.
When adding a trace row from Logs, the UI captures the root span. If you rely on child-span outputs, copy them into root metadata or use the API/custom mapping.
