[RFC, nit] - create_dataset takes DatasetRecordRaw to avoid needing to specify unused IDs #15911
Description
Updates create_dataset and create_dataset_from_csv to accept DatasetRecordRaw instead of DatasetRecord in an attempt to address feedback from Jordan Singleton - https://dd.slack.com/archives/C08UB8ZE5KR/p1767794406631329.
Problem + Solution
The records parameter was typed as List[DatasetRecord], which requires a record_id field. However, record_id is not actually required — the SDK auto-generates it and the backend assigns a new one on push. This forced users to provide a dummy value to satisfy the type checker, see append in
The Dataset.append method already accepts DatasetRecordRaw and handles record_id generation internally (see ddtrace/llmobs/_experiment.py line 227):
This should be backwards compatible: the accepted parameter type is widened rather than narrowed, because DatasetRecordRaw only drops the record_id requirement. Any code that still passes DatasetRecord values should keep working.
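To illustrate the type relationship described above, here is a minimal sketch. The two TypedDicts are simplified stand-ins (the real definitions in ddtrace may carry more fields and different value types), and `create_dataset_sketch` is a hypothetical helper, not the SDK function. It assumes a fresh ID is generated for every record, so any caller-supplied record_id is ignored.

```python
import uuid
from typing import Any, Dict, List, TypedDict


class DatasetRecordRaw(TypedDict):
    # Simplified stand-in; field types in the real SDK may differ.
    input_data: Dict[str, Any]
    expected_output: Any


class DatasetRecord(DatasetRecordRaw):
    # Same fields as the raw record, plus the ID.
    record_id: str


def create_dataset_sketch(records: List[DatasetRecordRaw]) -> List[DatasetRecord]:
    """Hypothetical helper: copies each record and assigns a fresh record_id."""
    created: List[DatasetRecord] = []
    for record in records:
        created.append(
            DatasetRecord(
                input_data=record["input_data"],
                expected_output=record["expected_output"],
                record_id=uuid.uuid4().hex,  # assumption: ID is always regenerated
            )
        )
    return created


# New style: no dummy ID needed.
raw: DatasetRecordRaw = {"input_data": {"question": "2 + 2?"}, "expected_output": "4"}

# Old style: a record that still carries a placeholder ID.
legacy: DatasetRecord = {
    "input_data": {"question": "3 + 3?"},
    "expected_output": "6",
    "record_id": "dummy",
}

created = create_dataset_sketch([raw, legacy])
```

Both call styles go through the same code path, which is the sense in which existing callers are unaffected. One caveat for type-checked code: `List` is invariant, so a variable explicitly annotated as `List[DatasetRecord]` may still be flagged when passed where `List[DatasetRecordRaw]` is expected, even though the individual records are compatible.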