@mbldatadog mbldatadog commented Jan 7, 2026

Description

Updates create_dataset and create_dataset_from_csv to accept DatasetRecordRaw instead of DatasetRecord in an attempt to address feedback from Jordan Singleton - https://dd.slack.com/archives/C08UB8ZE5KR/p1767794406631329.

Problem + Solution

The records parameter was typed as List[DatasetRecord], which requires a record_id field. However, record_id is not actually required — the SDK auto-generates it and the backend assigns a new one on push. This forced users to provide a dummy value to satisfy the type checker, see append in

The Dataset.append method already accepts DatasetRecordRaw and handles record_id generation internally (see ddtrace/llmobs/_experiment.py line 227):

def append(self, record: DatasetRecordRaw) -> None:
    record_id: str = uuid.uuid4().hex
    # this record ID will be discarded after push, BE will generate a new one, this is just
    # for tracking new records locally before the push
    r: DatasetRecord = {**record, "record_id": record_id}
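The pattern in that excerpt can be tried in isolation. The sketch below is not the ddtrace code: the two TypedDicts are stand-ins whose field names (input_data, expected_output) are assumed for illustration, and with_local_id is a hypothetical helper that mirrors the merge step quoted above.

```python
import uuid
from typing import Any, Dict, TypedDict


class DatasetRecordRaw(TypedDict):
    # Stand-in for the SDK type; field names are assumptions.
    input_data: Dict[str, Any]
    expected_output: Any


class DatasetRecord(DatasetRecordRaw):
    # Same fields as the raw record, plus the locally generated ID.
    record_id: str


def with_local_id(record: DatasetRecordRaw) -> DatasetRecord:
    """Hypothetical helper: attach a throwaway local ID to a raw record."""
    record_id: str = uuid.uuid4().hex
    # Build a new dict so the caller's record is left untouched.
    return {**record, "record_id": record_id}
```

Because the merge builds a new dict, the caller's raw record never gains a record_id key; only the locally tracked copy does.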

This should be backwards compatible: the parameter type now requires fewer fields, which loosens (rather than narrows) what callers may pass, so any code that still passes DatasetRecord values with a record_id should keep working.
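The compatibility claim can be checked at runtime with a small sketch. Everything here is illustrative: the TypedDicts are stand-ins with assumed field names, and build_records is a hypothetical substitute for the record handling inside create_dataset, not the SDK implementation.

```python
import uuid
from typing import Any, Dict, List, TypedDict


class DatasetRecordRaw(TypedDict):
    # Stand-in for the SDK type; field names are assumptions.
    input_data: Dict[str, Any]
    expected_output: Any


class DatasetRecord(DatasetRecordRaw):
    record_id: str


def build_records(records: List[DatasetRecordRaw]) -> List[DatasetRecord]:
    """Hypothetical stand-in: assign a fresh local ID to every record."""
    # In this sketch, a caller-supplied record_id is simply overwritten.
    return [{**r, "record_id": uuid.uuid4().hex} for r in records]


# Old call style: a dummy record_id supplied to satisfy the type checker.
old_style: DatasetRecord = {
    "input_data": {"q": "2+2"},
    "expected_output": "4",
    "record_id": "dummy",
}
# New call style: no record_id at all.
new_style: DatasetRecordRaw = {"input_data": {"q": "3+3"}, "expected_output": "6"}

built = build_records([old_style, new_style])
```

Both call styles go through the same path and end up with a generated ID. One caveat that may be worth checking with the type checker: List is invariant, so a variable explicitly annotated as List[DatasetRecord] can still be flagged when passed where List[DatasetRecordRaw] is expected, even though it works at runtime.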

@mbldatadog mbldatadog requested a review from a team as a code owner January 7, 2026 21:12