[RFC, nit] - create_dataset takes DatasetRecordRaw to avoid needing to specify unused IDs #15911

mbldatadog · 2026-01-07T21:12:37Z

Description

Updates `create_dataset` and `create_dataset_from_csv` to accept `DatasetRecordRaw` instead of `DatasetRecord`, in an attempt to address feedback from Jordan Singleton: https://dd.slack.com/archives/C08UB8ZE5KR/p1767794406631329.
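Here is a minimal sketch of the type change. The field names on `DatasetRecordRaw` (`input_data`, `expected_output`, `metadata`) are assumptions for illustration. `create_dataset_before` and `create_dataset_after` are hypothetical stand-ins for the old and new `create_dataset` signatures, not the real ddtrace functions.

```python
from typing import Any, Dict, List, TypedDict


# Assumed record shapes for illustration; field names are not copied from ddtrace.
# The PR only implies that DatasetRecord = DatasetRecordRaw + "record_id".
class DatasetRecordRaw(TypedDict):
    input_data: Dict[str, Any]
    expected_output: Any
    metadata: Dict[str, Any]


class DatasetRecord(DatasetRecordRaw):
    record_id: str


# Hypothetical stand-in for the old signature: every record needs a record_id.
def create_dataset_before(name: str, records: List[DatasetRecord]) -> int:
    return len(records)


# Hypothetical stand-in for the new signature: user-facing fields only.
def create_dataset_after(name: str, records: List[DatasetRecordRaw]) -> int:
    return len(records)


raw: DatasetRecordRaw = {
    "input_data": {"question": "What is 2 + 2?"},
    "expected_output": "4",
    "metadata": {},
}

# Old signature: a throwaway ID was needed just to satisfy the type checker.
create_dataset_before("math", [{**raw, "record_id": "unused"}])

# New signature: the raw record is passed as-is.
create_dataset_after("math", [raw])
```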

Problem + Solution

The `records` parameter was typed as `List[DatasetRecord]`, which requires a `record_id` field. However, `record_id` is not actually needed: the SDK auto-generates it and the backend assigns a new one on push. This forced users to provide a dummy value to satisfy the type checker, see append in

The `Dataset.append` method already accepts `DatasetRecordRaw` and handles `record_id` generation internally (see `ddtrace/llmobs/_experiment.py` line 227):

```python
def append(self, record: DatasetRecordRaw) -> None:
    record_id: str = uuid.uuid4().hex
    # this record ID will be discarded after push, BE will generate a new one, this is just
    # for tracking new records locally before the push
    r: DatasetRecord = {**record, "record_id": record_id}
    ...  # remainder of the method omitted in this excerpt
```
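One way to picture how a raw-record parameter can still yield locally tracked records is to apply the `append` approach from the excerpt above to a whole list. This is a sketch only: `assign_local_ids` is a hypothetical helper, the record fields are assumed, and the PR does not say that `create_dataset` is implemented this way.

```python
import uuid
from typing import Any, Dict, List, TypedDict


# Assumed minimal record shapes (field names are illustrative only).
class DatasetRecordRaw(TypedDict):
    input_data: Dict[str, Any]
    expected_output: Any


class DatasetRecord(DatasetRecordRaw):
    record_id: str


def assign_local_ids(records: List[DatasetRecordRaw]) -> List[DatasetRecord]:
    """Hypothetical helper mirroring Dataset.append for a whole list.

    Each ID is a random local placeholder used only to track records
    before the push; the backend assigns its own IDs afterwards.
    """
    tracked: List[DatasetRecord] = []
    for record in records:
        # Copy the caller's fields into a new dict and attach a local ID;
        # the input record itself is left unmodified.
        r: DatasetRecord = {**record, "record_id": uuid.uuid4().hex}
        tracked.append(r)
    return tracked


records = assign_local_ids(
    [
        {"input_data": {"q": "2 + 2"}, "expected_output": "4"},
        {"input_data": {"q": "3 + 3"}, "expected_output": "6"},
    ]
)
```

Because the ID is written last in the dict literal, a legacy `DatasetRecord` passed to this helper would simply receive a fresh local ID, which is consistent with the comment that pre-push IDs are discarded.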

This *should* be backwards compatible, since it loosens the parameter type rather than tightening it: `DatasetRecord` is `DatasetRecordRaw` plus a `record_id` key, so any code that still passes `DatasetRecord` values should keep working.
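A minimal sketch of the compatibility argument, again with assumed record fields and a hypothetical `input_keys` function standing in for a raw-typed `records` parameter. At runtime a `TypedDict` value is an ordinary `dict`, so a `DatasetRecord` carrying an extra `record_id` passes through unchanged, and statically `DatasetRecord` is a subtype of `DatasetRecordRaw`. One caveat worth checking: `List` is invariant, so type checkers such as mypy report an error when a variable already annotated `List[DatasetRecord]` is passed to a `List[DatasetRecordRaw]` parameter. The sketch types its parameter as `Sequence`, which is covariant, to show a form that accepts both.

```python
from typing import Any, Dict, List, Sequence, TypedDict


# Assumed record shapes: DatasetRecord = DatasetRecordRaw + "record_id".
class DatasetRecordRaw(TypedDict):
    input_data: Dict[str, Any]
    expected_output: Any


class DatasetRecord(DatasetRecordRaw):
    record_id: str


# Hypothetical raw-typed consumer. Sequence is covariant, so a
# Sequence[DatasetRecord] is also a Sequence[DatasetRecordRaw]; with List
# (invariant), a List[DatasetRecord] variable would be flagged by mypy even
# though every element is compatible.
def input_keys(records: Sequence[DatasetRecordRaw]) -> List[str]:
    return [key for r in records for key in r["input_data"]]


# Existing caller code that still builds full DatasetRecord values.
legacy: List[DatasetRecord] = [
    {"input_data": {"city": "Paris"}, "expected_output": "France", "record_id": "abc"},
]
print(input_keys(legacy))  # → ['city']
```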


mbldatadog requested a review from a team as a code owner January 7, 2026 21:12
