fix(delta): read qualified dataset name on no changes in delta #1326

shcheklein · 2025-09-09T01:56:31Z

Fixes the following delta scenario:

No changes on the second run
The output dataset name exists in the default namespace (as well as in the target namespace)
Delta silently reads the dataset from the default namespace instead of the proper one (in the target namespace)

This is a serious issue, since it can lead to using the wrong data.
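To make the scenario concrete, here is a minimal sketch of the failure mode using a toy in-memory registry. It is not DataChain's implementation: the `REGISTRY` dict, the `Dataset` class, the default names, and both `save_no_changes_*` functions are hypothetical stand-ins, and the toy `read_dataset` only mimics the idea of falling back to defaults when `namespace` and `project` are omitted.

```python
from dataclasses import dataclass

# Hypothetical default names; the real defaults are not shown in this thread.
DEFAULT_NAMESPACE = "default-ns"
DEFAULT_PROJECT = "default-proj"


@dataclass(frozen=True)
class Dataset:
    namespace: str
    project: str
    name: str
    payload: str


# Toy registry keyed by (namespace, project, name). The same dataset name
# exists in both the default and the target namespace.
REGISTRY = {
    (DEFAULT_NAMESPACE, DEFAULT_PROJECT, "processed"): Dataset(
        DEFAULT_NAMESPACE, DEFAULT_PROJECT, "processed", "stale rows"
    ),
    ("dev", "analytics", "processed"): Dataset(
        "dev", "analytics", "processed", "fresh rows"
    ),
}


def read_dataset(name, namespace=None, project=None):
    """Toy lookup: a missing namespace or project falls back to the default."""
    key = (namespace or DEFAULT_NAMESPACE, project or DEFAULT_PROJECT, name)
    return REGISTRY[key]


def save_no_changes_buggy(name, namespace, project):
    # Bug: the target namespace and project are dropped on the no-change path.
    return read_dataset(name)


def save_no_changes_fixed(name, namespace, project):
    # Fix: forward the target namespace and project to the lookup.
    return read_dataset(name, namespace=namespace, project=project)


buggy = save_no_changes_buggy("processed", "dev", "analytics")
fixed = save_no_changes_fixed("processed", "dev", "analytics")
print(buggy.namespace, buggy.payload)
print(fixed.namespace, fixed.payload)
```

In this toy model the buggy path raises no error, which is what makes the problem easy to miss: the lookup succeeds, just against the wrong namespace.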

Summary by Sourcery

Ensure delta save with no changes returns the correct dataset version from the intended namespace and project instead of defaulting to the default namespace

Bug Fixes:

Pass namespace and project parameters to read_dataset when handling no-change delta saves to avoid reading from the default namespace

Enhancements:

Update dependency retrieval to include project and namespace context and generate properly qualified dataset names

Tests:

Introduce helper _get_short_ds_name and update existing tests to use fully qualified names
Add test to verify no-change delta behavior across multiple namespace and project combinations

sourcery-ai · 2025-09-09T01:56:42Z

Reviewer's Guide

This PR fixes delta datasets incorrectly reading from the default namespace when there are no changes by qualifying dataset names with project and namespace in both test and production code, and adds comprehensive tests to validate this behavior across various namespace/project combinations.

Sequence diagram for reading a dataset with qualified namespace and project

sequenceDiagram
participant Caller
participant "read_dataset()"
Caller->>"read_dataset()": read_dataset(name, namespace, project, ...)
"read_dataset()"->>"Target Namespace": Lookup dataset in specified namespace/project
"read_dataset()"-->>Caller: Return dataset from correct namespace

File-Level Changes

Change	Details	Files
Introduce dataset name qualification in tests	Added `_get_short_ds_name` helper to format names based on default/target namespace/project; updated `_get_dependencies` to use the helper and pass project_name/namespace_name to dependency lookup	`tests/func/test_delta.py`
Extend delta tests for no-change scenarios	Changed existing tests to use qualified dataset names; added `test_delta_returns_correct_dataset_on_no_changes` covering default and custom namespaces	`tests/func/test_delta.py`
Ensure correct namespace/project is used in save when no delta changes	Updated `save` method to pass `namespace` and `project` to `read_dataset` on no-change path	`src/datachain/lib/dc/datachain.py`

Possibly linked issues

Initial DataChain Commit #1: PR fixes delta updates by ensuring correct dataset is read from specified namespace/project when no changes occur.
#0: The PR fixes a bug where delta incorrectly resolves dataset names to the default namespace instead of the intended qualified one, directly addressing a core problem the issue aims to prevent.

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Consider using pytest.mark.parametrize for the three dataset‐name/namespace cases in test_delta_returns_correct_dataset_on_no_changes to reduce manual loops and improve readability.
The custom _get_short_ds_name helper duplicates naming logic—see if you can reuse or wrap an existing catalog/metastore method to keep dataset qualification consistent and avoid subtle drift.

tests/func/test_delta.py

codecov · 2025-09-09T02:05:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.84%. Comparing base (91617c0) to head (cc2a36f).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1326   +/-   ##
=======================================
  Coverage   88.84%   88.84%           
=======================================
  Files         155      155           
  Lines       14240    14240           
  Branches     2025     2025           
=======================================
  Hits        12652    12652           
  Misses       1124     1124           
  Partials      464      464

Flag	Coverage Δ
datachain	`88.78% <100.00%> (ø)`

Files with missing lines	Coverage Δ
src/datachain/lib/dc/datachain.py	`91.14% <100.00%> (ø)`

Copilot

Pull Request Overview

This PR fixes a critical bug in the delta save functionality where the system would incorrectly read from the default namespace instead of the target namespace when there are no changes on subsequent runs. This could lead to using wrong data when datasets with the same name exist in multiple namespaces.

Key Changes

Fix delta save to properly pass namespace and project parameters when reading existing datasets on no-change scenarios
Update test utilities to handle fully qualified dataset names and namespace/project context
Add comprehensive test coverage for the no-change delta behavior across different namespace and project combinations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
src/datachain/lib/dc/datachain.py	Fixes the core bug by passing namespace and project parameters to read_dataset
tests/func/test_delta.py	Adds helper functions and comprehensive tests to verify the fix works across multiple namespace scenarios

dreadatour

Looks good to me 👍

dreadatour · 2025-09-10T02:58:50Z

tests/func/test_delta.py

 from datachain.lib.file import File, ImageFile


+def _get_short_ds_name(catalog, name, project_name, namespace_name) -> str:


Why _get_short_ds_name? IMO it should be something like _get_full_ds_name 🤔

It tries to produce the shortest name: it checks whether the namespace and project passed in are the defaults and, if so, omits them.
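The naming logic described in this reply can be sketched roughly as follows. This is an illustration only: the helper's real body is not shown in the thread, and the real signature takes a `catalog` argument, which this sketch replaces with explicit default parameters. The default names and the dot-separated `namespace.project.name` format are assumptions.

```python
# Hypothetical defaults, for illustration only.
DEFAULT_NAMESPACE = "default-ns"
DEFAULT_PROJECT = "default-proj"


def get_short_ds_name(
    name,
    project_name,
    namespace_name,
    default_project=DEFAULT_PROJECT,
    default_namespace=DEFAULT_NAMESPACE,
) -> str:
    """Return the shortest name that still identifies the dataset.

    The bare name is returned only when both the namespace and the project
    are the defaults; otherwise the name is fully qualified (assumed format).
    """
    if namespace_name == default_namespace and project_name == default_project:
        return name
    return f"{namespace_name}.{project_name}.{name}"


print(get_short_ds_name("cats", DEFAULT_PROJECT, DEFAULT_NAMESPACE))
print(get_short_ds_name("cats", "analytics", "dev"))
```

Under this reading, "short" refers to dropping the qualifier when it is redundant, not to always returning the bare name, which may be why the reviewer expected a name like `_get_full_ds_name`.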

fix(delta): read qualified dataset name on no changes in delta

cc2a36f

shcheklein requested review from a team and ilongin September 9, 2025 01:56

shcheklein self-assigned this Sep 9, 2025

shcheklein added the bug Something isn't working label Sep 9, 2025

sourcery-ai bot reviewed Sep 9, 2025

shcheklein requested a review from Copilot September 9, 2025 02:26

Copilot AI reviewed Sep 9, 2025

dreadatour approved these changes Sep 10, 2025

shcheklein merged commit 8355e28 into main Sep 10, 2025
61 of 63 checks passed

shcheklein deleted the fix-delta-no-changes-read branch September 10, 2025 03:07

shcheklein mentioned this pull request Sep 10, 2025

Fix delta updates with non default namespace and project #1193

Closed
