@shcheklein shcheklein commented Jul 9, 2025

Fixes issues with pulling Studio datasets:

import datachain as dc

dataset = dc.read_dataset("@shcheklein.default.test_dataset", update=True)

Before this fix, a call like this failed with:

Traceback (most recent call last):
  File "/Users/ivan/Projects/datachain/test.py", line 4, in <module>
    dataset = dc.read_dataset("@eldada-brainspace.default.parquet-processed-ecg", update=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/datachain/src/datachain/lib/dc/datasets.py", line 186, in read_dataset
    query = DatasetQuery(
            ^^^^^^^^^^^^^
  File "/Users/ivan/Projects/datachain/src/datachain/query/dataset.py", line 1142, in __init__
    self.catalog.get_dataset_with_remote_fallback(
  File "/Users/ivan/Projects/datachain/src/datachain/catalog/catalog.py", line 1153, in get_dataset_with_remote_fallback
    self.pull_dataset(
  File "/Users/ivan/Projects/datachain/src/datachain/catalog/catalog.py", line 1554, in pull_dataset
    namespace = self.metastore.create_namespace(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/datachain/src/datachain/data_storage/metastore.py", line 734, in create_namespace
    Namespace.validate_name(name)
  File "/Users/ivan/Projects/datachain/src/datachain/namespace.py", line 28, in validate_name
    raise InvalidNamespaceNameError(
datachain.error.InvalidNamespaceNameError: Character @ is reserved and not allowed in namespace name

TODO

  • Add tests as a followup

Summary by Sourcery

Skip local namespace and project name validation when pulling remote Studio datasets to allow reserved characters in names

Bug Fixes:

  • Pass validate=False to create_namespace to bypass local namespace name checks
  • Pass validate=False to create_project to bypass local project name checks

sourcery-ai bot commented Jul 9, 2025

Reviewer's Guide

When pulling datasets from Studio, this change stops local validation of namespace and project names by passing validate=False to the metastore creation calls, allowing reserved characters (e.g. “@”) in names.
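To make the effect of the flag concrete, here is a minimal, self-contained sketch. It is a toy model, not datachain's code: `ToyMetastore`, `validate_name`, `pull_namespace`, and `RESERVED_CHARS` are hypothetical stand-ins, and the only details taken from this PR are the `validate` keyword, the `create_namespace(name, description, uuid, validate=True)` shape, and the error message in the traceback.

```python
class InvalidNamespaceNameError(ValueError):
    """Stand-in for datachain.error.InvalidNamespaceNameError."""


# Assumption: "@" is the only reserved character known from the traceback.
RESERVED_CHARS = ("@",)


def validate_name(name):
    """Reject names that contain a reserved character."""
    for ch in RESERVED_CHARS:
        if ch in name:
            raise InvalidNamespaceNameError(
                f"Character {ch} is reserved and not allowed in namespace name"
            )


class ToyMetastore:
    """Hypothetical in-memory metastore, used only for illustration."""

    def __init__(self):
        self.namespaces = {}

    def create_namespace(self, name, description=None, uuid=None, validate=True):
        # Local callers keep the default and get the reserved-character check.
        if validate:
            validate_name(name)
        self.namespaces[name] = {"description": description, "uuid": uuid}
        return name


def pull_namespace(metastore, remote_name):
    # The name comes from Studio, so the local check is skipped,
    # which is the behavior this PR describes for pull_dataset.
    return metastore.create_namespace(remote_name, validate=False)


if __name__ == "__main__":
    metastore = ToyMetastore()
    pull_namespace(metastore, "@shcheklein")  # accepted: validation skipped
    try:
        metastore.create_namespace("@shcheklein")  # default validate=True
    except InvalidNamespaceNameError as exc:
        print(f"local creation still rejected: {exc}")
```

In this sketch the default stays `validate=True`, so only the pull path opts out and locally created names are still checked.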

Sequence diagram for dataset pull with Studio names (no local validation)

sequenceDiagram
    participant User as actor User
    participant Catalog
    participant Metastore
    User->>Catalog: read_dataset(ds_uri with @)
    Catalog->>Metastore: create_namespace(name, ..., validate=False)
    Metastore-->>Catalog: Namespace created (no validation)
    Catalog->>Metastore: create_project(namespace, name, ..., validate=False)
    Metastore-->>Catalog: Project created (no validation)
    Catalog-->>User: Dataset ready

Class diagram for updated metastore creation calls

classDiagram
    class Metastore {
        +create_namespace(name, description, uuid, validate=True)
        +create_project(namespace_name, project_name, description, uuid, validate=True)
    }
    class Catalog {
        +pull_dataset(...)
    }
    Catalog --> Metastore : uses
    Metastore : +create_namespace(..., validate=False)
    Metastore : +create_project(..., validate=False)

File-Level Changes

Change: Disable local name validation for Studio-provided namespaces and projects
Details:
  • Add validate=False to create_namespace calls
  • Add validate=False to create_project calls
Files: src/datachain/catalog/catalog.py


@shcheklein shcheklein closed this Jul 9, 2025

@sourcery-ai sourcery-ai bot left a comment


Hey @shcheklein - I've reviewed your changes and they look great!

