
Adding ability to set namespace and project using env variables#1186

Merged
ilongin merged 8 commits into main from ilongin/1185-get-project-from-env
Jun 28, 2025

Conversation

ilongin (Contributor) commented Jun 27, 2025

Setting Namespace and Project via Environment Variables

In addition to using .settings(), you can configure the namespace and project using environment variables:

  • DATACHAIN_NAMESPACE sets the namespace.
  • DATACHAIN_PROJECT sets the project name, or both the namespace and project using the format namespace.project.

Examples

```bash
# Set namespace only
export DATACHAIN_NAMESPACE=dev
# Set project only
export DATACHAIN_PROJECT=analytics
# Set both namespace and project
export DATACHAIN_PROJECT=dev.analytics
```

How Namespace and Project Are Resolved

When determining which namespace and project to use, DataChain applies the following precedence (illustrated in the sketch after this list):

  1. Fully qualified dataset name
    If the dataset name includes both the namespace and project, these values take highest precedence.
    dc.read_dataset("dev.analytics.metrics")
  2. Explicit settings in code
    Values provided via .settings() or passed directly to read_dataset() or similar methods.
    dc.settings(namespace="dev", project="analytics")
    dc.read_dataset("metrics", namespace="dev", project="analytics")
  3. Environment variables
    Namespace and project set using environment variables:
    export DATACHAIN_PROJECT=dev.analytics
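
For a concrete picture of that precedence, here is a minimal, hypothetical snippet combining all three levels; it relies on the `read_dataset()` arguments and environment variables described above, and the namespace/project names (`dev`, `staging`, `prod`, etc.) are placeholders:

```python
import os

import datachain as dc

# 3. Environment variables provide the baseline default.
os.environ["DATACHAIN_PROJECT"] = "dev.analytics"

# Resolves to dev.analytics.metrics via the env variable.
ds = dc.read_dataset("metrics")

# 2. Explicit settings/arguments override the environment.
ds = dc.read_dataset("metrics", namespace="staging", project="reports")

# 1. A fully qualified name wins over everything else.
ds = dc.read_dataset("prod.analytics.metrics")
```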

Summary by Sourcery

Enable setting default namespace and project via environment variables and centralize dataset name resolution through Catalog.get_full_dataset_name

New Features:

  • Allow configuring default namespace with DATACHAIN_NAMESPACE environment variable
  • Allow configuring default project with DATACHAIN_PROJECT environment variable

Enhancements:

  • Introduce Catalog.get_full_dataset_name to unify namespace, project, and dataset name resolution
  • Refactor save, read, delete functions and CLI dataset commands to leverage the unified name resolution

Documentation:

  • Add DATACHAIN_NAMESPACE and DATACHAIN_PROJECT entries to the environment variables guide

Tests:

  • Add parameterized test to verify namespace and project resolution via environment variables and settings during save

Summary by Sourcery

Enable environment variable support for namespace and project defaults and centralize dataset name resolution through Catalog.get_full_dataset_name, updating dataset operations, CLI commands, documentation, and tests accordingly

New Features:

  • Allow configuring default namespace via DATACHAIN_NAMESPACE environment variable
  • Allow configuring default project or combined namespace.project via DATACHAIN_PROJECT environment variable

Enhancements:

  • Introduce Catalog.get_full_dataset_name to centralize namespace, project, and dataset name resolution
  • Refactor save, read_dataset, delete_dataset, and CLI commands to use unified name resolution
  • Return datasets by full_name in record insertion to ensure consistent lookup

Documentation:

  • Document DATACHAIN_NAMESPACE and DATACHAIN_PROJECT in environment variable and namespaces guides
  • Update namespaces guide with resolution precedence including environment variables and defaults

Tests:

  • Add parameterized tests for namespace and project resolution via environment variables, settings, and explicit dataset names

sourcery-ai bot (Contributor) commented Jun 27, 2025

Reviewer's Guide

This PR centralizes dataset name resolution in a new Catalog.get_full_dataset_name method to incorporate DATACHAIN_NAMESPACE and DATACHAIN_PROJECT environment variables and replaces ad-hoc parsing across save, read, delete, and CLI commands, with corresponding updates to documentation and tests.
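
As a rough mental model of that resolution order, a minimal sketch is shown below; it is not the actual Catalog.get_full_dataset_name implementation, and the function name, signature, and defaults here are illustrative only:

```python
import os
from typing import Optional


def resolve_full_dataset_name(
    name: str,
    project_name: Optional[str] = None,
    namespace_name: Optional[str] = None,
    default_namespace: str = "local",
    default_project: str = "local",
) -> tuple[str, str, str]:
    """Illustrative sketch of the precedence centralized by this PR."""
    parts = name.split(".")
    if len(parts) == 3:
        # 1. A fully qualified "namespace.project.name" wins outright.
        #    (The real parser handles more forms; omitted for brevity.)
        return parts[0], parts[1], parts[2]

    # 2. Explicit arguments, e.g. from .settings() or read_dataset() kwargs.
    namespace, project = namespace_name, project_name

    # 3. Environment variables; DATACHAIN_PROJECT may be "namespace.project",
    #    in which case its namespace part takes precedence over DATACHAIN_NAMESPACE.
    env_namespace = os.environ.get("DATACHAIN_NAMESPACE", "")
    env_project = os.environ.get("DATACHAIN_PROJECT", "")
    if "." in env_project:
        env_namespace, env_project = env_project.split(".", 1)

    # 4. Built-in defaults as the final fallback.
    return (
        namespace or env_namespace or default_namespace,
        project or env_project or default_project,
        name,
    )
```

In the actual change, the final fallback comes from the metastore's default namespace and project names (as the sequence diagram below shows) rather than hard-coded strings.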

Sequence diagram for dataset name resolution with environment variables

```mermaid
sequenceDiagram
    participant User
    participant CLI_or_API as CLI/API/Library
    participant Catalog
    participant Metastore
    User->>CLI_or_API: Call save/read/delete_dataset(name, ...)
    CLI_or_API->>Catalog: get_full_dataset_name(name, project_name, namespace_name)
    Catalog->>parse_dataset_name: parse_dataset_name(name)
    Catalog->>Env: Read DATACHAIN_NAMESPACE and DATACHAIN_PROJECT
    Catalog->>Metastore: Get default_namespace_name, default_project_name
    Catalog-->>CLI_or_API: (namespace, project, name)
    CLI_or_API->>Catalog: get_dataset(name, project)
    Catalog->>Metastore: get_project(project_name, namespace_name)
    Metastore-->>Catalog: Project
    Catalog-->>CLI_or_API: DatasetRecord
```

Class diagram for Catalog and dataset name resolution

```mermaid
classDiagram
    class Catalog {
        +get_full_dataset_name(name: str, project_name: Optional[str], namespace_name: Optional[str]) tuple[str, str, str]
        +get_dataset(name: str, project: Optional[Project]) DatasetRecord
    }
    Catalog --> Metastore : uses
    class Metastore {
        +default_namespace_name
        +default_project_name
        +get_project(project_name, namespace_name)
        +is_local_dataset(namespace_name)
    }
    class parse_dataset_name {
        <<function>>
    }
    Catalog ..> parse_dataset_name : calls
```

File-Level Changes

  1. Unified dataset name resolution with Catalog.get_full_dataset_name
    Details:
      • Implemented get_full_dataset_name to parse names, read env vars, and apply precedence
      • Replaced manual parse_dataset_name and fallback logic in save, read, delete, and CLI commands
      • Adjusted read_records to use full dataset name when returning datasets
    Files:
      • src/datachain/catalog/catalog.py
      • src/datachain/lib/dc/datachain.py
      • src/datachain/lib/dc/datasets.py
      • src/datachain/cli/commands/datasets.py
      • src/datachain/lib/dc/records.py
  2. Environment variable support for namespace and project defaults
    Details:
      • Read DATACHAIN_NAMESPACE and DATACHAIN_PROJECT (including ns.proj syntax) in resolution logic
      • Integrated env var values into fallback order after explicit settings
    Files:
      • src/datachain/catalog/catalog.py
  3. Documentation enhancements for env var configuration
    Details:
      • Added namespace/project env var sections with examples in namespaces guide
      • Documented DATACHAIN_NAMESPACE and DATACHAIN_PROJECT in env reference
    Files:
      • docs/guide/namespaces.md
      • docs/guide/env.md
  4. Parameterized tests for resolution precedence
    Details:
      • Added test_save_all_ways_to_set_project to cover explicit names, settings, env vars, and defaults
      • Verified correct namespace and project selection across scenarios
    Files:
      • tests/unit/lib/test_datachain.py

Possibly linked issues

  • remove docstring from DataModel.__pydantic__init_subclass__ #123: The PR implements setting default namespace and project using DATACHAIN_NAMESPACE and DATACHAIN_PROJECT environment variables and centralizes resolution logic.
  • #0: The PR adds environment variable configuration for default dataset namespace and project, a feature discussed in the issue.


ilongin marked this pull request as draft June 27, 2025 14:15
ilongin linked an issue Jun 27, 2025 that may be closed by this pull request
sourcery-ai bot (Contributor) left a comment:

Hey @ilongin - I've reviewed your changes - here's some feedback:

  • Please add a clear docstring to get_full_dataset_name outlining parameter precedence (parsed name, args, env vars, defaults) and what each returned tuple element represents.
  • Consider extracting direct os.environ lookups into a configuration helper or injecting the env values at initialization to improve testability and separation of concerns.
  • In test_save_all_ways_to_set_project the use_settings parameter is unused; either implement branching on that flag to cover both code paths or remove it to avoid confusion.
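
One possible shape for the configuration helper suggested in the second bullet above; this is purely a sketch, and the EnvDefaults name and from_env method are made up, not part of datachain:

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvDefaults:
    """Reads DATACHAIN_NAMESPACE / DATACHAIN_PROJECT once, in one place."""

    namespace: Optional[str]
    project: Optional[str]

    @classmethod
    def from_env(cls, environ=os.environ) -> "EnvDefaults":
        namespace = environ.get("DATACHAIN_NAMESPACE") or None
        project = environ.get("DATACHAIN_PROJECT") or None
        if project and "." in project:
            # DATACHAIN_PROJECT may carry "namespace.project"; the embedded
            # namespace takes precedence over DATACHAIN_NAMESPACE here.
            namespace, project = project.split(".", 1)
        return cls(namespace=namespace, project=project)
```

The resolution code (and its tests) could then receive an EnvDefaults instance instead of reading os.environ directly, which is what makes the lookup injectable and easier to test.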

cloudflare-workers-and-pages bot commented Jun 27, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: ff182b8
Status: ✅  Deploy successful!
Preview URL: https://e2e64851.datachain-documentation.pages.dev
Branch Preview URL: https://ilongin-1185-get-project-fro.datachain-documentation.pages.dev



### Namespaces and projects
- `DATACHAIN_NAMESPACE` – Namespace name to use as default.
- `DATACHAIN_PROJECT` – Project name to use as default.
Contributor:

Can project name include namespace?

Contributor:

yes. I'd expect there is a single variable, not two.

Contributor (author):

So you would like for example this: DATACHAIN_PROJECT=dev.analytics?

Contributor:

yes, but we still need a way to specify only NAMESPACE

any ideas for a good var name?

(I'm fine to keep both vars and make sure that DATACHAIN_PROJECT can accept `dev.analytics`)

Contributor (author):

Yea, I agree that we need to keep NAMESPACE. I will do it as you propose, project can include namespace and that's it.

Contributor (author):

The second example is strange to me. The DATACHAIN_PROJECT env var clearly states it's about projects, but then we set the dev namespace with it and no project? I'm more in favor of what Ivan suggested above; examples:

DATACHAIN_PROJECT=dev.analytics -> dev namespace and analytics project
DATACHAIN_PROJECT=analytics  -> analytics project and default namespace
DATACHAIN_NAMESPACE=dev  -> dev namespace and default project
DATACHAIN_NAMESPACE=dev & DATACHAIN_PROJECT=analytics -> dev namespace and analytics project

Contributor:

sorry. I meant DATACHAIN_NAMESPACE. modifying...

Contributor:

on a separate note...

  1. Isn't the DATACHAIN_ prefix too long? Why not DC_?
  2. Why don't we operate on the datachain config instead of env variables? It's a more common practice. A user might work on several projects with different namespaces (or other env) at the same time.

Contributor (author):

@dmpetrov

  1. We already use the DATACHAIN_ prefix, so I wanted to be consistent. We can alias all of those variables to DC_ as well if we want, but I would do it in a separate issue.
  2. The main motivation for this was to enable Studio users to set default values for namespace / project. Locally a user can only have the local.local project and cannot create another one, so the dataset config doesn't really help.

Contributor:

Re env vs config: it's a general question. It feels like we are too heavy on env vars while, in general, config is the preferred option.

codecov bot commented Jun 27, 2025

Codecov Report

Attention: Patch coverage is 96.77419% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.71%. Comparing base (79959e0) to head (ff182b8).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/datachain/cli/commands/datasets.py | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1186      +/-   ##
==========================================
+ Coverage   88.69%   88.71%   +0.01%     
==========================================
  Files         152      152              
  Lines       13531    13535       +4     
  Branches     1875     1879       +4     
==========================================
+ Hits        12001    12007       +6     
+ Misses       1088     1086       -2     
  Partials      442      442              
| Flag | Coverage Δ |
| --- | --- |
| datachain | 88.64% <96.77%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/datachain/catalog/catalog.py | 85.92% <100.00%> (+0.15%) ⬆️ |
| src/datachain/data_storage/metastore.py | 94.34% <100.00%> (+0.06%) ⬆️ |
| src/datachain/data_storage/sqlite.py | 85.82% <100.00%> (ø) |
| src/datachain/dataset.py | 86.72% <100.00%> (+0.07%) ⬆️ |
| src/datachain/lib/dc/datachain.py | 89.82% <100.00%> (-0.02%) ⬇️ |
| src/datachain/lib/dc/datasets.py | 92.10% <100.00%> (-0.49%) ⬇️ |
| src/datachain/lib/dc/records.py | 100.00% <100.00%> (ø) |
| src/datachain/cli/commands/datasets.py | 71.08% <66.66%> (-0.03%) ⬇️ |

ilongin requested review from dmpetrov and shcheklein June 27, 2025 23:49
ilongin marked this pull request as ready for review June 28, 2025 00:44
sourcery-ai bot (Contributor) left a comment:

Hey @ilongin - I've reviewed your changes - here's some feedback:

  • Add a detailed docstring to Catalog.get_full_dataset_name that explains its parameters, return values, and the exact resolution precedence (parsed name, settings, env variables, defaults).
  • There’s a duplicate entry in the test_save_all_ways_to_set_project parameter list; deduplicate or clarify that scenario to keep the matrix maintainable.
## Individual Comments

### Comment 1
<location> `tests/unit/lib/test_datachain.py:3455` </location>
<code_context>
+        "env_namespace,env_project,"
+        "result_ds_namespace,result_ds_project"
+    ),
+    [
+        ("n3", "p3", "n2", "p2", "n1", "p1", "n3", "p3"),
+        ("", "", "n2", "p2", "n1", "p1", "n2", "p2"),
+        ("", "", "", "", "n1", "p1", "n1", "p1"),
+        ("", "", "", "", "n5", "n1.p1", "n1", "p1"),
+        ("", "", "", "", "", "n1.p1", "n1", "p1"),
+        ("", "", "", "", "", "n1.p1", "n1", "p1"),
+        ("n3", "p3", "n2", "p2", "", "", "n3", "p3"),
+        ("n3", "p3", "", "", "", "", "n3", "p3"),
+        ("n3", "p3", "", "", "n1", "p1", "n3", "p3"),
+        ("", "", "", "", "", "", "", ""),
+    ],
+)
</code_context>

<issue_to_address>
Consider adding test cases for invalid or malformed environment variable values.

Adding tests for malformed or unexpected values (such as extra dots, leading/trailing dots, whitespace, or special characters) will help verify error handling and prevent subtle bugs.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    [
        ("n3", "p3", "n2", "p2", "n1", "p1", "n3", "p3"),
        ("", "", "n2", "p2", "n1", "p1", "n2", "p2"),
        ("", "", "", "", "n1", "p1", "n1", "p1"),
        ("", "", "", "", "n5", "n1.p1", "n1", "p1"),
        ("", "", "", "", "", "n1.p1", "n1", "p1"),
        ("", "", "", "", "", "n1.p1", "n1", "p1"),
        ("n3", "p3", "n2", "p2", "", "", "n3", "p3"),
        ("n3", "p3", "", "", "", "", "n3", "p3"),
        ("n3", "p3", "", "", "n1", "p1", "n3", "p3"),
        ("", "", "", "", "", "", "", ""),
    ],
)
=======
    [
        ("n3", "p3", "n2", "p2", "n1", "p1", "n3", "p3"),
        ("", "", "n2", "p2", "n1", "p1", "n2", "p2"),
        ("", "", "", "", "n1", "p1", "n1", "p1"),
        ("", "", "", "", "n5", "n1.p1", "n1", "p1"),
        ("", "", "", "", "", "n1.p1", "n1", "p1"),
        ("", "", "", "", "", "n1.p1", "n1", "p1"),
        ("n3", "p3", "n2", "p2", "", "", "n3", "p3"),
        ("n3", "p3", "", "", "", "", "n3", "p3"),
        ("n3", "p3", "", "", "n1", "p1", "n3", "p3"),
        ("", "", "", "", "", "", "", ""),
        # Malformed/invalid env var values
        ("", "", "", "", " n1 ", " p1 ", "n1", "p1"),  # leading/trailing whitespace
        ("", "", "", "", ".n1", "p1.", "n1", "p1"),    # leading/trailing dots
        ("", "", "", "", "n1..n2", "p1..p2", "n1..n2", "p1..p2"),  # extra dots
        ("", "", "", "", "n1$", "p1#", "n1$", "p1#"),  # special characters
        ("", "", "", "", "n1.p1", "p1.n1", "n1", "p1"), # swapped/invalid format
        ("", "", "", "", "n1..", "..p1", "n1..", "..p1"), # trailing/leading double dots
    ],
)
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `docs/guide/namespaces.md:161` </location>
<code_context>
 ds = dc.read_dataset("local.local.metrics")
-ds.show()
-```
+ds.sho
</code_context>

<issue_to_address>
Typo: 'ds.sho' should be 'ds.show()'.

Please correct 'ds.sho' to 'ds.show()' for consistency and to ensure the example works as intended.
</issue_to_address>


Comment on lines +3481 to +3482
if namespace and project:
return f"{namespace}.{project}.{name}"

issue (code-quality): Avoid conditionals in tests. (no-conditionals-in-tests)

Explanation: Avoid complex code, like conditionals, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:

  • loops
  • conditionals

Some ways to fix this:

  • Use parametrized tests to get rid of the loop.
  • Move the complex logic into helpers.
  • Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / Don't Put Logic in Tests
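
For instance, the advice above amounts to precomputing the expected value per case (or moving the branch into a small helper) so the test body stays branch-free; a generic, hypothetical illustration:

```python
import pytest


def full_name(namespace: str, project: str, name: str) -> str:
    # Hypothetical helper mirroring the conditional flagged above.
    return f"{namespace}.{project}.{name}" if namespace and project else name


@pytest.mark.parametrize(
    "namespace,project,name,expected",
    [
        ("dev", "analytics", "metrics", "dev.analytics.metrics"),
        ("", "", "metrics", "metrics"),
    ],
)
def test_full_name(namespace, project, name, expected):
    # No conditionals here: each row already carries its expected result.
    assert full_name(namespace, project, name) == expected
```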

Comment on lines +3494 to +3497
if not result_ds_namespace and not result_ds_project:
# special case when nothing is defined - we set default ones
result_ds_namespace = metastore.default_namespace_name
result_ds_project = metastore.default_project_name

issue (code-quality): Avoid conditionals in tests. (no-conditionals-in-tests)


Comment on lines +3481 to +3483
if namespace and project:
return f"{namespace}.{project}.{name}"
return name

suggestion (code-quality): We've found these issues:

Suggested change
if namespace and project:
return f"{namespace}.{project}.{name}"
return name
return f"{namespace}.{project}.{name}" if namespace and project else name

@dmpetrov (Contributor):

@ilongin it feels like we introduce 2 variables into the public API without a need.

We can handle this using a single namespace param. Let's try this instead.

@shcheklein (Contributor):

@dmpetrov I think we then need a good idea for the name. If you have a good suggestion, please share. (Keep in mind, please - we need to release it today.)

ilongin merged commit 1692ee3 into main Jun 28, 2025
35 of 41 checks passed
ilongin deleted the ilongin/1185-get-project-from-env branch June 28, 2025 22:37
dmpetrov (Contributor) commented Jun 29, 2025

> I think we then need a good idea for the name.

The idea:

```bash
DATACHAIN_NAMESPACE=ns1
# or
DATACHAIN_NAMESPACE=ns1.pr1
```

I cannot imagine cases when a user needs to set only the project. Also, we should avoid using such vague names as project, especially because we already have project in experiments.

I'm pretty sure we won't find additional space in the UI to show two params - it will be a single box for both. So why push them on users in the API?
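
For illustration only, here is a hypothetical sketch of how that single-variable proposal could be parsed; nothing below is implemented in this PR, and the function name and defaults are placeholders:

```python
import os


def parse_combined_namespace(default_namespace="local", default_project="local"):
    # Hypothetical parsing of the proposed single DATACHAIN_NAMESPACE variable,
    # which would accept either "ns1" or "ns1.pr1".
    value = os.environ.get("DATACHAIN_NAMESPACE", "")
    if "." in value:
        namespace, project = value.split(".", 1)
    else:
        namespace, project = value, ""
    return namespace or default_namespace, project or default_project
```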


Development

Successfully merging this pull request may close these issues.

Get namespace and project names from env variables

3 participants