Add configurable seed rendering behavior #2755

Merged
pankajkoti merged 7 commits into main from pankajkoti/boss-240-feature-improve-how-cosmos-renders-runs-seeds
Jun 5, 2026

Conversation

@pankajkoti
Contributor

Description

Adds a seed_rendering_behavior option to RenderConfig, giving control over how Cosmos renders and runs dbt seeds (analogous to source_rendering_behavior):

  • ALWAYS (default): render the seed and run dbt seed on every execution (the original Cosmos behaviour).
  • WHEN_SEED_CHANGES: render the seed, but only run dbt seed when its CSV content has changed since the last successful run. The checksum is read from dbt's manifest (falling back to hashing the CSV), and the last-seen value is persisted as an Airflow Variable scoped per DbtDag/DbtTaskGroup and seed. Supported only for ExecutionMode.LOCAL, VIRTUALENV and AIRFLOW_ASYNC; using it with any other execution mode, or together with TestBehavior.BUILD, raises CosmosValueError.
  • RENDER_ONLY: render the seed as a no-op EmptyOperator placeholder.
  • NONE: do not render the seed at all.
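
As a rough illustration of the four modes listed above, the sketch below models how a renderer might decide what to emit for a seed node and how the two invalid combinations could be rejected. It uses only the standard library and is not Cosmos's actual code: `plan_seed_task`, `validate`, `SUPPORTED_MODES`, the enum string values and the local `CosmosValueError` class are hypothetical stand-ins. Only the enum member names, the execution-mode names and the BUILD test behaviour come from the description.

```python
from enum import Enum


class SeedRenderingBehavior(Enum):
    """Stand-in enum; member names follow the PR, values are placeholders."""

    ALWAYS = "always"
    WHEN_SEED_CHANGES = "when_seed_changes"
    RENDER_ONLY = "render_only"
    NONE = "none"


class CosmosValueError(ValueError):
    """Local stand-in for the error type named in the description."""


# Execution modes the description lists as supported for WHEN_SEED_CHANGES.
SUPPORTED_MODES = {"local", "virtualenv", "airflow_async"}


def validate(behavior, execution_mode, test_behavior):
    """Reject the two combinations the description says are invalid."""
    if behavior is not SeedRenderingBehavior.WHEN_SEED_CHANGES:
        return
    if execution_mode not in SUPPORTED_MODES:
        raise CosmosValueError(
            f"WHEN_SEED_CHANGES is not supported for execution mode {execution_mode!r}"
        )
    if test_behavior == "build":
        raise CosmosValueError(
            "WHEN_SEED_CHANGES is incompatible with TestBehavior.BUILD"
        )


def plan_seed_task(behavior):
    """Describe what gets rendered for a seed, or return None for no task."""
    if behavior is SeedRenderingBehavior.NONE:
        return None  # the seed is left out of the graph entirely
    if behavior is SeedRenderingBehavior.RENDER_ONLY:
        return "EmptyOperator placeholder"
    if behavior is SeedRenderingBehavior.WHEN_SEED_CHANGES:
        return "dbt seed, gated on checksum change"
    return "dbt seed on every run"
```

In this sketch, validation only applies to WHEN_SEED_CHANGES, which matches the description's statement that ALWAYS keeps the original behaviour.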

The gate runs before the TestBehavior.BUILD branch so NONE/RENDER_ONLY are honoured under every test behaviour. On an unchanged run the seed task succeeds without running dbt seed, without emitting its dataset, and without skip-propagating downstream. Change detection delegates to AbstractDbtBase.execute(), preserving extra_context, debug-mode tracking and **kwargs.
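
The unchanged-run behaviour described here can be sketched as a small gate around the seed command. This is a minimal model under stated assumptions, not the operator's real implementation: a plain dict stands in for the Airflow Variable backend, and `seed_variable_key` (including its key format) and `run_seed_if_changed` are hypothetical names.

```python
def seed_variable_key(dag_or_group_id, seed_name):
    """Build a key scoped per DAG/TaskGroup and seed (format is hypothetical)."""
    return f"cosmos_seed_checksum__{dag_or_group_id}__{seed_name}"


def run_seed_if_changed(store, key, current_checksum, run_dbt_seed):
    """Run `run_dbt_seed` only when the checksum differs from the stored one.

    Returns True when the seed ran and False when it was treated as unchanged.
    """
    if store.get(key) == current_checksum:
        # Unchanged: finish successfully without running dbt seed.
        return False
    run_dbt_seed()
    # Persist only after the run succeeded, so a failed run is retried later.
    store[key] = current_checksum
    return True
```

Returning normally on the unchanged path, rather than raising a skip, is what keeps downstream tasks from being skip-propagated in this model.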

Under ExecutionMode.WATCHER a single dbt build runs all seeds regardless of this setting; only ALWAYS is meaningful there.

Fresh implementation building on the approach explored in #2246 by @tuantran0910.

Related Issue(s)

Closes #1576

🤖 Generated with Claude Code

Add a `seed_rendering_behavior` option to `RenderConfig` that controls how
Cosmos renders and runs dbt seeds, mirroring `source_rendering_behavior`:

- ALWAYS (default): render the seed and run `dbt seed` on every execution.
- WHEN_SEED_CHANGES: render the seed, but only run `dbt seed` when the seed's
  CSV content has changed since the last successful run. The checksum is read
  from dbt's manifest (falling back to hashing the CSV) and the last-seen value
  is persisted as an Airflow Variable scoped per DbtDag/TaskGroup and seed.
  Supported only for ExecutionMode.LOCAL, VIRTUALENV and AIRFLOW_ASYNC; using
  any other execution mode, or TestBehavior.BUILD, raises CosmosValueError.
- RENDER_ONLY: render the seed as a no-op EmptyOperator placeholder.
- NONE: do not render the seed at all.

The gate runs before the TestBehavior.BUILD branch so NONE and RENDER_ONLY are
honored under every test behavior. On an unchanged run the seed task succeeds
without running `dbt seed`, without emitting its dataset, and without
skip-propagating to downstream tasks.
@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.37%. Comparing base (8ea12fa) to head (b6e818d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2755      +/-   ##
==========================================
+ Coverage   98.36%   98.37%   +0.01%     
==========================================
  Files         107      107              
  Lines        7942     8021      +79     
==========================================
+ Hits         7812     7891      +79     
  Misses        130      130              


@pankajkoti pankajkoti marked this pull request as ready for review June 3, 2026 15:23
@pankajkoti pankajkoti requested a review from jbandoro as a code owner June 3, 2026 15:23
Copilot AI review requested due to automatic review settings June 3, 2026 15:23
@pankajkoti pankajkoti requested review from a team, corsettigyg and dwreeves as code owners June 3, 2026 15:23
@pankajkoti pankajkoti requested review from pankajastro and tatiana June 3, 2026 15:23
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new seed_rendering_behavior option to RenderConfig to control how dbt seed nodes are represented in the generated Airflow graph, including a new “run only when seed CSV changes” mode that persists checksums via Airflow Variables.

Changes:

  • Introduces SeedRenderingBehavior (ALWAYS, WHEN_SEED_CHANGES, RENDER_ONLY, NONE) and wires it into task-metadata generation for seed nodes.
  • Implements seed change detection utilities (cosmos/dbt/seed.py) and runtime gating in DbtSeedLocalOperator.execute().
  • Extends config/runtime validation and adds unit tests covering allowed combinations and operator behavior.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `cosmos/airflow/graph.py` | Applies `seed_rendering_behavior` during task metadata creation (including placeholder/no-render paths). |
| `cosmos/config.py` | Adds `RenderConfig.seed_rendering_behavior` and validates incompatibility with `TestBehavior.BUILD`. |
| `cosmos/constants.py` | Adds `EMPTY_OPERATOR_CLASS` and the `SeedRenderingBehavior` enum. |
| `cosmos/converter.py` | Validates `WHEN_SEED_CHANGES` is only used with worker-filesystem execution modes. |
| `cosmos/dbt/graph.py` | Adds `checksum` to `DbtNode` and populates it from manifest/dbt-ls parsing. |
| `cosmos/dbt/seed.py` | New module for seed checksum resolution, Variable keying, and persistence helpers. |
| `cosmos/operators/local.py` | Adds runtime skip logic to `DbtSeedLocalOperator` for unchanged seeds. |
| `cosmos/__init__.py` | Exposes `SeedRenderingBehavior` at the package top level. |
| `tests/airflow/test_graph.py` | Tests seed rendering modes in task metadata (NONE / RENDER_ONLY / WHEN_SEED_CHANGES / ALWAYS). |
| `tests/dbt/test_graph.py` | Updates expected node context to include `checksum`. |
| `tests/dbt/test_seed.py` | New unit tests for seed change detection helpers. |
| `tests/operators/test_local.py` | Tests local seed operator skipping/running/persisting behavior under change detection. |
| `tests/test_config.py` | Tests config validation and defaulting for `seed_rendering_behavior`. |
| `tests/test_converter.py` | Tests execution-mode validation for `WHEN_SEED_CHANGES`. |

Comment thread cosmos/airflow/graph.py Outdated
Comment thread cosmos/dbt/seed.py Outdated
@pankajkoti
Contributor Author

The PR is ready for review. Please do not merge this PR. It relies on #2758 to be merged first.

@pankajkoti pankajkoti changed the title Add configurable seed rendering behavior [Ready for review, DO NOT MERGE] Add configurable seed rendering behavior Jun 3, 2026
@tuantran0910
Contributor

Hi @pankajkoti, thank you very much for helping to continue the PR that adds seed rendering behavior. I am sorry that I did not have time to keep working on my original PR; I have been very busy with work. I am happy to see this PR and look forward to seeing it merged.

…ring

Drop the duplicate EMPTY_OPERATOR_CLASS constant this branch added to
constants.py and reference EMPTY_OPERATOR_CLASS_PATH from
cosmos.airflow.compatibility instead, so the RENDER_ONLY seed placeholder
shares the version-aware path now on main.
Copilot AI review requested due to automatic review settings June 4, 2026 12:43
@pankajkoti pankajkoti changed the title [Ready for review, DO NOT MERGE] Add configurable seed rendering behavior Add configurable seed rendering behavior Jun 4, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Comment thread cosmos/operators/local.py
Comment thread cosmos/airflow/graph.py
Comment thread cosmos/dbt/seed.py Outdated
Comment thread cosmos/dbt/seed.py Outdated
@pankajkoti
Contributor Author

When seed_rendering_behavior=NONE, no seeds are rendered
Screenshot 2026-06-02 at 4 35 24 PM

When seed_rendering_behavior=RENDER_ONLY, seeds are rendered as empty operators
Screenshot 2026-06-02 at 4 36 46 PM

When seed_rendering_behavior=WHEN_SEED_CHANGES and the seed's checksum matches the value stored in the Airflow Variable, the seed is not re-run
Screenshot 2026-06-02 at 4 42 37 PM
Screenshot 2026-06-02 at 4 45 30 PM

When seed_rendering_behavior=WHEN_SEED_CHANGES and TEST_BEHAVIOR=BUILD, the combination is rejected as incompatible
Screenshot 2026-06-02 at 4 48 43 PM

TEST_BEHAVIOR=BUILD continues to work with seed_rendering_behavior=ALWAYS (the default)
Screenshot 2026-06-02 at 4 49 22 PM

@pankajkoti pankajkoti added the priority:high High priority issues are blocking or critical issues without a workaround and large impact label Jun 4, 2026
Collaborator

@tatiana tatiana left a comment

@pankajkoti, thanks a lot for this feature; it will benefit many users.

I left some inline feedback.

Regarding this PR comment:

Under ExecutionMode.WATCHER, a single dbt build runs all seeds regardless of this setting; only ALWAYS is meaningful there.

We could adopt a similar strategy to the one we used to add support for SourceRenderingBehavior in ExecutionMode.WATCHER. The work was implemented/improved throughout a few PRs:

WDYT of logging a follow-up ticket to add support for SeedRenderingBehavior in ExecutionMode.WATCHER?

Comment thread cosmos/airflow/graph.py Outdated
Comment thread cosmos/dbt/seed.py Outdated
Comment thread cosmos/dbt/graph.py Outdated
Comment thread cosmos/dbt/seed.py Outdated
Comment thread cosmos/dbt/graph.py
Comment thread cosmos/dbt/graph.py Outdated
Comment thread cosmos/dbt/graph.py Outdated
Comment thread cosmos/dbt/seed.py Outdated
Comment thread cosmos/operators/local.py Outdated
Compute the seed checksum as a DbtNode property that always streams the
SHA256 of the seed CSV, so WHEN_SEED_CHANGES behaves the same under
LoadMode.MANIFEST and LoadMode.DBT_LS. Remove cosmos/dbt/seed.py and move
the Variable persistence into cache.py. Read the stored checksum
best-effort, falling back to running the seed on backend errors. Strip
whitespace before to_boolean in _is_full_refresh and rename the render
flag to should_run_if_seed_changed.
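
A streamed SHA-256 of a file, as this commit message describes, can be sketched with the standard library alone. The function name `sha256_of_file` and the 64 KiB chunk size are assumptions made for illustration; the PR does not state what the `DbtNode` property uses internally.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use bounded for large seed CSVs, and hashing the file bytes directly gives the same result regardless of how the project was parsed.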
…eature-improve-how-cosmos-renders-runs-seeds

# Conflicts:
#	cosmos/airflow/graph.py
#	cosmos/dbt/graph.py
#	tests/test_converter.py
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Comment thread cosmos/operators/local.py
Comment thread cosmos/converter.py
@pankajkoti pankajkoti requested a review from tatiana June 4, 2026 17:08
@pankajkoti
Contributor Author

Thanks @tatiana for the thorough review. I have pushed changes addressing the inline feedback (replies are in each thread):

  • checksum is now a property on DbtNode, always computed from the seed CSV (streamed in chunks), so it is consistent across LoadMode.MANIFEST and LoadMode.DBT_LS.
  • Removed cosmos/dbt/seed.py; the Airflow Variable persistence now lives in cache.py, and reads are best effort (they log and fall back to running the seed on backend errors).
  • Adopted the cleaner should_run interface in the operator and renamed the render flag to should_run_if_seed_changed.
  • Also handled the Copilot notes: streaming checksum, and stripping whitespace before to_boolean in _is_full_refresh.
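
For readers following along, the best-effort read mentioned in the list above could look roughly like the sketch below. It is an illustration under assumptions rather than the code in `cache.py`: `fetch` is a hypothetical callable standing in for the Variable lookup, and `read_stored_checksum` and this `should_run` helper are illustrative names only.

```python
import logging

logger = logging.getLogger(__name__)


def read_stored_checksum(fetch, key):
    """Return the stored checksum, or None if it is missing or the read fails."""
    try:
        return fetch(key)
    except Exception:
        # Best effort: log the problem and let the caller treat the seed as changed.
        logger.warning("Could not read stored seed checksum for %s", key, exc_info=True)
        return None


def should_run(fetch, key, current_checksum):
    """Decide whether dbt seed should run for the given checksum."""
    stored = read_stored_checksum(fetch, key)
    # A missing or unreadable value always results in running the seed.
    return stored is None or stored != current_checksum
```

Falling back to running the seed means a backend outage costs an extra `dbt seed` run at worst, instead of silently skipping a changed seed.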

On the watcher: agreed on logging a follow-up. I opened #2772 to add SeedRenderingBehavior support under ExecutionMode.WATCHER, following the SourceRenderingBehavior approach you linked.

When you have a moment, could you please take another look? I have re-requested your review.

Collaborator

@tatiana tatiana left a comment

@pankajkoti thanks a lot for addressing the feedback! Minor feedback inline - non-blocking to merge this PR (you could implement the improvements in a follow-up PR).

Please rebase before merging.

Comment thread cosmos/dbt/graph.py
Comment thread cosmos/dbt/graph.py
Comment thread cosmos/operators/local.py
Comment thread cosmos/cache.py
Comment thread cosmos/cache.py
Copilot AI review requested due to automatic review settings June 5, 2026 14:15
@pankajkoti pankajkoti review requested due to automatic review settings June 5, 2026 14:15
@pankajkoti
Contributor Author

@tatiana Thanks for the review. I have logged a follow-up ticket and scheduled it for the upcoming cycle to address the remaining feedback here.

@pankajkoti pankajkoti merged commit d4c6785 into main Jun 5, 2026
126 checks passed
@pankajkoti pankajkoti deleted the pankajkoti/boss-240-feature-improve-how-cosmos-renders-runs-seeds branch June 5, 2026 14:37
pankajkoti added a commit that referenced this pull request Jun 5, 2026
## Description

Adds an example DAG demonstrating
`SeedRenderingBehavior.WHEN_SEED_CHANGES` against the `jaffle_shop`
project. It is exercised end to end by the example-DAG integration tests
(`dag.test()`).

Stacked on #2755.

## Related Issue(s)

Part of #1576

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@tatiana tatiana added this to the Cosmos 1.15.0 milestone Jun 6, 2026
Labels

priority:high High priority issues are blocking or critical issues without a workaround and large impact

Development

Successfully merging this pull request may close these issues.

[Feature] Improve how Cosmos renders & runs seeds

4 participants