Add configurable seed rendering behavior #2755
Add a `seed_rendering_behavior` option to `RenderConfig` that controls how Cosmos renders and runs dbt seeds, mirroring `source_rendering_behavior`:

- ALWAYS (default): render the seed and run `dbt seed` on every execution.
- WHEN_SEED_CHANGES: render the seed, but only run `dbt seed` when the seed's CSV content has changed since the last successful run. The checksum is read from dbt's manifest (falling back to hashing the CSV) and the last-seen value is persisted as an Airflow Variable scoped per DbtDag/TaskGroup and seed. Supported only for ExecutionMode.LOCAL, VIRTUALENV and AIRFLOW_ASYNC, and incompatible with TestBehavior.BUILD; an unsupported execution mode or TestBehavior.BUILD raises CosmosValueError.
- RENDER_ONLY: render the seed as a no-op EmptyOperator placeholder.
- NONE: do not render the seed at all.

The gate runs before the TestBehavior.BUILD branch so NONE and RENDER_ONLY are honored under every test behavior. On an unchanged run the seed task succeeds without running `dbt seed`, without emitting its dataset, and without skip-propagating to downstream tasks.
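To make the four modes concrete, here is a minimal, self-contained sketch of how a seed node could be mapped to a rendering decision. It does not import Cosmos: the enum is a stand-in whose member values are assumed, and `plan_seed_task` with its return labels is a hypothetical helper, not the function this PR adds.

```python
from enum import Enum
from typing import Optional


class SeedRenderingBehavior(Enum):
    """Illustrative stand-in for the enum described above; member values are assumed."""

    ALWAYS = "always"
    WHEN_SEED_CHANGES = "when_seed_changes"
    RENDER_ONLY = "render_only"
    NONE = "none"


def plan_seed_task(behavior: SeedRenderingBehavior) -> Optional[str]:
    """Hypothetical helper: return a label for what is rendered for one seed node."""
    if behavior is SeedRenderingBehavior.NONE:
        # The seed is left out of the Airflow graph entirely.
        return None
    if behavior is SeedRenderingBehavior.RENDER_ONLY:
        # A no-op placeholder keeps the node visible without running dbt.
        return "empty_placeholder"
    if behavior is SeedRenderingBehavior.WHEN_SEED_CHANGES:
        # A seed task is rendered, but it checks the CSV checksum before running.
        return "dbt_seed_gated_on_checksum"
    # ALWAYS: a plain seed task that runs `dbt seed` on every execution.
    return "dbt_seed"
```

Because this decision is taken before any test-behavior branching, `NONE` and `RENDER_ONLY` produce the same result whatever the test behavior is.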
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2755      +/-   ##
==========================================
+ Coverage   98.36%   98.37%   +0.01%
==========================================
  Files         107      107
  Lines        7942     8021      +79
==========================================
+ Hits         7812     7891      +79
  Misses        130      130
Pull request overview
This PR adds a new seed_rendering_behavior option to RenderConfig to control how dbt seed nodes are represented in the generated Airflow graph, including a new “run only when seed CSV changes” mode that persists checksums via Airflow Variables.
Changes:
- Introduces `SeedRenderingBehavior` (`ALWAYS`, `WHEN_SEED_CHANGES`, `RENDER_ONLY`, `NONE`) and wires it into task-metadata generation for seed nodes.
- Implements seed change detection utilities (`cosmos/dbt/seed.py`) and runtime gating in `DbtSeedLocalOperator.execute()`.
- Extends config/runtime validation and adds unit tests covering allowed combinations and operator behavior.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| cosmos/airflow/graph.py | Applies seed_rendering_behavior during task metadata creation (including placeholder/no-render paths). |
| cosmos/config.py | Adds RenderConfig.seed_rendering_behavior and validates incompatibility with TestBehavior.BUILD. |
| cosmos/constants.py | Adds EMPTY_OPERATOR_CLASS and the SeedRenderingBehavior enum. |
| cosmos/converter.py | Validates WHEN_SEED_CHANGES is only used with worker-filesystem execution modes. |
| cosmos/dbt/graph.py | Adds checksum to DbtNode and populates it from manifest/dbt-ls parsing. |
| cosmos/dbt/seed.py | New module for seed checksum resolution, Variable keying, and persistence helpers. |
| cosmos/operators/local.py | Adds runtime skip logic to DbtSeedLocalOperator for unchanged seeds. |
| cosmos/__init__.py | Exposes SeedRenderingBehavior at the package top level. |
| tests/airflow/test_graph.py | Tests seed rendering modes in task metadata (NONE / RENDER_ONLY / WHEN_SEED_CHANGES / ALWAYS). |
| tests/dbt/test_graph.py | Updates expected node context to include checksum. |
| tests/dbt/test_seed.py | New unit tests for seed change detection helpers. |
| tests/operators/test_local.py | Tests local seed operator skipping/running/persisting behavior under change detection. |
| tests/test_config.py | Tests config validation and defaulting for seed_rendering_behavior. |
| tests/test_converter.py | Tests execution-mode validation for WHEN_SEED_CHANGES. |
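The two validation rules summarized in the table (no `TestBehavior.BUILD`, and only worker-filesystem execution modes) can be sketched as a single check. This is a simplified illustration that assumes plain string values for the modes; the PR splits the checks between `cosmos/config.py` and `cosmos/converter.py`, and the class and function below are hypothetical stand-ins rather than the Cosmos API.

```python
class CosmosValueError(ValueError):
    """Stand-in for the Cosmos exception of the same name."""


# Assumed string values for the execution modes the PR lists as supported.
SUPPORTED_EXECUTION_MODES = {"local", "virtualenv", "airflow_async"}


def validate_when_seed_changes(seed_behavior: str, execution_mode: str, test_behavior: str) -> None:
    """Hypothetical check: reject combinations WHEN_SEED_CHANGES cannot honor."""
    if seed_behavior != "when_seed_changes":
        return  # the other behaviors carry no such restrictions in this sketch
    if test_behavior == "build":
        # A single `dbt build` would run the seed regardless of the gate.
        raise CosmosValueError("WHEN_SEED_CHANGES is incompatible with TestBehavior.BUILD")
    if execution_mode not in SUPPORTED_EXECUTION_MODES:
        # Change detection needs the seed CSV on the worker's filesystem.
        raise CosmosValueError(f"WHEN_SEED_CHANGES is not supported for execution mode {execution_mode!r}")
```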
The PR is ready for review. Please do not merge this PR. It relies on #2758 to be merged first.
Hi @pankajkoti, thank you very much for helping me continue with the PR adding seed rendering behavior. I am sorry that I haven't had time to keep putting effort into my original PR because I have been so busy with work. I am happy to see this PR and look forward to seeing it merged.
…os-renders-runs-seeds
…ring Drop the duplicate EMPTY_OPERATOR_CLASS constant this branch added to constants.py and reference EMPTY_OPERATOR_CLASS_PATH from cosmos.airflow.compatibility instead, so the RENDER_ONLY seed placeholder shares the version-aware path now on main.
tatiana
left a comment
@pankajkoti, thanks a lot for this feature; it will benefit many users.
I left some inline feedback.
Regarding this PR comment:
> Under `ExecutionMode.WATCHER`, a single `dbt build` runs all seeds regardless of this setting; only `ALWAYS` is meaningful there.
We could adopt a similar strategy to the one we used to add support for SourceRenderingBehavior in ExecutionMode.WATCHER. The work was implemented/improved throughout a few PRs:
WDYT of logging a follow-up ticket to add support for SeedRenderingBehavior in ExecutionMode.WATCHER?
Compute the seed checksum as a DbtNode property that always streams the SHA256 of the seed CSV, so WHEN_SEED_CHANGES behaves the same under LoadMode.MANIFEST and LoadMode.DBT_LS. Remove cosmos/dbt/seed.py and move the Variable persistence into cache.py. Read the stored checksum best-effort, falling back to running the seed on backend errors. Strip whitespace before to_boolean in _is_full_refresh and rename the render flag to should_run_if_seed_changed.
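Streaming the SHA256 of a seed CSV, as this commit describes, can be done with the standard library alone. The sketch below is an illustration and not the code from this PR; the function name and chunk size are assumptions.

```python
import hashlib
from pathlib import Path


def stream_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large CSVs are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Hashing the file bytes directly gives the same value whichever way the dbt project was loaded, which is the property the commit is after for `LoadMode.MANIFEST` and `LoadMode.DBT_LS`.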
…eature-improve-how-cosmos-renders-runs-seeds
# Conflicts:
# cosmos/airflow/graph.py
# cosmos/dbt/graph.py
# tests/test_converter.py
Thanks @tatiana for the thorough review. I have pushed changes addressing the inline feedback (replies are in each thread):
On the watcher: agreed on logging a follow-up. I opened #2772 to add
When you have a moment, could you please take another look? I have re-requested your review.
@pankajkoti thanks a lot for addressing the feedback! Minor feedback inline - non-blocking to merge this PR (you could implement the improvements in a follow-up PR).
Please, rebase before merging
…os-renders-runs-seeds
## Description
Adds an example DAG demonstrating `SeedRenderingBehavior.WHEN_SEED_CHANGES` against the `jaffle_shop` project. It is exercised end to end by the example-DAG integration tests (`dag.test()`). Stacked on #2755.
## Related Issue(s)
Part of #1576
🤖 Generated with [Claude Code](https://claude.com/claude-code)






Description
Adds a `seed_rendering_behavior` option to `RenderConfig`, giving control over how Cosmos renders and runs dbt seeds (analogous to `source_rendering_behavior`):

- `ALWAYS` (default): render the seed and run `dbt seed` on every execution (the original Cosmos behaviour).
- `WHEN_SEED_CHANGES`: render the seed, but only run `dbt seed` when its CSV content has changed since the last successful run. The checksum is read from dbt's manifest (falling back to hashing the CSV) and the last-seen value is persisted as an Airflow Variable scoped per `DbtDag`/`DbtTaskGroup` and seed. Supported for `ExecutionMode.LOCAL`, `VIRTUALENV` and `AIRFLOW_ASYNC`, and incompatible with `TestBehavior.BUILD`; both raise `CosmosValueError`.
- `RENDER_ONLY`: render the seed as a no-op `EmptyOperator` placeholder.
- `NONE`: do not render the seed at all.

The gate runs before the `TestBehavior.BUILD` branch so `NONE`/`RENDER_ONLY` are honoured under every test behaviour. On an unchanged run the seed task succeeds without running `dbt seed`, without emitting its dataset, and without skip-propagating downstream. Change detection delegates to `AbstractDbtBase.execute()`, preserving `extra_context`, debug-mode tracking and `**kwargs`.

Under `ExecutionMode.WATCHER` a single `dbt build` runs all seeds regardless of this setting; only `ALWAYS` is meaningful there.

Fresh implementation building on the approach explored in #2246 by @tuantran0910.
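The `WHEN_SEED_CHANGES` gate can be illustrated with a small sketch in which a plain mapping stands in for the Airflow Variable backend. All names here, including the key format, are hypothetical and chosen only for illustration; the real persistence goes through Airflow Variables, which this sketch does not call.

```python
from typing import Callable, MutableMapping, Optional


def variable_key(dag_or_group_id: str, seed_name: str) -> str:
    """Hypothetical key format: one entry per DbtDag/DbtTaskGroup and seed."""
    return f"cosmos_seed_checksum__{dag_or_group_id}__{seed_name}"


def read_stored_checksum(store: MutableMapping[str, str], key: str) -> Optional[str]:
    """Best-effort read: a backend error is treated as 'no stored checksum'."""
    try:
        return store.get(key)
    except Exception:
        return None


def run_seed_if_changed(
    store: MutableMapping[str, str],
    key: str,
    current_checksum: str,
    run_seed: Callable[[], None],
) -> bool:
    """Run the seed only when its checksum differs from the last successful run."""
    if read_stored_checksum(store, key) == current_checksum:
        # Unchanged: the task succeeds without running `dbt seed`.
        return False
    run_seed()
    # Persist only after the run succeeded, so a failed run is retried next time.
    store[key] = current_checksum
    return True
```

Writing the checksum after `run_seed()` returns, rather than before, means an exception in the seed run leaves the old value in place and the next execution tries again.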
Related Issue(s)
Closes #1576
🤖 Generated with Claude Code