Add seed rendering behavior support in Cosmos#2246
Pull request overview
This PR introduces a new SeedRenderingBehavior configuration option that provides flexible control over how dbt seed nodes are rendered in Airflow DAGs. The feature addresses the common production scenario where seeds don't need to run on every DAG execution by offering three rendering modes: ALWAYS (default), NONE, and WHEN_SEED_CHANGES.
Key changes:
- Added `SeedRenderingBehavior` enum with three options for controlling seed execution
- Implemented seed change detection using SHA256 hashing with Airflow Variables for the WHEN_SEED_CHANGES mode
- Extended test behavior separation so seeds and tests are configured independently via `TestBehavior` and `SeedRenderingBehavior`
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| cosmos/constants.py | Added new SeedRenderingBehavior enum with ALWAYS, NONE, and WHEN_SEED_CHANGES options |
| cosmos/config.py | Added seed_rendering_behavior parameter to RenderConfig with ALWAYS as default |
| cosmos/dbt/seed.py | New module implementing seed change detection via file hashing and Airflow Variable storage |
| cosmos/dbt/graph.py | Updated Variable import to support both Airflow 2 and 3 |
| cosmos/airflow/graph.py | Modified create_task_metadata to handle seed rendering behavior, including validation warning for incompatible TestBehavior.BUILD |
| cosmos/operators/local.py | Overrode execute method in DbtSeedLocalOperator to implement WHEN_SEED_CHANGES logic |
| cosmos/__init__.py | Exported SeedRenderingBehavior for public API |
| docs/configuration/seed-nodes-rendering.rst | Added comprehensive documentation for the seed rendering feature |
| docs/configuration/render-config.rst | Documented the new seed_rendering_behavior parameter |
| docs/configuration/index.rst | Added seed nodes rendering to configuration index |
| dev/dags/example_seed_rendering.py | Created example DAG demonstrating all seed rendering behaviors |
| tests/dbt/test_seed.py | Added comprehensive unit tests for seed change detection module |
| tests/dbt/test_graph.py | Updated Variable import for Airflow 2/3 compatibility |
| tests/airflow/test_graph.py | Added tests for seed rendering behavior and updated existing tests with the new parameter |
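The change-detection idea summarized above (hash the seed file, compare against a stored value, update the stored value after a successful run) can be sketched roughly as follows. This is a minimal illustration, not the code in `cosmos/dbt/seed.py`: the function names are hypothetical, and a plain dictionary stands in for Airflow Variables so the sketch runs without Airflow.

```python
import hashlib
from pathlib import Path

# Assumption: an in-memory dict stands in for Airflow Variable storage.
_hash_store: dict[str, str] = {}


def compute_file_sha256(path: Path) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def seed_changed(store_key: str, seed_path: Path) -> bool:
    """True when no hash is stored yet or the stored hash differs from the file."""
    return _hash_store.get(store_key) != compute_file_sha256(seed_path)


def record_seed_hash(store_key: str, seed_path: Path) -> None:
    """Persist the current hash; intended to be called after a successful run."""
    _hash_store[store_key] = compute_file_sha256(seed_path)
```

In this sketch a seed with no stored hash counts as changed, so the first run always executes.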
Hi @tuantran0910, thanks for the work here!
Hi @tatiana, I am going to fix them this weekend. Thanks for letting me know :D
Hi @tatiana, I have just pushed a new commit. Could you enable the tests to run again? Thanks :D
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Hi @tatiana, could you enable the tests to run again? Thanks :D
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2246      +/-   ##
==========================================
+ Coverage   97.97%   97.99%   +0.02%
==========================================
  Files         103      104       +1
  Lines        7455     7545      +90
==========================================
+ Hits         7304     7394      +90
  Misses        151      151
Hi @tuantran0910, Thanks a lot for working on this and getting all the tests passing! Please, could you:
We're planning to release Cosmos 1.14.0 next week, and I really would love to have this as part of it.
…ettings for "Always Render Seeds" and "Never Render Seeds".
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Add test for hash computation failure in has_seed_changed()
- Add test for seed unchanged skip behavior in DbtSeedLocalOperator
- Remove unreachable warning for WHEN_SEED_CHANGES with BUILD
- Fix DbtNode constructor arguments in test_graph.py tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
a98d756 to 7d1c790
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
@patch("cosmos.dbt.seed.has_seed_changed")
def test_dbt_seed_local_operator_execute_skips_when_seed_unchanged(
    mock_has_seed_changed, mock_update_hash, caplog, tmp_path
):
    """Test that DbtSeedLocalOperator.execute() skips the seed command when seed has not changed."""
Only the "seed unchanged" skip-path is covered for DbtSeedLocalOperator.execute. Add a complementary test for the changed-path (has_seed_changed=True) to assert build_and_run_cmd is called and update_seed_hash_after_run runs after a successful execution.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
def execute(self, context: Context, **kwargs: Any) -> None:
    from cosmos.constants import SeedRenderingBehavior
    from cosmos.dbt.seed import has_seed_changed, update_seed_hash_after_run

    # Check if we should detect seed changes
    seed_rendering_behavior_value = self.extra_context.get("seed_rendering_behavior")
    uses_seed_change_detection = seed_rendering_behavior_value == SeedRenderingBehavior.WHEN_SEED_CHANGES.value
DbtSeedLocalOperator previously inherited AbstractDbtBase.execute(), which merges extra_context into the Airflow context and honors settings.enable_debug_mode (memory tracking). Overriding execute() here bypasses that behavior, which can break templates/interceptors that rely on extra_context being present in context and makes debug-mode behavior inconsistent with other operators. Consider reusing the base execute logic (e.g., do the seed-change pre-check, then delegate to AbstractDbtBase.execute() for the actual run) or explicitly call context_merge + debug-mode handling in this override.
+1. Can we consider this, please?
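The delegation suggested in this thread (run the seed-change pre-check, then hand off to the base `execute()` so that shared behavior is preserved) might look roughly like the sketch below. The classes here are stand-ins written for illustration only; they are not `AbstractDbtBase` or `DbtSeedLocalOperator`, and the `seed_changed` flag replaces the real hash comparison.

```python
from typing import Any


class BaseOperatorSketch:
    """Stand-in for the shared base class (assumption, not the Cosmos class)."""

    def __init__(self) -> None:
        self.extra_context: dict[str, Any] = {}
        self.commands_run: list[dict[str, Any]] = []

    def build_and_run_cmd(self, context: dict[str, Any], **kwargs: Any) -> None:
        # Record the call instead of invoking dbt.
        self.commands_run.append({"context": dict(context), "kwargs": kwargs})

    def execute(self, context: dict[str, Any], **kwargs: Any) -> None:
        # Shared behavior the override should keep: merge extra_context first.
        context.update(self.extra_context)
        self.build_and_run_cmd(context=context, **kwargs)


class SeedOperatorSketch(BaseOperatorSketch):
    """Runs the pre-check, then delegates instead of re-implementing execute()."""

    def __init__(self, seed_changed: bool) -> None:
        super().__init__()
        self.seed_changed = seed_changed  # stand-in for the hash comparison
        self.hash_updated = False

    def execute(self, context: dict[str, Any], **kwargs: Any) -> None:
        if not self.seed_changed:
            return  # unchanged seed: skip the dbt command entirely
        super().execute(context, **kwargs)  # keeps context merge and kwargs
        self.hash_updated = True  # only reached after a successful run
```

Because the hash update sits after the delegated call, an exception raised by the run leaves the stored hash untouched, so the next run would retry the seed.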
    self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
    update_seed_hash_after_run(dag_task_group_identifier, node_unique_id, seed_file_path)
    return

self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
This execute() method accepts **kwargs but does not forward them to build_and_run_cmd(). Other Local operators forward **kwargs (e.g., DbtBuildLocalOperator) and callers may rely on flags like push_run_results_to_xcom. Please pass **kwargs through on both build_and_run_cmd() calls to avoid regressions.
-     self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
-     update_seed_hash_after_run(dag_task_group_identifier, node_unique_id, seed_file_path)
-     return
- self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
+     self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags(), **kwargs)
+     update_seed_hash_after_run(dag_task_group_identifier, node_unique_id, seed_file_path)
+     return
+ self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags(), **kwargs)
Hi @tuantran0910, The past few weeks have been busier than usual, and we haven’t been able to review this PR with the attention it deserves - apologies for that. Thanks for fixing all the checks. When you get a chance, could you please:
We’re planning to release Cosmos 1.15 in about a month, and I’d really like to include this improvement. We’ll make sure to review it promptly and provide any necessary feedback so it can be part of the release.
I will be back to this PR this weekend. Thanks a lot, @tatiana :D
pankajkoti left a comment
Thanks for the contribution @tuantran0910. I have some questions inline.
Also, I guess with this we're populating the Variables per seed per DAG? Can we also give some thought to how we can clean them up?
I haven't thought enough about the feasibility yet, but as an alternative to this approach, do you think we could leverage `dbt seed --select state:modified+` somehow? That would help us avoid computing and storing hashes and populating the Variables.
.. warning::
    **Limitations of** ``when_seed_changes``:

    - Only supported with ``ExecutionMode.LOCAL``. Other execution modes (Docker, Kubernetes, etc.) cannot access the seed CSV files from the Airflow worker.
In the code, we can raise `CosmosValueError` when `seed_rendering_behavior == WHEN_SEED_CHANGES` and `execution_mode != LOCAL`. WDYT?
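A minimal sketch of the validation proposed here, assuming it runs wherever the render and execution settings are both known. The enums and the exception class below are stand-ins defined only for this illustration (the real `ExecutionMode`, `SeedRenderingBehavior`, and `CosmosValueError` live in Cosmos), and the function name is hypothetical.

```python
from enum import Enum


class ExecutionModeSketch(Enum):
    """Stand-in enum; only two members are shown."""
    LOCAL = "local"
    KUBERNETES = "kubernetes"


class SeedRenderingBehaviorSketch(Enum):
    """Stand-in enum; member values are assumptions."""
    ALWAYS = "always"
    NONE = "none"
    WHEN_SEED_CHANGES = "when_seed_changes"


class ConfigValueErrorSketch(ValueError):
    """Stand-in for the project's own configuration error."""


def validate_seed_rendering(
    behavior: SeedRenderingBehaviorSketch, execution_mode: ExecutionModeSketch
) -> None:
    """Fail fast instead of silently ignoring an unsupported combination."""
    if (
        behavior is SeedRenderingBehaviorSketch.WHEN_SEED_CHANGES
        and execution_mode is not ExecutionModeSketch.LOCAL
    ):
        raise ConfigValueErrorSketch(
            f"{behavior.value!r} needs local access to the seed files; "
            f"got execution mode {execution_mode.value!r}"
        )
```

Raising at configuration time turns the documented limitation into an error the user sees when the DAG is parsed, rather than a seed that quietly runs every time.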
Can we narrow down the exceptions caught in various methods in this module so that we do not use bare Exception class, please?
Also, should we mark all the methods in the module non-public by prefixing with underscores?
        SeedRenderingBehavior.WHEN_SEED_CHANGES,
    ],
)
def test_create_task_metadata_seed_with_build_test_behavior(seed_rendering_behavior):
What is the expected behaviour for `TestBehavior.BUILD` + `SeedRenderingBehavior.WHEN_SEED_CHANGES`? Since the base class for `TestBehavior.BUILD` is `DbtBuildLocalOperator`, and I don't see any override in it, `SeedRenderingBehavior.WHEN_SEED_CHANGES` will be a no-op there, meaning this seed rendering behaviour will not come into effect?
Hi @tuantran0910, following up to check if you have thoughts on my review comments and if you'd like to discuss something there?
## Description

Adds a `seed_rendering_behavior` option to `RenderConfig`, giving control over how Cosmos renders and runs dbt seeds (analogous to `source_rendering_behavior`):

- `ALWAYS` (default): render the seed and run `dbt seed` on every execution (the original Cosmos behaviour).
- `WHEN_SEED_CHANGES`: render the seed, but only run `dbt seed` when its CSV content has changed since the last successful run. The checksum is read from dbt's manifest (falling back to hashing the CSV) and the last-seen value is persisted as an Airflow Variable scoped per `DbtDag`/`DbtTaskGroup` and seed. Supported for `ExecutionMode.LOCAL`, `VIRTUALENV` and `AIRFLOW_ASYNC`, and incompatible with `TestBehavior.BUILD`; both raise `CosmosValueError`.
- `RENDER_ONLY`: render the seed as a no-op `EmptyOperator` placeholder.
- `NONE`: do not render the seed at all.

The gate runs before the `TestBehavior.BUILD` branch so `NONE`/`RENDER_ONLY` are honoured under every test behaviour.

On an unchanged run the seed task succeeds without running `dbt seed`, without emitting its dataset, and without skip-propagating downstream.

Change detection delegates to `AbstractDbtBase.execute()`, preserving `extra_context`, debug-mode tracking and `**kwargs`.

Under `ExecutionMode.WATCHER` a single `dbt build` runs all seeds regardless of this setting; only `ALWAYS` is meaningful there.

Fresh implementation building on the approach explored in #2246 by @tuantran0910.

## Related Issue(s)

Closes #1576

🤖 Generated with [Claude Code](https://claude.com/claude-code)
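The four behaviours described in this commit message can be read as a small decision function. The sketch below is only an illustration of that reading: the enum is a stand-in, its member values other than `when_seed_changes` are assumptions, and the returned labels are hypothetical names rather than Cosmos operators. It also folds the render-time and run-time decisions into one function for brevity.

```python
from enum import Enum
from typing import Optional


class SeedBehaviorSketch(Enum):
    """Stand-in enum; values other than "when_seed_changes" are assumptions."""
    ALWAYS = "always"
    WHEN_SEED_CHANGES = "when_seed_changes"
    RENDER_ONLY = "render_only"
    NONE = "none"


def plan_seed_task(
    behavior: SeedBehaviorSketch, test_behavior_is_build: bool, seed_changed: bool
) -> Optional[str]:
    """Return a label for what the seed task does, or None if it is not rendered."""
    # NONE and RENDER_ONLY are checked first, so they apply under every test behaviour.
    if behavior is SeedBehaviorSketch.NONE:
        return None
    if behavior is SeedBehaviorSketch.RENDER_ONLY:
        return "empty_placeholder"
    if behavior is SeedBehaviorSketch.WHEN_SEED_CHANGES:
        if test_behavior_is_build:
            raise ValueError("WHEN_SEED_CHANGES is incompatible with the BUILD test behaviour")
        return "run_dbt_seed" if seed_changed else "succeed_without_running"
    return "run_dbt_seed"  # ALWAYS
```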
Description
This PR introduces a new `SeedRenderingBehavior` configuration option that allows users to control how dbt seed nodes are rendered in Airflow DAGs. Previously, Cosmos would always render seeds and attempt to run `dbt seed`. This new feature provides flexibility similar to how `SourceRenderingBehavior` works for source nodes.
Motivation
In most production scenarios, seeds do not need to be continuously run on every DAG execution. This feature allows users to:
New Configuration Options
The `seed_rendering_behavior` parameter in `RenderConfig` accepts:
- `ALWAYS`
- `NONE`
- `WHEN_SEED_CHANGES`
How `WHEN_SEED_CHANGES` Works
Test Behavior
Test behavior for seeds is now controlled exclusively via `TestBehavior`, not through `SeedRenderingBehavior`. This separation of concerns simplifies configuration:
Example Usage
Related Issue(s)
Closes #1576
Additional