Add seed rendering behavior support in Cosmos#2246

Open
tuantran0910 wants to merge 14 commits into
astronomer:mainfrom
tuantran0910:feat/seed-rendering-behaviour

@tuantran0910 tuantran0910 commented Jan 5, 2026

Contributor

Description

This PR introduces a new SeedRenderingBehavior configuration option that allows users to control how dbt seed nodes are rendered in Airflow DAGs. Previously, Cosmos would always render seeds and attempt to run dbt seed. This new feature provides flexibility similar to how SourceRenderingBehavior works for source nodes.

Motivation

In most production scenarios, seeds do not need to be continuously run on every DAG execution. This feature allows users to:

  • Skip seed rendering entirely when seeds are managed outside of Cosmos
  • Only run seeds when the underlying CSV file has changed, optimizing DAG execution time
  • Maintain backward compatibility for users who want the original behavior

New Configuration Options

The seed_rendering_behavior parameter in RenderConfig accepts:

| Value | Description |
| --- | --- |
| ALWAYS | Always render and run seeds (default behavior) |
| NONE | Don't render any seeds in the DAG/TaskGroup |
| WHEN_SEED_CHANGES | Only execute seeds if the CSV file has changed since the last successful run |
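
As a rough illustration of the three options listed here, the sketch below models them as a plain Python enum. This is a stand-in, not the code from the PR: the lowercase string values are an assumption (only `when_seed_changes` appears elsewhere in this thread), and `should_render_seed` is a hypothetical helper.

```python
from enum import Enum


class SeedRenderingBehavior(Enum):
    """Illustrative stand-in for the enum described in this PR (values assumed)."""

    ALWAYS = "always"
    NONE = "none"
    WHEN_SEED_CHANGES = "when_seed_changes"


def should_render_seed(behavior: SeedRenderingBehavior) -> bool:
    """Hypothetical helper: only NONE removes the seed task from the DAG."""
    return behavior is not SeedRenderingBehavior.NONE
```

Under this reading, `WHEN_SEED_CHANGES` still renders the seed task; the decision to run or skip `dbt seed` is deferred to execution time.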

How WHEN_SEED_CHANGES Works

  1. At task execution time, computes a SHA256 hash of the seed CSV file
  2. Compares against the stored hash (persisted as an Airflow Variable)
  3. Skips execution if hashes match; runs the seed if they differ
  4. Stores the new hash after successful execution
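
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `cosmos/dbt/seed.py`: the function names are hypothetical, and a plain dict stands in for the Airflow Variable store so the example runs without Airflow.

```python
import hashlib
from pathlib import Path
from typing import Callable, MutableMapping


def compute_file_sha256(path: Path) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_seed_if_changed(
    seed_path: Path,
    store: MutableMapping[str, str],  # stand-in for Airflow Variables
    key: str,
    run_seed: Callable[[], None],
) -> bool:
    """Run `run_seed` only when the file hash differs from the stored one.

    Returns True if the seed ran, False if it was skipped.
    """
    current_hash = compute_file_sha256(seed_path)  # step 1
    if store.get(key) == current_hash:             # steps 2 and 3
        return False
    run_seed()
    # Step 4: only reached when run_seed() did not raise, so a failed
    # run leaves the old hash in place and the seed is retried next time.
    store[key] = current_hash
    return True
```

Writing the hash only after a successful run is the design choice that makes the comparison mean "changed since the last successful run" rather than "changed since the last attempt".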

Test Behavior

Test behavior for seeds is now controlled exclusively via TestBehavior, not through SeedRenderingBehavior. This separation of concerns simplifies configuration:

# Run seeds with tests
RenderConfig(
    seed_rendering_behavior=SeedRenderingBehavior.ALWAYS,
    test_behavior=TestBehavior.AFTER_EACH,
)

# Run seeds without tests
RenderConfig(
    seed_rendering_behavior=SeedRenderingBehavior.ALWAYS,
    test_behavior=TestBehavior.NONE,
)

Example Usage

import datetime
from pathlib import Path

from airflow.providers.standard.operators.empty import EmptyOperator
from airflow.sdk import DAG, Asset, Param
from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig, ExecutionConfig, RenderConfig
from cosmos.constants import TestBehavior, ExecutionMode, SeedRenderingBehavior


# Paths configuration
DAGS_DIR = Path("/opt/airflow/dags")
DBT_PROJECT_PATH = DAGS_DIR / "dbt" / "jaffle_shop"
DBT_PROFILES_PATH = DBT_PROJECT_PATH / "profiles.yml"
DBT_MANIFEST_PATH = DBT_PROJECT_PATH / "target" / "manifest.json"
DBT_CATALOG_PATH = DBT_PROJECT_PATH / "target" / "catalog.json"
DBT_RUN_RESULTS_PATH = DBT_PROJECT_PATH / "target" / "run_results.json"

# Default arguments for the DAG
default_args = {
    'owner': 'data-team',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': datetime.timedelta(seconds=300),
    'priority_weight': 1,
    'weight_rule': 'downstream',
    'queue': 'default',
    'pool': 'default_pool',
    'email': ['data-team@example.com'],
}

# Profile configuration using profiles.yml file
profile_config = ProfileConfig(
    profile_name="jaffle_shop",
    target_name="production",
    profiles_yml_filepath=DBT_PROFILES_PATH,
)

# Execution configuration
execution_config = ExecutionConfig(
    dbt_executable_path="dbt",
    execution_mode=ExecutionMode.LOCAL
)

# Render configuration - select which models to run
render_config = RenderConfig(
    select=['path:models/staging', 'path:models/intermediate', 'path:models/marts', 'path:seeds/'],
    emit_datasets=True,
    test_behavior=TestBehavior.AFTER_EACH,
    seed_rendering_behavior=SeedRenderingBehavior.WHEN_SEED_CHANGES,
)

# Operator arguments - full_refresh is templated to use DAG params at runtime
operator_args = {
    # Use Jinja templating to read full_refresh from DAG params at runtime
    "full_refresh": "{{ params.full_refresh }}",
}

schedule = "@daily"

# Create the DAG with full_refresh as a runtime parameter
dag = DAG(
    dag_id="dbt_jaffle_shop_core",
    description="Run all dbt transformations for jaffle_shop (4-level lineage)",
    default_args=default_args,
    schedule=schedule,
    start_date=datetime.datetime(2024, 1, 1),
    catchup=False,
    max_active_runs=1,
    tags=['dbt', 'jaffle_shop', 'full-pipeline', 'data-lineage'],
    params={
        "full_refresh": Param(
            default=False,
            type="boolean",
            description="Run dbt with --full-refresh flag to rebuild all incremental models",
        ),
    },
)

outlet_asset = Asset("dbt://dbt_jaffle_shop_core")

with dag:
    dbt_tasks = DbtTaskGroup(
        group_id="dbt_jaffle_shop",
        project_config=ProjectConfig(
            dbt_project_path=DBT_PROJECT_PATH,
        ),
        profile_config=profile_config,
        execution_config=execution_config,
        render_config=render_config,
        operator_args=operator_args,
    )

    emit_assets = EmptyOperator(
        task_id="emit_assets",
        outlets=[outlet_asset],
        trigger_rule="all_done",
    )

    dbt_tasks >> emit_assets

Related Issue(s)

Closes #1576

Additional

  • I'm not really sure this is a good implementation, so I'm open to any suggestions :D

Copilot AI review requested due to automatic review settings January 5, 2026 16:56
@netlify

netlify Bot commented Jan 5, 2026

Deploy Preview for astronomer-cosmos ready!

Name Link
🔨 Latest commit 0d7fe53
🔍 Latest deploy log https://app.netlify.com/projects/astronomer-cosmos/deploys/698b5f03d57ad40008946493
😎 Deploy Preview https://deploy-preview-2246--astronomer-cosmos.netlify.app

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces a new SeedRenderingBehavior configuration option that provides flexible control over how dbt seed nodes are rendered in Airflow DAGs. The feature addresses the common production scenario where seeds don't need to run on every DAG execution by offering three rendering modes: ALWAYS (default), NONE, and WHEN_SEED_CHANGES.

Key changes:

  • Added SeedRenderingBehavior enum with three options for controlling seed execution
  • Implemented seed change detection using SHA256 hashing with Airflow Variables for the WHEN_SEED_CHANGES mode
  • Extended test behavior separation so seeds and tests are configured independently via TestBehavior and SeedRenderingBehavior

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| cosmos/constants.py | Added new SeedRenderingBehavior enum with ALWAYS, NONE, and WHEN_SEED_CHANGES options |
| cosmos/config.py | Added seed_rendering_behavior parameter to RenderConfig with ALWAYS as default |
| cosmos/dbt/seed.py | New module implementing seed change detection via file hashing and Airflow Variable storage |
| cosmos/dbt/graph.py | Updated Variable import to support both Airflow 2 and 3 |
| cosmos/airflow/graph.py | Modified create_task_metadata to handle seed rendering behavior, including validation warning for incompatible TestBehavior.BUILD |
| cosmos/operators/local.py | Overrode execute method in DbtSeedLocalOperator to implement WHEN_SEED_CHANGES logic |
| cosmos/__init__.py | Exported SeedRenderingBehavior for public API |
| docs/configuration/seed-nodes-rendering.rst | Added comprehensive documentation for the seed rendering feature |
| docs/configuration/render-config.rst | Documented the new seed_rendering_behavior parameter |
| docs/configuration/index.rst | Added seed nodes rendering to configuration index |
| dev/dags/example_seed_rendering.py | Created example DAG demonstrating all seed rendering behaviors |
| tests/dbt/test_seed.py | Added comprehensive unit tests for seed change detection module |
| tests/dbt/test_graph.py | Updated Variable import for Airflow 2/3 compatibility |
| tests/airflow/test_graph.py | Added tests for seed rendering behavior and updated existing tests with the new parameter |

Comment thread docs/configuration/seed-nodes-rendering.rst Outdated
Comment thread cosmos/airflow/graph.py
Comment thread cosmos/operators/local.py
Comment thread cosmos/dbt/seed.py Outdated
Comment thread tests/airflow/test_graph.py Outdated
Comment thread cosmos/operators/local.py
@tatiana

tatiana commented Jan 29, 2026

Collaborator

Hi @tuantran0910, thanks for the work here!
We noticed some checks are failing. Please let us know once they are fixed so we can review this PR.

@tatiana tatiana added this to the Cosmos 1.14.0 milestone Jan 29, 2026
@tuantran0910

Contributor Author

Hi @tatiana, I am going to fix them this weekend. Thanks for letting me know :D

@tuantran0910

Contributor Author

Hi @tatiana, I have just pushed a new commit. Could you enable the tests to run again? Thanks :D

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.


Comment thread cosmos/operators/local.py
Comment thread cosmos/airflow/graph.py Outdated
Comment thread cosmos/airflow/graph.py Outdated
Comment thread docs/configuration/seed-nodes-rendering.rst Outdated
Copilot AI review requested due to automatic review settings March 6, 2026 03:33

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.


Comment thread cosmos/operators/local.py
Comment thread cosmos/airflow/graph.py
@tuantran0910

Contributor Author

Hi @tatiana, could you enable the tests to run again? Thanks :D

@codecov

codecov Bot commented Mar 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.99%. Comparing base (331b90a) to head (fcb2afc).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2246      +/-   ##
==========================================
+ Coverage   97.97%   97.99%   +0.02%     
==========================================
  Files         103      104       +1     
  Lines        7455     7545      +90     
==========================================
+ Hits         7304     7394      +90     
  Misses        151      151              

@tatiana

tatiana commented Mar 19, 2026

Collaborator

Hi @tuantran0910, Thanks a lot for working on this and getting all the tests passing! Please, could you:

  • rebase
  • add relevant tests so the coverage checks pass

We're planning to release Cosmos 1.14.0 next week, and I really would love to have this as part of it.

tuantran0910 and others added 9 commits March 30, 2026 00:45
…ettings for "Always Render Seeds" and "Never Render Seeds".
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Add test for hash computation failure in has_seed_changed()
- Add test for seed unchanged skip behavior in DbtSeedLocalOperator
- Remove unreachable warning for WHEN_SEED_CHANGES with BUILD
- Fix DbtNode constructor arguments in test_graph.py tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 29, 2026 17:45
@tuantran0910 tuantran0910 force-pushed the feat/seed-rendering-behaviour branch from a98d756 to 7d1c790 Compare March 29, 2026 17:45

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.


Comment on lines +2419 to +2423
@patch("cosmos.dbt.seed.has_seed_changed")
def test_dbt_seed_local_operator_execute_skips_when_seed_unchanged(
    mock_has_seed_changed, mock_update_hash, caplog, tmp_path
):
    """Test that DbtSeedLocalOperator.execute() skips the seed command when seed has not changed."""

Copilot AI Mar 29, 2026

Only the "seed unchanged" skip-path is covered for DbtSeedLocalOperator.execute. Add a complementary test for the changed-path (has_seed_changed=True) to assert build_and_run_cmd is called and update_seed_hash_after_run runs after a successful execution.
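
A complementary changed-path test could look roughly like the sketch below. To stay runnable outside the Cosmos test suite, it exercises a hypothetical stand-in function, `execute_seed`, instead of `DbtSeedLocalOperator.execute`; the three callables mirror the names mentioned in this comment, and a parent `Mock` records the call order.

```python
from typing import Callable
from unittest.mock import Mock, call


def execute_seed(
    has_seed_changed: Callable[[], bool],
    build_and_run_cmd: Callable[[], None],
    update_seed_hash_after_run: Callable[[], None],
) -> None:
    """Hypothetical stand-in for the operator's WHEN_SEED_CHANGES branch."""
    if not has_seed_changed():
        return  # unchanged seed: nothing to run
    build_and_run_cmd()
    update_seed_hash_after_run()


def test_execute_runs_and_updates_hash_when_seed_changed() -> None:
    # Child mocks of one parent record their calls, in order, on the parent.
    parent = Mock()
    parent.has_seed_changed.return_value = True

    execute_seed(
        parent.has_seed_changed,
        parent.build_and_run_cmd,
        parent.update_seed_hash_after_run,
    )

    assert parent.mock_calls == [
        call.has_seed_changed(),
        call.build_and_run_cmd(),
        call.update_seed_hash_after_run(),
    ]
```

Asserting on the ordered call list checks both that the command ran and that the hash update happened after it, which is the property requested here.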

Copilot AI review requested due to automatic review settings April 7, 2026 07:21

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.


Comment thread cosmos/operators/local.py
Comment on lines +1011 to +1018
    def execute(self, context: Context, **kwargs: Any) -> None:
        from cosmos.constants import SeedRenderingBehavior
        from cosmos.dbt.seed import has_seed_changed, update_seed_hash_after_run

        # Check if we should detect seed changes
        seed_rendering_behavior_value = self.extra_context.get("seed_rendering_behavior")
        uses_seed_change_detection = seed_rendering_behavior_value == SeedRenderingBehavior.WHEN_SEED_CHANGES.value

Copilot AI Apr 7, 2026

DbtSeedLocalOperator previously inherited AbstractDbtBase.execute(), which merges extra_context into the Airflow context and honors settings.enable_debug_mode (memory tracking). Overriding execute() here bypasses that behavior, which can break templates/interceptors that rely on extra_context being present in context and makes debug-mode behavior inconsistent with other operators. Consider reusing the base execute logic (e.g., do the seed-change pre-check, then delegate to AbstractDbtBase.execute() for the actual run) or explicitly call context_merge + debug-mode handling in this override.
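
One way to read this suggestion is "pre-check, then delegate". The sketch below illustrates that shape with hypothetical stand-in classes (`BaseOperatorSketch`, `SeedOperatorSketch`); it does not reproduce the real `AbstractDbtBase` or `DbtSeedLocalOperator`, and the context merge is reduced to a dict update.

```python
from typing import Any, Callable, Dict


class BaseOperatorSketch:
    """Hypothetical stand-in for the shared base class."""

    def __init__(self, extra_context: Dict[str, Any]) -> None:
        self.extra_context = extra_context
        self.commands_run = 0
        self.last_kwargs: Dict[str, Any] = {}

    def execute(self, context: Dict[str, Any], **kwargs: Any) -> None:
        # Shared behavior that an override should not lose:
        # merge extra_context into the context, then run the command.
        context.update(self.extra_context)
        self.build_and_run_cmd(context, **kwargs)

    def build_and_run_cmd(self, context: Dict[str, Any], **kwargs: Any) -> None:
        self.commands_run += 1
        self.last_kwargs = kwargs


class SeedOperatorSketch(BaseOperatorSketch):
    """Hypothetical seed operator that delegates instead of re-implementing."""

    def __init__(
        self,
        extra_context: Dict[str, Any],
        seed_changed: Callable[[], bool],
        record_hash: Callable[[], None],
    ) -> None:
        super().__init__(extra_context)
        self.seed_changed = seed_changed
        self.record_hash = record_hash

    def execute(self, context: Dict[str, Any], **kwargs: Any) -> None:
        detect = self.extra_context.get("seed_rendering_behavior") == "when_seed_changes"
        if detect and not self.seed_changed():
            return  # seed unchanged: skip the dbt command entirely
        super().execute(context, **kwargs)  # keeps context merge and **kwargs
        if detect:
            self.record_hash()  # only after a successful run
```

Because the override calls `super().execute()`, anything the base class does around the command is inherited automatically, and `**kwargs` reaches `build_and_run_cmd` without a second code path.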

Contributor

+1, Can we consider this please?

Comment thread cosmos/operators/local.py
Comment on lines +1034 to +1038
            self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
            update_seed_hash_after_run(dag_task_group_identifier, node_unique_id, seed_file_path)
            return

        self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())

Copilot AI Apr 7, 2026

This execute() method accepts **kwargs but does not forward them to build_and_run_cmd(). Other Local operators forward **kwargs (e.g., DbtBuildLocalOperator) and callers may rely on flags like push_run_results_to_xcom. Please pass **kwargs through on both build_and_run_cmd() calls to avoid regressions.

Suggested change
-            self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
-            update_seed_hash_after_run(dag_task_group_identifier, node_unique_id, seed_file_path)
-            return
-        self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
+            self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags(), **kwargs)
+            update_seed_hash_after_run(dag_task_group_identifier, node_unique_id, seed_file_path)
+            return
+        self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags(), **kwargs)

@tatiana tatiana modified the milestones: Cosmos 1.14.0, Cosmos 1.15.0 Apr 7, 2026
@tatiana

tatiana commented Apr 7, 2026

Collaborator

Hi @tuantran0910,

The past few weeks have been busier than usual, and we haven’t been able to review this PR with the attention it deserves - apologies for that. Thanks for fixing all the checks.

When you get a chance, could you please:

  • Rebase the branch
  • Address the Copilot feedback

We’re planning to release Cosmos 1.15 in about a month, and I’d really like to include this improvement. We’ll make sure to review it promptly and provide any necessary feedback so it can be part of the release.

@tuantran0910

Contributor Author

I will get back to this PR this weekend. Thanks a lot, @tatiana :D

@pankajkoti pankajkoti left a comment

Contributor

Thanks for the contribution @tuantran0910. I have some questions inline.

Also, I guess that with this we're populating one Variable per seed per DAG? Can we also give some thought to how we can clean them up?

Currently, I haven't thought enough on the feasibility, but as an alternative to this approach, do you think we could leverage dbt seed --select state:modified+ somehow? That would help us avoid computing and storing hashes and populating the Variables.
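
For context on this alternative: dbt's `state:modified` selector compares the current project against artifacts from an earlier run, so the command also needs a `--state` directory containing a previous `manifest.json`. A minimal sketch of assembling such a command is shown below; the helper name and the directory path are hypothetical, and how Cosmos would persist the previous manifest between runs is left open.

```python
from typing import List


def build_modified_seed_command(previous_state_dir: str) -> List[str]:
    """Hypothetical helper: dbt seed limited to seeds modified since a prior state.

    `previous_state_dir` must hold the manifest.json from the earlier run.
    """
    return [
        "dbt",
        "seed",
        "--select",
        "state:modified+",
        "--state",
        previous_state_dir,
    ]
```

This trades the stored hashes for a stored manifest, so some artifact still has to survive between DAG runs.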

.. warning::
    **Limitations of** ``when_seed_changes``:

    - Only supported with ``ExecutionMode.LOCAL``. Other execution modes (Docker, Kubernetes, etc.) cannot access the seed CSV files from the Airflow worker.

Contributor

In the code, we can raise CosmosValueError when seed_rendering_behavior == WHEN_SEED_CHANGES and execution_mode != LOCAL. WDYT?
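
A minimal sketch of such a guard is shown below. The `ExecutionMode` enum and `CosmosValueError` class are local stand-ins so the example runs on its own, and `validate_seed_rendering` is a hypothetical function name; where the check would live in Cosmos is not specified here.

```python
from enum import Enum


class ExecutionMode(Enum):
    """Stand-in with a few of the execution modes mentioned in the docs."""

    LOCAL = "local"
    DOCKER = "docker"
    KUBERNETES = "kubernetes"


class CosmosValueError(ValueError):
    """Stand-in for the Cosmos exception of the same name."""


def validate_seed_rendering(seed_rendering_behavior: str, execution_mode: ExecutionMode) -> None:
    """Fail fast when change detection is requested in an unsupported mode."""
    if seed_rendering_behavior == "when_seed_changes" and execution_mode is not ExecutionMode.LOCAL:
        raise CosmosValueError(
            "seed_rendering_behavior='when_seed_changes' requires ExecutionMode.LOCAL, "
            f"got {execution_mode.name}"
        )
```

Raising at configuration time turns a documented limitation into an explicit error instead of a silent no-op at run time.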

Comment thread cosmos/dbt/seed.py

Contributor

Can we narrow down the exceptions caught in various methods in this module so that we do not use bare Exception class, please?
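
To illustrate what narrowing could look like, the sketch below hashes a file and catches only `OSError`, the base class of `FileNotFoundError`, `PermissionError` and similar I/O failures. `hash_seed_file` is a hypothetical name, not a function from `cosmos/dbt/seed.py`.

```python
import hashlib
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def hash_seed_file(path: Path) -> Optional[str]:
    """Return the SHA256 hex digest of `path`, or None if it cannot be read."""
    try:
        return hashlib.sha256(path.read_bytes()).hexdigest()
    except OSError as exc:
        # Only I/O problems are handled here; programming errors such as
        # TypeError still propagate instead of being silently swallowed.
        logger.warning("Could not hash seed file %s: %s", path, exc)
        return None
```

Catching the narrower class keeps the fallback for missing or unreadable files while letting unexpected bugs surface in the task logs.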

Contributor

Also, should we mark all the methods in the module non-public by prefixing with underscores?

        SeedRenderingBehavior.WHEN_SEED_CHANGES,
    ],
)
def test_create_task_metadata_seed_with_build_test_behavior(seed_rendering_behavior):

Contributor

What is the expected behaviour when TestBehavior.BUILD is combined with SeedRenderingBehavior.WHEN_SEED_CHANGES? Since the base class for TestBehavior.BUILD is DbtBuildLocalOperator, and I don't see any override in it, I guess SeedRenderingBehavior.WHEN_SEED_CHANGES would be a no-op, meaning this seed rendering behaviour would not take effect?

@pankajkoti

Contributor

Hi @tuantran0910, following up to check if you have thoughts on my review comments and if you'd like to discuss something there?

pankajkoti added a commit that referenced this pull request Jun 5, 2026
## Description

Adds a `seed_rendering_behavior` option to `RenderConfig`, giving
control over how Cosmos renders and runs dbt seeds (analogous to
`source_rendering_behavior`):

- `ALWAYS` (default): render the seed and run `dbt seed` on every
execution (the original Cosmos behaviour).
- `WHEN_SEED_CHANGES`: render the seed, but only run `dbt seed` when its
CSV content has changed since the last successful run. The checksum is
read from dbt's manifest (falling back to hashing the CSV) and the
last-seen value is persisted as an Airflow Variable scoped per
`DbtDag`/`DbtTaskGroup` and seed. Supported for `ExecutionMode.LOCAL`,
`VIRTUALENV` and `AIRFLOW_ASYNC`, and incompatible with
`TestBehavior.BUILD`; both raise `CosmosValueError`.
- `RENDER_ONLY`: render the seed as a no-op `EmptyOperator` placeholder.
- `NONE`: do not render the seed at all.

The gate runs before the `TestBehavior.BUILD` branch so
`NONE`/`RENDER_ONLY` are honoured under every test behaviour. On an
unchanged run the seed task succeeds without running `dbt seed`, without
emitting its dataset, and without skip-propagating downstream. Change
detection delegates to `AbstractDbtBase.execute()`, preserving
`extra_context`, debug-mode tracking and `**kwargs`.

Under `ExecutionMode.WATCHER` a single `dbt build` runs all seeds
regardless of this setting; only `ALWAYS` is meaningful there.

Fresh implementation building on the approach explored in #2246 by
@tuantran0910.

## Related Issue(s)

Closes #1576

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Improve how Cosmos renders & runs seeds

4 participants