Support to prune sources without downstream references in dbt projects #1988

Merged
tatiana merged 33 commits into astronomer:main from corsettigyg:source-prunning on Oct 2, 2025

Conversation

@corsettigyg
Collaborator

@corsettigyg corsettigyg commented Sep 19, 2025

Description

In Cosmos, once sources are rendered into the DAG, tasks are still created for sources that nothing references, either because we select only a portion of the project or because the sources.yml file contains dead code. These unreferenced source tasks clutter the UI.
Screenshot 2025-09-19 at 12 56 21

For large projects with hundreds of sources, users often select a fraction of the main project per DbtDag object. Because all sources are rendered regardless of the selection, the resulting DAG becomes unreadable. The idea is to allow users to pass a source_prunning flag to RenderConfig, which removes source tasks that have no downstream dependencies.

Screenshot 2025-09-19 at 13 00 58
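
A minimal sketch of the idea described above, assuming a simplified node model. The `Node` dataclass and `prune_unreferenced_sources` function are hypothetical names used for illustration only; they are not the Cosmos classes or the code in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a dbt node; not the Cosmos DbtNode class."""

    unique_id: str
    resource_type: str  # e.g. "source" or "model"
    depends_on: list[str] = field(default_factory=list)


def prune_unreferenced_sources(nodes: dict[str, Node]) -> dict[str, Node]:
    """Drop source nodes that no other node in `nodes` depends on."""
    # Collect every id that appears as a parent of some selected node.
    referenced = {parent for node in nodes.values() for parent in node.depends_on}
    # Keep all non-source nodes, and only those sources that are referenced.
    return {
        uid: node
        for uid, node in nodes.items()
        if node.resource_type != "source" or uid in referenced
    }


# Two sources are declared, but only one feeds a selected model.
selected = {
    "source.shop.raw_orders": Node("source.shop.raw_orders", "source"),
    "source.shop.raw_legacy": Node("source.shop.raw_legacy", "source"),
    "model.shop.stg_orders": Node(
        "model.shop.stg_orders", "model", ["source.shop.raw_orders"]
    ),
}
pruned = prune_unreferenced_sources(selected)
```

Here `source.shop.raw_legacy` is dropped because nothing in the selection depends on it, while the referenced source and the model are kept.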

Related Issue(s)

Breaking Change?

Should not be

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

@netlify

netlify Bot commented Sep 19, 2025

Deploy Preview for sunny-pastelito-5ecb04 canceled.

🔨 Latest commit: fcd08e8
🔍 Latest deploy log: https://app.netlify.com/projects/sunny-pastelito-5ecb04/deploys/68dd76d7f848980008d98d50

@corsettigyg corsettigyg changed the title from "Source prunning" to "allow to prune sources without downstream references in dbt projects" Sep 19, 2025
@corsettigyg corsettigyg marked this pull request as ready for review September 19, 2025 15:05
Copilot AI review requested due to automatic review settings September 19, 2025 15:05
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces source pruning functionality to Cosmos, allowing users to automatically remove source nodes from DAGs that have no downstream dependencies. This addresses the issue where unused or dead source nodes clutter DAGs, making large projects with hundreds of sources difficult to read when only selecting a subset of models.

Key changes:

  • Added source_prunning parameter to RenderConfig to enable/disable source pruning
  • Implemented DownstreamGraph class for efficient downstream dependency computation
  • Enhanced DbtNode with downstream relationship tracking capabilities
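
As a rough illustration of what "efficient downstream dependency computation" can mean, the sketch below builds a reverse-dependency index once and then answers downstream queries from it. `DownstreamIndex` and its methods are hypothetical; this is not the `DownstreamGraph` class added in the PR, whose interface is not shown in this conversation.

```python
from collections import defaultdict, deque


class DownstreamIndex:
    """Hypothetical reverse-dependency index; not the Cosmos DownstreamGraph."""

    def __init__(self, depends_on: dict[str, list[str]]) -> None:
        # Invert "child -> parents" into "parent -> children" in one pass.
        self._children: dict[str, set[str]] = defaultdict(set)
        for node_id, parents in depends_on.items():
            for parent in parents:
                self._children[parent].add(node_id)

    def direct(self, node_id: str) -> set[str]:
        """Return the nodes that depend directly on `node_id`."""
        return set(self._children.get(node_id, ()))

    def all_downstream(self, node_id: str) -> set[str]:
        """Return every node reachable from `node_id` (breadth-first)."""
        seen: set[str] = set()
        queue = deque(self._children.get(node_id, ()))
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            queue.extend(self._children.get(current, ()))
        return seen


depends_on = {
    "source.a": [],
    "source.b": [],
    "model.stg": ["source.a"],
    "model.mart": ["model.stg"],
}
index = DownstreamIndex(depends_on)
downstream_of_a = index.all_downstream("source.a")
downstream_of_b = index.direct("source.b")
```

Building the index costs one pass over the edges, so checking whether each source has downstream nodes afterwards is a dictionary lookup rather than a scan of the whole project.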

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.

File | Description
--- | ---
cosmos/config.py | Added source_prunning parameter to RenderConfig
cosmos/dbt/graph.py | Implemented DownstreamGraph class and enhanced DbtNode with downstream tracking
cosmos/airflow/graph.py | Added source pruning logic to task creation pipeline
docs/configuration/source-nodes-rendering.rst | Added documentation for the new source pruning feature
dev/dags/example_source_prunning.py | Added example DAG demonstrating source pruning functionality
dev/dags/dbt/altered_jaffle_shop/models/staging/sources.yml | Added test source node without connections

Comment thread docs/configuration/source-nodes-rendering.rst Outdated
Comment thread docs/configuration/source-nodes-rendering.rst Outdated
Comment thread cosmos/config.py Outdated
Comment thread cosmos/config.py Outdated
Comment thread cosmos/dbt/graph.py Outdated
Comment thread cosmos/airflow/graph.py Outdated
Comment thread cosmos/airflow/graph.py Outdated
Comment thread cosmos/airflow/graph.py Outdated
Comment thread dev/dags/example_source_prunning.py Outdated
Comment thread dev/dags/example_source_prunning.py Outdated
@tatiana
Collaborator

tatiana commented Sep 22, 2025

Hi @corsettigyg, thanks a lot for contributing this, it seems a valuable addition to Cosmos.

I know it is in draft mode - and I'll avoid adding any further comments right now - but I'd love it if we could get some early thoughts from @arojasb3, who initially implemented the source-node support.

@codecov

codecov Bot commented Sep 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.81%. Comparing base (ba9b869) to head (fcd08e8).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1988   +/-   ##
=======================================
  Coverage   97.81%   97.81%           
=======================================
  Files          87       87           
  Lines        5526     5536   +10     
=======================================
+ Hits         5405     5415   +10     
  Misses        121      121           


@corsettigyg corsettigyg marked this pull request as ready for review September 23, 2025 12:33
@corsettigyg
Collaborator Author

corsettigyg commented Sep 23, 2025

@tatiana ready for review now 💯 I tested it in many different scenarios and it is working fine. My only small concern is performance, since it is yet another loop over the sources, which can end up being kind of expensive.
Screenshot 2025-09-23 at 14 33 55

Comment thread tests/dbt/test_graph.py
Comment thread dev/dags/example_source_pruning.py
@tatiana tatiana changed the title from "allow to prune sources without downstream references in dbt projects" to "Support to prune sources without downstream references in dbt projects" Sep 25, 2025
Collaborator

@tatiana tatiana left a comment

@corsettigyg I had hoped we could get feedback on this improvement from other dbt source users - but I think we may not get it quickly enough.

I have two feedback points:

  1. Since this is a breaking change, perhaps we should make this an opt-in feature (disabled by default) - and have a ticket to make it default in Cosmos 2.0, associating it to that milestone.

  2. Do you think it is worth having this feature enabled/disabled per DAG? Would it be worth, instead, having a "global" config that would affect all the Cosmos DAGs and TaskGroups, similar to these other configurations?
    https://astronomer.github.io/astronomer-cosmos/configuration/cosmos-conf.html

Once we align on these two points, I'm happy to approve this PR, and we can have an alpha release exposing this feature, for early feedback as well - WDYT?
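
To make the per-DAG versus global trade-off in point 2 concrete, here is a minimal sketch of a global default with a per-DAG override. The setting name `AIRFLOW__COSMOS__SOURCE_PRUNING` and the `resolve_source_pruning` helper are assumptions made up for illustration; they only follow Airflow's `AIRFLOW__{SECTION}__{KEY}` environment variable convention and are not a documented Cosmos setting.

```python
import os
from typing import Mapping, Optional

# Hypothetical setting name; not confirmed as a real Cosmos configuration key.
ENV_VAR = "AIRFLOW__COSMOS__SOURCE_PRUNING"
TRUTHY = {"1", "true", "t", "yes", "y"}


def resolve_source_pruning(
    per_dag: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return the effective flag: per-DAG value first, then the global default."""
    if per_dag is not None:
        # An explicit per-DAG value always wins over the global setting.
        return per_dag
    env = os.environ if env is None else env
    # Unset means disabled, which keeps the feature opt-in.
    return env.get(ENV_VAR, "false").strip().lower() in TRUTHY


global_on = {ENV_VAR: "True"}
default_value = resolve_source_pruning(env={})
from_global = resolve_source_pruning(env=global_on)
overridden = resolve_source_pruning(per_dag=False, env=global_on)
```

With this shape the feature stays disabled unless someone opts in, and a single DAG can still diverge from the global value.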

@pankajastro
Contributor

Overall, this looks good to me, and I'm happy for it to be merged once @tatiana's feedback is addressed.
A couple of questions out of curiosity:

  • Is it common for dbt projects to leave dead code in sources.yml?
  • Does the current source rendering respect the --select flag?

@corsettigyg
Collaborator Author

@tatiana

  1. It is already disabled by default 💯 Python short-circuits the check if source_pruning is False (default).
  2. No strong opinions here. I see the advantage of having it as a global config, so I can implement it as such 😄

@pankajastro

  1. Yes, especially when you have a big dbt project and you select by tags. You end up with a LOT of unused sources.
  2. Yes.
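
The short-circuit behaviour mentioned in the reply to @tatiana can be sketched as follows. `has_downstream` and `should_skip_source` are hypothetical helpers, not the functions in the PR; the `calls` list exists only to show when the downstream check actually runs.

```python
calls: list[str] = []


def has_downstream(source_id: str, referenced: set[str]) -> bool:
    """Stand-in for the potentially expensive downstream lookup."""
    calls.append(source_id)  # record that the check was executed
    return source_id in referenced


def should_skip_source(
    source_id: str, referenced: set[str], source_pruning: bool = False
) -> bool:
    # `and` evaluates left to right and stops at the first falsy operand,
    # so has_downstream() is never called while source_pruning is False.
    return source_pruning and not has_downstream(source_id, referenced)


skipped_default = should_skip_source("source.shop.raw_legacy", set())
calls_after_default = list(calls)  # snapshot before enabling the flag
skipped_enabled = should_skip_source(
    "source.shop.raw_legacy", set(), source_pruning=True
)
```

With the flag left at its default, the source is kept and the lookup is not executed at all, so disabled pruning adds no per-source cost in this sketch.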

Collaborator

@tatiana tatiana left a comment

It looks excellent, @corsettigyg, thanks a lot for addressing the feedback so quickly.

Once our CI main branch issue is resolved by merging #2012, please proceed with merging your change - you should have the necessary super-powers!

@tatiana
Collaborator

tatiana commented Oct 2, 2025

@corsettigyg just saw you're on holidays until the 8th - I'll merge this PR, you'll have lots of opportunity to do the same in the future!

@tatiana tatiana merged commit 5f6e710 into astronomer:main Oct 2, 2025
96 checks passed
@tatiana tatiana mentioned this pull request Oct 7, 2025
@tatiana tatiana added this to the Cosmos 1.11.0 milestone Oct 28, 2025
tatiana added a commit that referenced this pull request Oct 29, 2025
**Features**

* Introduce ``ExecutionMode.WATCHER`` to reduce DAG run time by 1/5 in
several PRs. Learn more about it
[here](https://astronomer.github.io/astronomer-cosmos/getting_started/watcher-execution-mode.html#watcher-execution-mode).
This feature was implemented via multiple PRs, including:
* Expose new execution mode by @tatiana @pankajastro @pankajkoti in
#1999
* Add ``DbtProducerWatcherOperator`` for the proposed
``ExecutionMode.WATCHER`` by @pankajkoti in #1982
* Add ``DbtConsumerWatcherSensor`` for the proposed
``ExecutionMode.WATCHER`` by @pankajastro in #1998
* Push producer's task completion status to XCOM by @pankajkoti in #2000
* Add default priority_weight for ``DbtProducerWatcherOperator`` by
@pankajkoti in #1995
* Add sample dbt events for the dbt watcher execution mode by
@pankajkoti in #1952
* Add ``compiled_sql`` as a template field on
``ExecutionMode.WATCHER`` when using ``run_results.json`` by
@pankajastro in #2070
* Set ``push_run_results_to_xcom`` kwargs correctly for invocation mode
subprocess and Watcher mode by @pankajastro in #2067
* Store compiled SQL as template field for dbt callback events in
``ExecutionMode.WATCHER`` by @pankajkoti in #2068
* Add initial documentation for ``ExecutionMode.WATCHER`` by @tatiana in
#2046
* Support running ``State.UPSTREAM_FAILED`` tasks when WATCHER consumer
upstream tasks fail by @tatiana in #2062
* Fail sensor tasks immediately if the ``ExecutionMode.WATCHER``
producer task fails by @pankajastro in #2040
* Add ``WATCHER`` to GitHub issue template by @tatiana in #2056
* Add support for ``TestBehavior.AFTER_ALL`` with
``ExecutionMode.WATCHER`` by @pankajastro in #2049
* Add support for ``TestBehavior.NONE`` with ``ExecutionMode.WATCHER``
by @pankajastro in #2047
* Fix ``ExecutionMode.WATCHER`` behaviour with ``DbtTaskGroup`` by
@tatiana in #2044
* Fix Cosmos behaviour when using watcher with
``InvocationMode.DBT_RUNNER`` by @tatiana in #2048

* Add Airflow 3 plugin for dbt docs with multiple dbt projects support
by @pankajkoti in #2009, check the
[documentation](https://astronomer.github.io/astronomer-cosmos/configuration/hosting-docs.html).
* Initial support to ``dbt Fusion`` by @tatiana in #1803. More details
[here](https://astronomer.github.io/astronomer-cosmos/configuration/dbt-fusion).
* Support to prune sources without downstream references in dbt projects
by @corsettigyg in #1988
* Allow to set task display name as a user-defined function by
@corsettigyg in #1761
* Add dbt project's hash to dag docs to support dag versioning in
Airflow 3 by @pankajkoti in #1907
* feat: Add Jinja templating support for ``dbt_cmd_flags`` by
@skillicinski in #1899
* Add Scarf metric to collect the execution mode uses by @pankajastro in
#1981
* Support Airflow 3.1 by @tatiana in #1980
* Add MySQL profile mapping by @Lee2532 in #1977
* Add sqlserver profile mapping by @pankajastro in #1737

**Enhancement**

* Use XCom to store sql when using ``ExecutionMode.AIRFLOW_ASYNC`` by
@pankajastro in #1934
* Refactor ``AIRFLOW_ASYNC`` teardown so it doesn't install the
virtualenv by @pankajastro in #1938
* Reuse the virtual env for ``AIRFLOW_ASYNC`` setup task by @pankajastro
in #1939
* Improve dataset/asset experience in Cosmos by @tatiana in #2030
* Add ``downstreams`` to ``DbtNode`` by @wornjs in #2028

**Bug fixes**

* Fix tags extraction by @ms32035 in #1915
* Fix task flow operator args by @anyapriya in #2024

**Documentation**

* Add documentation for Airflow 3 Plugin supporting dbt docs for
multiple dbt projects by @pankajkoti in #2063
* Add Cosmos Deferrable Operator Guide by @pankajastro in #1922
* Add dbt Fusion documentation by @tatiana in #1824 #1830
* Update dbt-fusion.rst to explicitly highlight it is in alpha by
@tatiana in #1838
* Fix a bunch of docs build errors and warnings by @pankajkoti in
#1886
* Add docs note for param virtualenv_dir for async execution mode by
@pankajastro in #1969
* Use pepy.tech downloads badge in README by @pankajkoti in #1920
* Correct the default value of ``cache_dir`` by @seokyun.ha in #2027

**Others**

* Promote @corsettigyg to committer by @tatiana in #1985
* Add @pankajkoti and @pankajastro to ``contributors.rst`` by @tatiana
in #1983
* Update setup script for airflow3 script by @dwreeves in #2023
* Prevent pytest from trying to test classes that aren't actually tests
by @anyapriya in #2032
* Fix ``dag.test()`` for Airflow 3.1+ by syncing DAG to database by
@kaxil in #2037
* Disable Scarf in CI by @pankajastro in #2016
* Fix failing dbt Fusion tests when run in parallel in CI by @pankajkoti
in #1896
* Fix MyPy issues related to ``ObjectStoragePath`` in main branch by
@tatiana in #2012
* Cleanup example dbt event JSON dictionaries kept for XCOM reference by
@pankajkoti in #1997
* Bump min hatch version that includes fixes for click>=8.3.0 by
@pankajkoti in #1996
* Use official postgres image from Docker hub for kubernetes setup by
@pankajkoti in #1986
* Use click<8.3.0 for hatch as click 8.3 breaks hatch by @pankajkoti in
#1987
* Pin Airflow version in type check CI job by @pankajastro in #2003
* Improve comments after feedback on #1948 by @tatiana in #1963
* Fix running tests with dbt Fusion 2.0.0 preview versions by @tatiana
in #1948
* Test hardening of dbt node having tags as unset or missing by
@pankajkoti in #1918
* Fix Sphinx issue in the main branch by @tatiana in #2064
* pre-commit autoupdate in #2065, #2043, #2033, #2019, #1990, #2019,
#2008, #1941, #1935, #1924
* GitHub dependabot update in #2051, #2050, #2038, #2022, #1947, #1955,
#1946, #1944, #1945, #1928, #1921, #1917


Co-authored-by: Pankaj Koti <pankaj.koti@astronomer.io>
Co-authored-by: Pankaj Singh <pankaj.singh@astronomer.io>
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>