Support running dbt deps incrementally to pre-defined dbt_packages during task execution#1670

Merged
tatiana merged 10 commits into
mainfrom
issue-1630-task-execution
Apr 16, 2025

@tatiana tatiana commented Apr 16, 2025

Motivation

Support running dbt deps incrementally on top of a pre-defined dbt_packages directory during task execution. This use case was requested by an Astro customer.

Before this change, Cosmos supported two types of configuration:

An Astronomer customer asked to reuse the initially defined dbt_packages directory and run dbt deps incrementally on top of it.

Implementation

We do not run dbt commands directly in the original dbt project folder with Cosmos because some users use read-only filesystems (#414). We also decided to use symbolic links instead of copying the directory due to performance issues (#488). Since we did not want to introduce a breaking change in a minor Cosmos release by changing the existing Cosmos 1.x behaviour to meet this new use case, this PR supports:

  • Copying the dbt deps related files (dbt packages folder and symbolic link) to the Cosmos temporary folder; and
  • Running dbt deps.
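
The two steps above can be pictured with a minimal sketch. It assumes a hypothetical helper, `clone_project`, and is not the Cosmos implementation: every top-level entry of the project is symlinked into the temporary folder, except `dbt_packages`, which is copied when `copy_dbt_packages` is true so that dbt deps only ever writes to the copy. The sketch handles the packages folder only and ignores any other dbt deps related files.

```python
import shutil
from pathlib import Path

# Hypothetical constant and helper, for illustration only (not the Cosmos API).
DBT_PACKAGES_DIR = "dbt_packages"


def clone_project(source: Path, target: Path, copy_dbt_packages: bool) -> None:
    """Mirror `source` into `target` without writing anything to `source`."""
    target.mkdir(parents=True, exist_ok=True)
    for entry in source.iterdir():
        destination = target / entry.name
        if entry.name == DBT_PACKAGES_DIR and copy_dbt_packages:
            # Real copy: `dbt deps` can add or update packages here
            # while the original (possibly read-only) folder stays untouched.
            shutil.copytree(entry, destination, symlinks=True)
        else:
            # Symbolic link: cheap, avoids copying the whole project.
            destination.symlink_to(entry.resolve(), target_is_directory=entry.is_dir())
```

With `copy_dbt_packages=False` the sketch links `dbt_packages` like any other entry, which corresponds to the existing symlink-based behaviour described above.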

To keep this from being a breaking change, users must opt into the new behaviour. They need to enable dbt deps, via operator_args={"install_dbt_deps": True} or ProjectConfig.install_dbt_deps=True, and enable the new copy_dbt_packages configuration by doing one of the following:

  • Changing the operator to receive the argument copy_dbt_packages=True
  • Changing individual DbtDag or DbtTaskGroup instances to use the new configuration ProjectConfig.copy_dbt_packages=True
  • Changing the behaviour globally, by setting the Airflow configuration either via the environment variable AIRFLOW__COSMOS__DEFAULT_COPY_DBT_PACKAGES=True or via the airflow.cfg:
[cosmos]
default_copy_dbt_packages=True
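
The environment variable name follows Airflow's `AIRFLOW__{SECTION}__{KEY}` convention. As a rough illustration of how such a global default could be read, here is a standard-library sketch; the helpers `airflow_env_var` and `read_bool_setting` are hypothetical, the sketch ignores airflow.cfg entirely, and real code would more likely rely on Airflow's own `conf.getboolean`.

```python
import os
from typing import Mapping, Optional


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow uses for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def read_bool_setting(
    section: str,
    key: str,
    default: bool = False,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper: read a boolean option from the environment only."""
    environ = os.environ if environ is None else environ
    raw = environ.get(airflow_env_var(section, key))
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in ("t", "true", "1"):
        return True
    if value in ("f", "false", "0"):
        return False
    raise ValueError(f"Cannot interpret {raw!r} as a boolean")
```

For example, `read_bool_setting("cosmos", "default_copy_dbt_packages")` looks up `AIRFLOW__COSMOS__DEFAULT_COPY_DBT_PACKAGES` and falls back to `False` when it is unset.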

How this was tested

To validate the end-to-end behaviour, we ran the following DAG from dev/dags:

airflow dags test dbt_deps_example

Related tickets

This is a follow-up to #1668 and #1669.

I'll make a follow-up PR covering the documentation.

Closes: #1630

netlify Bot commented Apr 16, 2025

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 43450bf
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/67ffc2c9a7d5d40008436027

cloudflare-workers-and-pages Bot commented Apr 16, 2025

Deploying astronomer-cosmos with Cloudflare Pages

Latest commit: 43450bf
Status: ✅  Deploy successful!
Preview URL: https://e3420e72.astronomer-cosmos.pages.dev
Branch Preview URL: https://issue-1630-task-execution.astronomer-cosmos.pages.dev

@tatiana tatiana force-pushed the issue-1630-task-execution branch from 35b8850 to f8c2671 Compare April 16, 2025 14:33
@tatiana tatiana marked this pull request as ready for review April 16, 2025 14:42
Copilot AI review requested due to automatic review settings April 16, 2025 14:42
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Apr 16, 2025

Copilot AI left a comment

Pull Request Overview

This PR adds support for running dbt deps incrementally using a pre-defined dbt_packages directory by introducing a new configuration flag, copy_dbt_packages. Key changes include:

  • Updating the operator to conditionally copy dbt_packages and adjust symbolic link creation based on the new configuration.
  • Modifying the converter to pass the new copy_dbt_packages flag.
  • Adjusting the dbt graph loading mode mapping for DBT_LS_CACHE.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/operators/test_local.py | Added a test for _clone_project verifying symbolic link and copy paths. |
| cosmos/operators/local.py | Updated operator logic to support conditional copying and logging of dbt packages. |
| cosmos/dbt/graph.py | Changed mapping for DBT_LS_CACHE mode to use load_via_dbt_ls. |
| cosmos/converter.py | Added configuration override for the copy_dbt_packages flag. |

Comments suppressed due to low confidence (2)

tests/operators/test_local.py:1531

  • Consider adding a test case for _clone_project where copy_dbt_packages is false to ensure that symlink creation behaves as expected in that branch.
@patch("cosmos.operators.local.copy_dbt_packages")

cosmos/dbt/graph.py:531

  • Mapping DBT_LS_CACHE to load_via_dbt_ls rather than load_via_dbt_ls_cache could lead to unintended behavior if the two loaders differ; double-check that this change is intentional and meets the expected functionality.
LoadMode.DBT_LS_CACHE: self.load_via_dbt_ls,

@dosubot dosubot Bot added area:config Related to configuration, like YAML files, environment variables, or executer configuration dbt:deps Primarily related to dbt deps command or functionality execution:local Related to Local execution environment labels Apr 16, 2025

codecov Bot commented Apr 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.09%. Comparing base (5711580) to head (43450bf).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1670   +/-   ##
=======================================
  Coverage   97.08%   97.09%           
=======================================
  Files          80       80           
  Lines        5014     5022    +8     
=======================================
+ Hits         4868     4876    +8     
  Misses        146      146           

@tatiana tatiana deleted the issue-1630-task-execution branch April 16, 2025 15:39
@tatiana tatiana mentioned this pull request Apr 16, 2025
tatiana added a commit that referenced this pull request Apr 17, 2025
Rename user-facing global configuration
`default_copy_dbt_packages_value` to `default_copy_dbt_packages`.

This is a follow-up to #1668, #1669 and #1670.
@tatiana tatiana added this to the Cosmos 1.10.0 milestone Apr 17, 2025
tatiana added a commit that referenced this pull request May 1, 2025
Features

* Airflow 3 support
* Support running ``dbt deps`` incrementally to pre-defined
``dbt_packages`` by @tatiana in #1668 and #1670
* Add ``DuckDB`` profile mapping by @prithvijitguha and @pankajastro in
#1553
* Implement dbt exposure selector by @ghjklw in #1717

Bug Fixes

* Fix ``test_indirect_selection`` flag to be propagated in case of
``TestBehavior.BUILD`` by @corsettigyg in #1663
* Fix ``select`` clause in the case of detached tests by @anyapriya in
#1680
* Operator argument fixes by @johnhoran in #1648


Airflow 3 Support

* Support rendering DbtDag in Airflow 3 by @tatiana and @ashb in #1657
* Refactor Rendered Task Instance Fields (RTIF) handling for Airflow 2.x
and 3.x by @pankajkoti in #1661
* Run cosmos operator in Airflow 3 by @pankajastro in #1642
* Fix ``python_virtualenv.prepare_env`` top-level import for Airflow 3
by @pankajkoti in #1678
* Fix Variable not found issue in Airflow 3 by @tatiana in #1684
* Disable CosmosPlugin on Airflow 3 setup by @pankajkoti in #1692, #1698
* Use ``schedule`` param in example DAGs instead of the 2.10 deprecated
and 3.0 removed ``schedule_interval`` by @pankajkoti in #1701
* Ensure ``virtualenv_dir`` path exists by @pankajkoti in #1724
* Support emitting Assets with Airflow 3 by @tatiana in #1713
* Add docs on Airflow 3 compatibility by @pankajkoti and @tatiana in
#1731
* Introduce, test and document asset/dataset breaking change by @tatiana
in #1672
* Improve dataset/asset driven scheduling documentation by @tatiana in
#1729

Enhancements

* Allow multiple callbacks by @corsettigyg in #1693
* Refactor kubernetes warning callback handling by @canbekley in #1681

Documentation

* Add documentation related to ``copy_dbt_packages`` by @tatiana in
#1671
* Make wording and command consistent in the contributing doc by
@pankajkoti in #1697
* Add MonteCarlo callback example for importing dbt artifacts by
@corsettigyg in #1695
* Change async feature to be non-experimental by @tatiana in #1732

Others

* Add sample ``dbt_packages`` to validate incremental ``dbt deps`` by
@tatiana in #1669
* Add kubernetes execution mode example in Airflow 3 by @pankajastro in
#1667
* Check only major version until Airflow 3 stable release by
@pankajastro in #1665
* Install Airflow from main branch by @pankajastro in #1660
* Add dev tool for Airflow 3 by @pankajastro and @tatiana in #1627
* Improve Airflow 3 tooling by @pankajastro in #1656
* Skip associating ``openlineage_events_completes`` to ``ti`` in Airflow
3 by @pankajkoti in #1662
* Add .gitignore file for the scripts/airflow3 directory by @pankajkoti
in #1658
* Remove ``original_jaffle_shop`` dbt project by @pankajkoti in #1676
* Fix or ignore type check error by @pankajastro in #1687
* Run virtualenv example with Airflow 3 tooling by @pankajastro in #1686
* Enable running setup/teardown tasks with Async execution DAG with
Airflow 3 tooling by @pankajastro in #1696
* Enable integration tests for the DuckDB adapter by @pankajastro in
#1699
* Add Airflow 3 tests matrix entries in CI by @pankajkoti in #1646
* Use a different way to get tasks count for asserting test_perf_dag by
@pankajkoti in #1714
* Reinstall Airflow 3 dependency on ``pydantic>=2.11`` for dbt adapter
versions 1.6 & 1.9 by @pankajkoti in #1715
* Fix outdated ``echo`` in Airflow 3 tooling script #1700
* Add files not needed for git tracking to .gitignore by @pankajkoti in
#1723
* Use latest minor versions for dbt adapters to get in compatibility
fixes by @pankajkoti in #1719
* Fix Airflow 3 tests raising generate_run_id() takes 0 positional
arguments by @tatiana in #1725
* Fix dataset tests failing in Airflow 3 by @tatiana in #1716
* Enable example DAGs to run in CI that were disabled in PR #1646 by
@pankajkoti in #1726
* Pre-commit updates: #1666, #1653, #1641, #1682, #1720


Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Pankaj Singh
<98807258+pankajastro@users.noreply.github.com>
tatiana added a commit that referenced this pull request May 19, 2025
The feature introduced in #1670 (Support running `dbt deps` incrementally to pre-defined `dbt_packages` during task execution) did not work as expected if users had defined a custom path for `packages-install-path`. It only worked if the default (`dbt_packages`) was being used. This PR aims to solve the issue.
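
As a rough illustration of the case this commit message describes, the sketch below resolves the packages folder from `dbt_project.yml`, falling back to the default `dbt_packages` when `packages-install-path` is not set. The helper `packages_install_path` is hypothetical and is not the code from the fix; it scans top-level keys naively, whereas a real implementation would use a YAML parser.

```python
from pathlib import Path

DEFAULT_PACKAGES_DIR = "dbt_packages"  # dbt's default install location


def packages_install_path(project_dir: Path) -> Path:
    """Hypothetical helper: find where `dbt deps` installs packages."""
    project_file = project_dir / "dbt_project.yml"
    if project_file.exists():
        for line in project_file.read_text().splitlines():
            # Only top-level keys are considered (no leading whitespace).
            if line.startswith("packages-install-path:"):
                raw = line.split(":", 1)[1]
                # Drop trailing comments and surrounding quotes.
                value = raw.split("#", 1)[0].strip().strip("'\"")
                if value:
                    return project_dir / value
    return project_dir / DEFAULT_PACKAGES_DIR
```

Logic that hard-codes `dbt_packages` would miss the custom folder that this lookup returns.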
tatiana added a commit that referenced this pull request May 19, 2025
The feature introduced in #1670 (Support running `dbt deps`
incrementally to pre-defined `dbt_packages` during task execution) did
not work as expected if users had defined a custom path for
`packages-install-path`. It only worked if the default (`dbt_packages`)
was being used. This PR aims to solve the issue.
pankajkoti pushed a commit that referenced this pull request May 20, 2025
The feature introduced in #1670 (Support running `dbt deps`
incrementally to pre-defined `dbt_packages` during task execution) did
not work as expected if users had defined a custom path for
`packages-install-path`. It only worked if the default (`dbt_packages`)
was being used. This PR aims to solve the issue.
pankajkoti pushed a commit that referenced this pull request May 21, 2025
The feature introduced in #1670 (Support running `dbt deps`
incrementally to pre-defined `dbt_packages` during task execution) did
not work as expected if users had defined a custom path for
`packages-install-path`. It only worked if the default (`dbt_packages`)
was being used. This PR aims to solve the issue.

(cherry picked from commit 62b6ddc)

Labels

area:config Related to configuration, like YAML files, environment variables, or executer configuration dbt:deps Primarily related to dbt deps command or functionality execution:local Related to Local execution environment size:M This PR changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Feature] Customer request on dbt deps pre-install + update if needed

3 participants