Support running dbt deps incrementally to pre-defined dbt_packages during task execution #1670
Merged
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
✅ Deploy Preview for sunny-pastelito-5ecb04 canceled.
Deploying astronomer-cosmos:

| Latest commit: | 43450bf |
|---|---|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://e3420e72.astronomer-cosmos.pages.dev |
| Branch Preview URL: | https://issue-1630-task-execution.astronomer-cosmos.pages.dev |
Force-pushed from 35b8850 to f8c2671
Pull Request Overview
This PR adds support for running dbt deps incrementally using a pre-defined dbt_packages directory by introducing a new configuration flag, copy_dbt_packages. Key changes include:
- Updating the operator to conditionally copy dbt_packages and adjust symbolic link creation based on the new configuration.
- Modifying the converter to pass the new copy_dbt_packages flag.
- Adjusting the dbt graph loading mode mapping for DBT_LS_CACHE.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/operators/test_local.py | Added a test for _clone_project verifying symbolic link and copy paths. |
| cosmos/operators/local.py | Updated operator logic to support conditional copying and logging of dbt packages. |
| cosmos/dbt/graph.py | Changed mapping for DBT_LS_CACHE mode to use load_via_dbt_ls. |
| cosmos/converter.py | Added configuration override for the copy_dbt_packages flag. |
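The configuration override in `cosmos/converter.py` means the flag can come from more than one place. Below is a minimal sketch of such a layered lookup. It assumes, without confirmation from this PR page, that a per-operator value wins over `ProjectConfig`, which in turn falls back to the global `AIRFLOW__COSMOS__DEFAULT_COPY_DBT_PACKAGES` setting. It also relies on Airflow's standard convention that an `AIRFLOW__{SECTION}__{KEY}` environment variable overrides `key` under `[section]` in `airflow.cfg`. All function names are hypothetical.

```python
import configparser
from typing import Mapping, Optional

ENV_VAR = "AIRFLOW__COSMOS__DEFAULT_COPY_DBT_PACKAGES"


def _to_bool(value: str) -> bool:
    # Simplified truthiness check for illustration; Airflow has its own parser.
    return value.strip().lower() in {"true", "t", "1", "yes", "on"}


def global_default(env: Mapping[str, str], airflow_cfg: str = "") -> bool:
    """Env var first, then `[cosmos] default_copy_dbt_packages`, else False."""
    if ENV_VAR in env:
        return _to_bool(env[ENV_VAR])
    parser = configparser.ConfigParser()
    parser.read_string(airflow_cfg)
    # fallback covers both a missing [cosmos] section and a missing key.
    return parser.getboolean("cosmos", "default_copy_dbt_packages", fallback=False)


def resolve_copy_dbt_packages(
    operator_args: Mapping[str, object],
    project_config_value: Optional[bool],
    env: Mapping[str, str],
    airflow_cfg: str = "",
) -> bool:
    """Hypothetical precedence: operator_args > ProjectConfig > global default."""
    if "copy_dbt_packages" in operator_args:
        return bool(operator_args["copy_dbt_packages"])
    if project_config_value is not None:
        return project_config_value
    return global_default(env, airflow_cfg)
```

In a real deployment the `env` argument would typically be `os.environ`. Defaulting to `False` mirrors the opt-in nature of the feature.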
Comments suppressed due to low confidence (2)
tests/operators/test_local.py:1531
- Consider adding a test case for _clone_project where copy_dbt_packages is false to ensure that symlink creation behaves as expected in that branch.
@patch("cosmos.operators.local.copy_dbt_packages")
cosmos/dbt/graph.py:531
- Mapping DBT_LS_CACHE to load_via_dbt_ls rather than load_via_dbt_ls_cache could lead to unintended behavior if the two loaders differ; double-check that this change is intentional and meets the expected functionality.
LoadMode.DBT_LS_CACHE: self.load_via_dbt_ls,
tatiana commented on Apr 16, 2025
pankajkoti approved these changes on Apr 16, 2025
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

| | main | #1670 | +/- |
|---|---|---|---|
| Coverage | 97.08% | 97.09% | |
| Files | 80 | 80 | |
| Lines | 5014 | 5022 | +8 |
| Hits | 4868 | 4876 | +8 |
| Misses | 146 | 146 | |
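The coverage figures are simple ratios: misses are lines minus hits (5014 - 4868 = 146 and 5022 - 4876 = 146), and coverage is hits divided by lines. The sketch below reproduces the reported percentages, assuming Codecov's default settings of two decimal places rounded down. That assumption is consistent with `main` showing 97.08% even though 4868 / 5014 rounds half-up to 97.09%. The function name is illustrative.

```python
from decimal import ROUND_DOWN, Decimal


def coverage_percent(hits: int, lines: int, precision: int = 2) -> Decimal:
    """Coverage = hits / lines * 100, truncated (rounded down) to `precision` decimals."""
    ratio = Decimal(hits) * 100 / Decimal(lines)
    # Decimal(1).scaleb(-2) == Decimal("0.01"): the quantum for two decimals.
    return ratio.quantize(Decimal(1).scaleb(-precision), rounding=ROUND_DOWN)
```

For example, `coverage_percent(4868, 5014)` gives `Decimal("97.08")` and `coverage_percent(4876, 5022)` gives `Decimal("97.09")`.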
Merged
tatiana added a commit that referenced this pull request on May 1, 2025:
Features

* Airflow 3 support
* Support running ``dbt deps`` incrementally to pre-defined ``dbt_packages`` by @tatiana in #1668 and #1670
* Add ``DuckDB`` profile mapping by @prithvijitguha and @pankajastro in #1553
* Implement DBT exposure selector by ghjklw in #1717

Bug Fixes

* Fix ``test_indirect_selection`` flag to be propagated in case of ``TestBehavior.BUILD`` by @corsettigyg in #1663
* Fix ``select`` clause in the case of detached tests by @anyapriya in #1680
* Operator argument fixes by @johnhoran in #1648

Airflow 3 Support

* Support rendering DbtDag in Airflow 3 by @tatiana and @ashb in #1657
* Refactor Rendered Task Instance Fields (RTIF) handling for Airflow 2.x and 3.x by @pankajkoti in #1661
* Run cosmos operator in Airflow 3 by @pankajastro in #1642
* Fix ``python_virtualenv.prepare_env`` top-level import for Airflow 3 by @pankajkoti in #1678
* Fix Variable not found issue in Airflow 3 by @tatiana in #1684
* Disable CosmosPlugin on Airflow 3 setup by @pankajkoti in #1692, #1698
* Use ``schedule`` param in example DAGs instead of the 2.10 deprecated and 3.0 removed ``schedule_interval`` by @pankajkoti in #1701
* Ensure ``virtualenv_dir`` path exists by @pankajkoti in #1724
* Support emitting Assets with Airflow 3 by @tatiana in #1713
* Add docs on Airflow 3 compatibility by @pankajkoti and @tatiana in #1731
* Introduce, test and document asset/dataset breaking change by @tatiana in #1672
* Improve dataset/asset driven scheduling documentation by @tatiana in #1729

Enhancements

* Allow multiple callbacks by @corsettigyg in #1693
* Refactor kubernetes warning callback handling by @canbekley in #1681

Documentation

* Add documentation related to ``copy_dbt_packages`` by @tatiana in #1671
* Make wording and command consistent in the contributing doc by @pankajkoti in #1697
* Add MonteCarlo callback example for importing dbt artifacts by @corsettigyg in #1695
* Change async feature to be non-experimental by @tatiana in #1732

Others

* Add sample ``dbt_packages`` to validate incremental ``dbt deps`` by @tatiana in #1669
* Add kubernetes execution mode example in Airflow 3 by @pankajastro in #1667
* Check only major version until Airflow 3 stable release by @pankajastro in #1665
* Install Airflow from main branch by @pankajastro in #1660
* Add dev tool for Airflow 3 by @pankajastro and @tatiana in #1627
* Improve Airflow 3 tooling by @pankajastro in #1656
* Skip associating ``openlineage_events_completes`` to ``ti`` in Airflow 3 by @pankajkoti in #1662
* Add .gitignore file for the scripts/airflow3 directory by @pankajkoti in #1658
* Remove ``original_jaffle_shop`` dbt project by @pankajkoti in #1676
* Fix or ignore type check error by @pankajastro in #1687
* Run virtualenv example with Airflow 3 tooling by @pankajastro in #1686
* Enable running setup/teardown tasks with Async execution DAG with Airflow 3 tooling by @pankajastro in #1696
* Enable integration tests for the DuckDB adapter by @pankajastro in #1699
* Add Airflow 3 tests matrix entries in CI by @pankajkoti in #1646
* Use a different way to get tasks count for asserting test_perf_dag by @pankajkoti in #1714
* Reinstall Airflow 3 dependency on ``pydantic>=2.11`` for dbt adapter versions 1.6 & 1.9 by @pankajkoti in #1715
* Fix outdated ``echo`` in Airflow 3 tooling script (#1700)
* Add files not needed for git tracking to .gitignore by @pankajkoti in #1723
* Use latest minor versions for dbt adapters to get in compatibility fixes by @pankajkoti in #1719
* Fix Airflow 3 tests raising generate_run_id() takes 0 positional arguments by @tatiana in #1725
* Fix dataset tests failing in Airflow 3 by @tatiana in #1716
* Enable example DAGs to run in CI that were disabled in PR #1646 by @pankajkoti in #1726
* Pre-commit updates: #1666, #1653, #1641, #1682, #1720

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
tatiana added a commit that referenced this pull request on May 19, 2025:
The feature introduced in #1670 (Support running `dbt deps` incrementally to pre-defined `dbt_packages` during task execution) did not work as expected if users had defined a custom path for `packages-install-path`. It only worked if the default (`dbt_packages`) was being used. This PR aims to solve the issue.
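The fix described in this commit comes down to not hard-coding `dbt_packages`. Below is a minimal sketch of resolving the packages directory instead. It assumes `dbt_project.yml` has already been parsed into a dictionary with no Jinja (such as `env_var()`) left to render. `resolve_packages_install_path` is a hypothetical name, not a Cosmos function. dbt reads the `packages-install-path` key, defaults to `dbt_packages`, and interprets a relative value against the project root.

```python
from pathlib import Path
from typing import Any, Mapping

DEFAULT_PACKAGES_INSTALL_PATH = "dbt_packages"


def resolve_packages_install_path(project_dir: Path, dbt_project: Mapping[str, Any]) -> Path:
    """Hypothetical helper: locate the packages directory of a dbt project.

    `dbt_project` is the already-parsed content of dbt_project.yml.
    """
    configured = dbt_project.get("packages-install-path") or DEFAULT_PACKAGES_INSTALL_PATH
    path = Path(configured)
    # Relative paths are taken relative to the project root; absolute ones as-is.
    return path if path.is_absolute() else project_dir / path
```

Any copy or symlink step would then operate on this resolved path rather than on a fixed `dbt_packages` name.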
pankajkoti pushed a commit that referenced this pull request on May 20, 2025:
The feature introduced in #1670 (Support running `dbt deps` incrementally to pre-defined `dbt_packages` during task execution) did not work as expected if users had defined a custom path for `packages-install-path`. It only worked if the default (`dbt_packages`) was being used. This PR aims to solve the issue.
pankajkoti pushed a commit that referenced this pull request on May 21, 2025:
The feature introduced in #1670 (Support running `dbt deps` incrementally to pre-defined `dbt_packages` during task execution) did not work as expected if users had defined a custom path for `packages-install-path`. It only worked if the default (`dbt_packages`) was being used. This PR aims to solve the issue. (cherry picked from commit 62b6ddc)
Motivation
Support running `dbt deps` incrementally on top of pre-calculated `dbt_packages` during task execution. This use case was requested by an Astro customer.
Before this change, Cosmos supported two types of configuration:

- With `operator_args={"install_deps": False}` or `ProjectConfig.install_dbt_deps=False`, Cosmos would create a symbolic link to the user's pre-defined `dbt_packages` (background: Refactor dbt ls to run from tempdir with symbolic links #488, Release 1.2.0 #600, Adding dbt_packages when dbt_deps is False #730).
- With `operator_args={"install_deps": True}` or `ProjectConfig.install_dbt_deps=True` (default), Cosmos would ignore any user-predefined `dbt_packages` and run `dbt deps` from scratch in a temporary folder.

The customer asked to reuse the initially defined `dbt_packages` directory and run `dbt deps` incrementally on top of it.
Implementation
Cosmos does not run dbt commands directly in the original dbt project folder, because some users have read-only filesystems (#414). We also chose symbolic links over copying the directory for performance reasons (#488). Since we did not want to introduce a breaking change in a minor Cosmos release by changing the existing Cosmos 1.x behaviour to meet this new use case, this PR supports: … `dbt deps`.

So that this is not a breaking change, users must opt into this behaviour by enabling the new `copy_dbt_packages` configuration (for example, `ProjectConfig.copy_dbt_packages=True`) together with `operator_args={"install_deps": True}` or `ProjectConfig.install_dbt_deps=True`. The new configuration can be set in one of the following ways:

- `copy_dbt_packages=True`
- Configuring `DbtDag` or `DbtTaskGroup` instances to use the new configuration `ProjectConfig.copy_dbt_packages=True`
- Globally, with `AIRFLOW__COSMOS__DEFAULT_COPY_DBT_PACKAGES=True`, or via `airflow.cfg` (`default_copy_dbt_packages = True` in the `[cosmos]` section)

How this was tested
To validate the end-to-end behaviour, we ran the following DAG from `dev/dags`:
Related tickets
This is a follow-up to #1668 and #1669.
I'll make a follow-up PR covering the documentation.
Closes: #1630