Skip to content

Fix ExecutionMode.LOCAL to leverage ProjectConfig.manifest_path#1772

Merged
tatiana merged 10 commits into
mainfrom
fix-1643
May 20, 2025
Merged

Fix ExecutionMode.LOCAL to leverage ProjectConfig.manifest_path#1772
tatiana merged 10 commits into
mainfrom
fix-1643

Conversation

@tatiana
Copy link
Copy Markdown
Collaborator

@tatiana tatiana commented May 20, 2025

Previously, Cosmos was ignoring the user-defined manifest file while executing tasks. This PR solves the problem.

Closes: #1643

Previously, Cosmos was ignoring the user-defined manifest file while executing tasks. This PR
solves the problem.

Closes: #1643
@netlify
Copy link
Copy Markdown

netlify Bot commented May 20, 2025

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 3b6d504
🔍 Latest deploy log https://app.netlify.com/projects/sunny-pastelito-5ecb04/deploys/682ca2715cb02b0008f5f5a0

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 20, 2025

Deploying astronomer-cosmos with  Cloudflare Pages  Cloudflare Pages

Latest commit: 3b6d504
Status: ✅  Deploy successful!
Preview URL: https://0c600896.astronomer-cosmos.pages.dev
Branch Preview URL: https://fix-1643.astronomer-cosmos.pages.dev

View logs

@tatiana tatiana marked this pull request as ready for review May 20, 2025 13:11
Copilot AI review requested due to automatic review settings May 20, 2025 13:11
@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. area:config Related to configuration, like YAML files, environment variables, or executer configuration area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc execution:local Related to Local execution environment labels May 20, 2025
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces support for copying a user-defined manifest.json when running tasks locally by leveraging ProjectConfig.manifest_path.

  • Adds a new manifest_filepath parameter to local operators and propagates it through the converter
  • Implements copy_manifest_file_if_exists in the DBT project module with corresponding unit tests
  • Updates the local operator clone logic and adds an integration test to verify manifest copying

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/operators/test_local.py Added integration test to assert the manifest file is copied
tests/dbt/test_project.py Added unit tests for copy_manifest_file_if_exists
cosmos/operators/local.py Introduced manifest_filepath param and invoke manifest copy
cosmos/dbt/project.py Added copy_manifest_file_if_exists function
cosmos/converter.py Passed manifest_path from ProjectConfig into operator args
Files not reviewed (1)
  • docs/configuration/operator-args.rst: Language not supported
Comments suppressed due to low confidence (2)

cosmos/operators/local.py:180

  • [nitpick] The manifest_filepath parameter name differs from ProjectConfig.manifest_path; consider renaming it to manifest_path for consistency across your API.
manifest_filepath: str = "",

cosmos/operators/local.py:469

  • copy_manifest_file_if_exists logs via the module-level logger, not the operator's logger; to ensure Airflow captures it (and the integration test sees it), log through self.log.info or accept a logger parameter.
copy_manifest_file_if_exists(self.manifest_filepath, Path(tmp_dir_path))

Comment thread cosmos/dbt/project.py
Comment thread cosmos/dbt/project.py
Comment thread cosmos/operators/local.py
@tatiana tatiana changed the title Fix ExecutionMode.LOCAL to leverage ProjectConfig.manifest_path Fix ExecutionMode.LOCAL to leverage ProjectConfig.manifest_path May 20, 2025
@tatiana tatiana added the parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing label May 20, 2025
@tatiana tatiana added this to the Cosmos 1.10.1 milestone May 20, 2025
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels May 20, 2025
Copy link
Copy Markdown
Contributor

@pankajkoti pankajkoti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good to me

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.71%. Comparing base (eb8114a) to head (3b6d504).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1772      +/-   ##
==========================================
+ Coverage   92.91%   97.71%   +4.80%     
==========================================
  Files          84       84              
  Lines        5262     5262              
==========================================
+ Hits         4889     5142     +253     
+ Misses        373      120     -253     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels May 20, 2025
@tatiana tatiana merged commit 544aec9 into main May 20, 2025
90 checks passed
@tatiana tatiana deleted the fix-1643 branch May 20, 2025 15:42
@tuantran0910
Copy link
Copy Markdown
Contributor

Hi @tatiana, can you provide me explanation why previously Cosmos was ignoring the user-defined manifest file while executing tasks ? I have read this PR but still not understand why the old way - defining manifest_file_path in ProjectConfig not works ?

@tatiana
Copy link
Copy Markdown
Collaborator Author

tatiana commented May 21, 2025

@tuantran0910 I believe it was a bug introduced through out the history of the project.

The manifest file was originally used in Cosmos for DAG rendering - not intentionally used for task execution

Historically, the first implementation of LoadMode.LOCAL would attempt to run dbt commands from the original dbt folder. However, this led to two issues:

  • it didn't work when the filesystem was read-only
  • it sometimes failed when running concurrent dbt commands that referenced the same target folder.

We attempted to configure different target & logs folders per Cosmos worker process, but there was an inconsistent way to accomplish this in different versions of dbt-core (flags, environment variable, in some cases changing the dbt_project.yml itself) - and we noticed dbt sometimes generated files outside of these two directories, including dbt_packages - which can only be customised via the dbt_project.yml'. We did not want to change the end-users dbt_project.yml`.

The next implementation consisted in copying the whole original dbt project directory into a temporary folder, from where Cosmos would execute the dbt commands - one temporary folder per task. End users reported issues with high disk utilization and slowness during the full copy.

To address this feedback, Cosmos started creating symbolic links from the temporary folder to the original one for some folders (e.g. for seeds& models folders), and would create an empty dedicated target folder per temporary folder.

This changed a bit over time, when we started supporting partial parsing files, and more recently when we introduced support to incremental dbt deps - but the manifest was still not being copied for task execution - until this PR.

This changed slightly

@tuantran0910
Copy link
Copy Markdown
Contributor

Thank you @tatiana very much for the details explanation. I got it.

pankajkoti pushed a commit that referenced this pull request May 21, 2025
…1772)

Previously, Cosmos was ignoring the user-defined manifest file while
executing tasks. This PR solves the problem.

Closes: #1643
(cherry picked from commit 544aec9)
pankajkoti added a commit that referenced this pull request May 21, 2025
Bug Fixes

* Fix ``full_refresh`` parameter in ``AIRFLOW_ASYNC``
``ExecutionConfig`` mode by @tuantran0910 in #1738
* Fix dbt ls invocation method log message by @tatiana and @dstandish in
#1749
* Ensure remote target directory is created when copying files when
using local directory by @tuantran0910 and @corsettigyg in #1740
* Support custom ``packages-install-path`` by @tatiana in #1768
* Disable dbt static parser during Airflow task execution using dbt
runner by @pankajkoti and @tatiana in #1760
* Fix ``ExecutionMode.LOCAL`` to leverage
``ProjectConfig.manifest_path`` by @tatiana in #1772
* Refactor ``AIRFLOW_ASYNC`` so that the path in the remote object store
is specific per DAG run by @tuantran0910 in #1741
* Optimise memory usage with optional explicit imports by @pankajkoti
and @tatiana in #1769

Documentation

* Fix documentation rendering for ``use_dataset_airflow3_uri_standard``
by @pankajastro in #1742
* Correct custom callback example by @walter9388 in #1747

Others

* Re-enable integration tests durations to troubleshoot performance
degradation by @tatiana in #1735
* Run listener tests for Airflow 3 by @pankajastro in #1743
* Add Airflow 3 db files to ignore from git tracking by @pankajkoti in
#1755
* Log contents of ``packages.yml`` when
``AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG`` by @tatiana in #1764
* Fix Airflow dependencies in the CI by @tatiana in #1773
* Pre-commit updates: #1744, #1765, #1770
pankajkoti added a commit that referenced this pull request May 21, 2025
Bug Fixes

* Fix ``full_refresh`` parameter in ``AIRFLOW_ASYNC``
``ExecutionConfig`` mode by @tuantran0910 in #1738
* Fix dbt ls invocation method log message by @tatiana and @dstandish in
#1749
* Ensure remote target directory is created when copying files when
using local directory by @tuantran0910 and @corsettigyg in #1740
* Support custom ``packages-install-path`` by @tatiana in #1768
* Disable dbt static parser during Airflow task execution using dbt
runner by @pankajkoti and @tatiana in #1760
* Fix ``ExecutionMode.LOCAL`` to leverage
``ProjectConfig.manifest_path`` by @tatiana in #1772
* Refactor ``AIRFLOW_ASYNC`` so that the path in the remote object store
is specific per DAG run by @tuantran0910 in #1741
* Optimise memory usage with optional explicit imports by @pankajkoti
and @tatiana in #1769

Documentation

* Fix documentation rendering for ``use_dataset_airflow3_uri_standard``
by @pankajastro in #1742
* Correct custom callback example by @walter9388 in #1747

Others

* Re-enable integration tests durations to troubleshoot performance
degradation by @tatiana in #1735
* Run listener tests for Airflow 3 by @pankajastro in #1743
* Add Airflow 3 db files to ignore from git tracking by @pankajkoti in
#1755
* Log contents of ``packages.yml`` when
``AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG`` by @tatiana in #1764
* Fix Airflow dependencies in the CI by @tatiana in #1773
* Pre-commit updates: #1744, #1765, #1770


---------

(cherry picked from commit 430be00)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:config Related to configuration, like YAML files, environment variables, or executer configuration area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc execution:local Related to Local execution environment parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: ExecutionMode.LOCAL task leveraging the manifest to run task given in the ProjectConfig.manifest_path

4 participants