Previously, Cosmos was ignoring the user-defined manifest file while executing tasks. This PR solves the problem. Closes: #1643
✅ Deploy Preview for sunny-pastelito-5ecb04 canceled.
Deploying astronomer-cosmos:

| Latest commit: | 3b6d504 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://0c600896.astronomer-cosmos.pages.dev |
| Branch Preview URL: | https://fix-1643.astronomer-cosmos.pages.dev |
Pull Request Overview
This PR introduces support for copying a user-defined manifest.json when running tasks locally by leveraging ProjectConfig.manifest_path.
- Adds a new `manifest_filepath` parameter to local operators and propagates it through the converter
- Implements `copy_manifest_file_if_exists` in the DBT project module with corresponding unit tests
- Updates the local operator clone logic and adds an integration test to verify manifest copying
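The copy step described above can be illustrated with a minimal sketch. This is not the code from `cosmos/dbt/project.py`: the function name `copy_manifest_file_if_exists_sketch`, its return value, and the `target/manifest.json` destination are assumptions made for illustration. Only the argument order (manifest path, then project directory) follows the call shown in this review.

```python
import shutil
from pathlib import Path


def copy_manifest_file_if_exists_sketch(manifest_filepath, dbt_project_dir):
    """Hypothetical helper: copy a user-defined manifest into a dbt project dir.

    Returns the destination path, or None when there is nothing to copy.
    """
    # An empty string mirrors the `manifest_filepath: str = ""` default.
    if not manifest_filepath:
        return None

    source = Path(manifest_filepath)
    if not source.is_file():
        # A missing manifest is skipped rather than treated as an error.
        return None

    # Assumption: the manifest belongs in the project's "target" folder.
    target_dir = Path(dbt_project_dir) / "target"
    target_dir.mkdir(parents=True, exist_ok=True)

    destination = target_dir / "manifest.json"
    shutil.copy2(source, destination)
    return destination
```

Returning `None` for a missing or unset path keeps the call safe for tasks that do not configure a manifest, which matches the "if exists" intent in the function name.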
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/operators/test_local.py | Added integration test to assert the manifest file is copied |
| tests/dbt/test_project.py | Added unit tests for copy_manifest_file_if_exists |
| cosmos/operators/local.py | Introduced manifest_filepath param and invoke manifest copy |
| cosmos/dbt/project.py | Added copy_manifest_file_if_exists function |
| cosmos/converter.py | Passed manifest_path from ProjectConfig into operator args |
Files not reviewed (1)
- docs/configuration/operator-args.rst: Language not supported
Comments suppressed due to low confidence (2)
cosmos/operators/local.py:180
- [nitpick] The `manifest_filepath` parameter name differs from `ProjectConfig.manifest_path`; consider renaming it to `manifest_path` for consistency across your API.
manifest_filepath: str = "",
cosmos/operators/local.py:469
- `copy_manifest_file_if_exists` logs via the module-level logger, not the operator's logger; to ensure Airflow captures it (and the integration test sees it), log through `self.log.info` or accept a logger parameter.
copy_manifest_file_if_exists(self.manifest_filepath, Path(tmp_dir_path))
ExecutionMode.LOCAL to leverage ProjectConfig.manifest_path
pankajkoti
left a comment
The implementation looks good to me
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #1772 +/- ##
==========================================
+ Coverage 92.91% 97.71% +4.80%
==========================================
Files 84 84
Lines 5262 5262
==========================================
+ Hits 4889 5142 +253
+ Misses 373 120 -253
Hi @tatiana, can you explain why Cosmos was previously ignoring the user-defined manifest file while executing tasks? I have read this PR but still do not understand why the old way - defining
@tuantran0910 I believe it was a bug introduced over the course of the project's history. The manifest file was originally used in Cosmos for DAG rendering and was not intentionally used for task execution.

Historically, the first implementation of
We attempted to configure different

The next implementation consisted of copying the whole original dbt project directory into a temporary folder, from which Cosmos would execute the dbt commands, with one temporary folder per task. End users reported issues with high disk utilization and slowness during the full copy. To address this feedback, Cosmos started creating symbolic links from the temporary folder to the original one for some folders (e.g. for

This changed a bit over time, when we started supporting partial parsing files, and more recently when we introduced support for incremental dbt deps. The manifest was still not being copied for task execution until this PR. This changed slightly
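The per-task temporary folder described above, with symbolic links for some entries and an explicit manifest copy, could look roughly like the sketch below. It is an illustration only, not the Cosmos implementation: `prepare_task_dir_sketch`, its parameters, and the `target/manifest.json` destination are hypothetical, and the set of linked folders is left to the caller because the comment above does not list them.

```python
import os
import shutil
from pathlib import Path


def prepare_task_dir_sketch(project_dir, tmp_dir, symlinked_names, manifest_filepath=""):
    """Hypothetical helper: link selected project entries into tmp_dir,
    then copy a user-defined manifest if one is configured."""
    project_dir, tmp_dir = Path(project_dir), Path(tmp_dir)

    for name in symlinked_names:
        source = project_dir / name
        if not source.exists():
            continue  # skip names that are absent from the original project
        # Link instead of copying to avoid the disk usage of a full copy.
        os.symlink(source.resolve(), tmp_dir / name, target_is_directory=source.is_dir())

    # The manifest is copied explicitly, since linking alone does not bring it in.
    if manifest_filepath and Path(manifest_filepath).is_file():
        target_dir = tmp_dir / "target"
        target_dir.mkdir(exist_ok=True)
        shutil.copy2(manifest_filepath, target_dir / "manifest.json")

    return tmp_dir
```

The point of the sketch is the ordering: linking keeps task setup cheap, while the manifest needs its own copy step because it is supplied separately from the linked folders.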
Thank you very much, @tatiana, for the detailed explanation. I got it.
Bug Fixes

* Fix ``full_refresh`` parameter in ``AIRFLOW_ASYNC`` ``ExecutionConfig`` mode by @tuantran0910 in #1738
* Fix dbt ls invocation method log message by @tatiana and @dstandish in #1749
* Ensure remote target directory is created when copying files when using local directory by @tuantran0910 and @corsettigyg in #1740
* Support custom ``packages-install-path`` by @tatiana in #1768
* Disable dbt static parser during Airflow task execution using dbt runner by @pankajkoti and @tatiana in #1760
* Fix ``ExecutionMode.LOCAL`` to leverage ``ProjectConfig.manifest_path`` by @tatiana in #1772
* Refactor ``AIRFLOW_ASYNC`` so that the path in the remote object store is specific per DAG run by @tuantran0910 in #1741
* Optimise memory usage with optional explicit imports by @pankajkoti and @tatiana in #1769

Documentation

* Fix documentation rendering for ``use_dataset_airflow3_uri_standard`` by @pankajastro in #1742
* Correct custom callback example by @walter9388 in #1747

Others

* Re-enable integration tests durations to troubleshoot performance degradation by @tatiana in #1735
* Run listener tests for Airflow 3 by @pankajastro in #1743
* Add Airflow 3 db files to ignore from git tracking by @pankajkoti in #1755
* Log contents of ``packages.yml`` when ``AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG`` by @tatiana in #1764
* Fix Airflow dependencies in the CI by @tatiana in #1773
* Pre-commit updates: #1744, #1765, #1770
(cherry picked from commit 430be00)