Release 1.10.1 #1774
Merged
Fix rendering for [use_dataset_airflow3_uri_standard](https://astronomer.github.io/astronomer-cosmos/configuration/cosmos-conf.html#id2) (cherry picked from commit ff55436)
updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.7 → v0.11.8](astral-sh/ruff-pre-commit@v0.11.7...v0.11.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> (cherry picked from commit 705d74d)
…radation (#1735) There is an apparent degradation in the speed when running our integration tests in AF3 compared to AF2. We're re-enabling these metrics so that we can analyse them. As an example, during the run https://github.com/astronomer/astronomer-cosmos/actions/runs/14774756080: - Run-Integration-Tests(3.9,2.10,1.9) took 12m 40s - Run-Integration-Tests(3.9,3.0,1.9) took 41m 6s (cherry picked from commit a8795f3)
closes: #1703 Previously, we sent the dag_hash to Scarf, but with the removal of DagRun.dag_hash in Airflow 3, this PR modifies the implementation to send a portion of the dag_id hash instead. This change is a reasonable compromise, as we only need a unique identifier for the DAG. Additionally, the PR updates the relevant test cases to reflect this change. --------- Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com> (cherry picked from commit a532a20)
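The idea of sending "a portion of the dag_id hash" can be sketched as follows. This is a minimal illustration only: the helper name `short_dag_identifier`, the choice of SHA-256, and the 16-character length are assumptions, since the commit message does not say which hash function or length Cosmos uses.

```python
import hashlib


def short_dag_identifier(dag_id: str, length: int = 16) -> str:
    """Return a truncated hex digest of ``dag_id`` (hypothetical helper).

    SHA-256 and the default length are assumptions made for this sketch.
    """
    digest = hashlib.sha256(dag_id.encode("utf-8")).hexdigest()
    # Keep only a prefix: enough to tell DAGs apart without sending the name itself.
    return digest[:length]
```

The same `dag_id` always yields the same identifier, which is the property the commit message relies on.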
…#1738) ## Description This PR fixes an issue where using `operator_args={'full_refresh': True}` with `AIRFLOW_ASYNC` execution mode would cause an error. The fix ensures that: 1. The `full_refresh` parameter is properly passed from `DbtRunAirflowAsyncOperator` to underlying operators. 2. The `--full-refresh` flag is added to the dbt command during the setup async task. This allows users to properly use the `full_refresh` parameter with async execution mode, ensuring that models are rebuilt from scratch when needed. ## Related Issue(s) Closes #1736 Closes #1610 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> (cherry picked from commit 7510fd8)
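The second point of the fix above (adding `--full-refresh` to the dbt command) can be sketched roughly like this. The function `build_dbt_run_command` is a hypothetical stand-in and not the Cosmos operator code; only the `dbt run`, `--select` and `--full-refresh` tokens are real dbt CLI arguments.

```python
from typing import List, Optional


def build_dbt_run_command(full_refresh: bool = False, select: Optional[str] = None) -> List[str]:
    """Assemble a ``dbt run`` command line (hypothetical helper for illustration)."""
    command = ["dbt", "run"]
    if select:
        command += ["--select", select]
    if full_refresh:
        # Only add the flag when the user asked for a rebuild from scratch.
        command.append("--full-refresh")
    return command
```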
A minor change to the custom callback example in the docs (https://astronomer.github.io/astronomer-cosmos/configuration/callbacks.html#custom-callbacks): 1. The `run_results.json` artifact is in the target directory, not the project directory. 2. The results (at least in version `dbt-core=1.9.4`) are a list in a `results` node, not a value in the root object. (cherry picked from commit 21a2f13)
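To illustrate the two corrections, here is a small sketch of reading `run_results.json` from the target directory and iterating over its `results` list. The function name `summarise_run_results` and the default `target` folder name are assumptions for this example; it is not the callback shown in the Cosmos docs.

```python
import json
from pathlib import Path
from typing import Dict


def summarise_run_results(project_dir: str, target_dir_name: str = "target") -> Dict[str, str]:
    """Map each node's ``unique_id`` to its ``status`` (hypothetical helper)."""
    # The artifact lives in the target directory, not directly in the project directory.
    run_results_path = Path(project_dir) / target_dir_name / "run_results.json"
    with run_results_path.open() as file:
        run_results = json.load(file)
    # The per-node results are a list under the "results" key of the root object.
    return {result["unique_id"]: result["status"] for result in run_results.get("results", [])}
```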
We were incorrectly logging that we were using dbtRunner even when we were using subprocess, because the logic that decides which invocation mode to use was duplicated: one copy drove the log message and another drove the actual execution. Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com> (cherry picked from commit 73f123b)
(cherry picked from commit 910f065)
updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.8 → v0.11.9](astral-sh/ruff-pre-commit@v0.11.8...v0.11.9) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> (cherry picked from commit 2452f2e)
…ing local directory (#1740)

Ensure remote target directories are created when copying files to a local directory. When configuring a remote target directory that points to a local path while using `AIRFLOW_ASYNC`, like so:

```bash
AIRFLOW__COSMOS__REMOTE_TARGET_PATH=/usr/local/airflow/cosmos
AIRFLOW__COSMOS__REMOTE_TARGET_PATH_CONN_ID=file_default
```

we might face this issue:

```bash
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/airflow/cosmos/simple_dag_async__dbt_async/run/jaffle_shop/models/example/my_second_dbt_model.sql'
```

Closes #1739

Co-authored-by: Giovanni Corsetti <155465603+corsettigyg@users.noreply.github.com> (cherry picked from commit a712c2d)
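The general technique behind this kind of fix is to create the destination's parent directories before copying. The sketch below shows it with the standard library only; `copy_to_local_target` is a hypothetical helper, and the actual Cosmos change may be implemented differently.

```python
import shutil
from pathlib import Path


def copy_to_local_target(source_file: str, destination_file: str) -> Path:
    """Copy a file, creating missing parent directories first (hypothetical helper)."""
    destination = Path(destination_file)
    # Without this, copying into a nested path that does not exist yet
    # raises FileNotFoundError, as in the traceback above.
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source_file, destination)
    return destination
```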
The feature introduced in #1670 (Support running `dbt deps` incrementally to pre-defined `dbt_packages` during task execution) did not work as expected if users had defined a custom path for `packages-install-path`. It only worked if the default (`dbt_packages`) was being used. This PR aims to solve the issue. (cherry picked from commit 62b6ddc)
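A minimal sketch of the lookup this implies: use `packages-install-path` from the parsed `dbt_project.yml` when present, otherwise fall back to `dbt_packages`. The helper `resolve_packages_dir` and the assumption that the project file has already been parsed into a dict are illustrative, not taken from the Cosmos code.

```python
from pathlib import Path
from typing import Any, Dict

DEFAULT_PACKAGES_DIR = "dbt_packages"  # dbt's default install location


def resolve_packages_dir(project_dir: str, dbt_project: Dict[str, Any]) -> Path:
    """Return the directory where ``dbt deps`` installs packages (hypothetical helper).

    ``dbt_project`` is assumed to be the already-parsed content of dbt_project.yml.
    """
    install_path = dbt_project.get("packages-install-path") or DEFAULT_PACKAGES_DIR
    return Path(project_dir) / install_path
```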
updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.9 → v0.11.10](astral-sh/ruff-pre-commit@v0.11.9...v0.11.10) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> (cherry picked from commit 584f1f2)
…DEBUG` (#1764) Recently, there have been some concerns that Cosmos may modify the `packages.yml` content, leading to errors. If users set `AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG`, they should now be able to confirm the content of the file. Example of log output: ``` [2025-05-12T10:52:38.183+0100] {local.py:481} DEBUG - Checking for the packages.yml dependencies file. [2025-05-12T10:52:38.184+0100] {local.py:484} DEBUG - Contents of the </var/folders/td/522y78v91d1f5wgh67mj3p0m0000gn/T/tmp_4q53rv2/packages.yml> dependencies file: packages: - package: dbt-labs/dbt_utils version: "1.1.1" ``` (cherry picked from commit 81e248a)
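The logging behaviour described above could look roughly like the following. This is a sketch with the standard `logging` module; the function name `log_packages_file` is hypothetical, and only the two message texts are modelled on the log output quoted in the commit message.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def log_packages_file(project_dir: str) -> bool:
    """Log the contents of packages.yml at DEBUG level, if the file exists (hypothetical helper)."""
    packages_file = Path(project_dir) / "packages.yml"
    logger.debug("Checking for the %s dependencies file.", packages_file.name)
    if not packages_file.exists():
        return False
    # Avoid reading the file at all unless DEBUG logging is actually enabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Contents of the <%s> dependencies file:\n%s", packages_file, packages_file.read_text())
    return True
```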
…ner (#1760) This PR adds support for conditionally applying the `--no-static-parser` dbt flag in Cosmos operators, ensuring it is included only when InvocationMode.DBT_RUNNER is used during task execution. **Static Parser Issue**: User reports and investigation revealed that, starting with Cosmos 1.9.0 (see PR #1484), using dbtRunner for both DAG parsing and task execution in Airflow 2.x can cause task hangs. This is due to dbt's static parser interacting poorly with Cosmos's use of temporary project directories, especially when the temp paths differ between parsing and execution. **Workaround**: Adding the `--no-static-parser` flag when invoking dbtRunner during task execution avoids these hangs and ensures reliable operation. This flag is not needed (and should not be added) when using the subprocess invocation mode. closes: #1751 related: #1750 Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com> (cherry picked from commit 15a8d91)
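The conditional flag handling described here can be sketched as below. The `InvocationMode` enum is a local stand-in for the Cosmos constant of the same name, and `build_dbt_flags` is a hypothetical helper; neither is the actual operator code.

```python
from enum import Enum
from typing import List


class InvocationMode(Enum):
    """Local stand-in for the Cosmos InvocationMode constant (assumption for this sketch)."""

    SUBPROCESS = "subprocess"
    DBT_RUNNER = "dbt_runner"


def build_dbt_flags(invocation_mode: InvocationMode, base_flags: List[str]) -> List[str]:
    """Return dbt flags, adding --no-static-parser only for dbtRunner (hypothetical helper)."""
    flags = list(base_flags)  # copy, so the caller's list is not mutated
    if invocation_mode is InvocationMode.DBT_RUNNER and "--no-static-parser" not in flags:
        flags.append("--no-static-parser")
    return flags
```

With `InvocationMode.SUBPROCESS` the flags pass through unchanged, which matches the note that the flag should not be added in that mode.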
Our CI tests have been quite unstable lately; this PR aims to fix the most recent issues. (cherry picked from commit eb8114a)
…is specific per DAG run (#1741) Refactor `AIRFLOW_ASYNC` so that the path in the remote object store is specific per DAG run. The format of remote model path will be: ```python # test_cosmos/simple_dag_async/run/jaffle_shop/models/example/my_first_dbt_model.sql remote_model_path = f"{remote_target_path_str}/{dbt_dag_task_group_identifier}/{run_id}/run/{relative_file_path}" ``` Closes #1613 (cherry picked from commit 304e426)
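Wrapping the format string above in a function makes the per-run isolation easier to see. The function name `build_remote_model_path` and the example `run_id` values in the usage note are illustrative assumptions; the path template itself is the one quoted in the commit message.

```python
def build_remote_model_path(
    remote_target_path_str: str,
    dbt_dag_task_group_identifier: str,
    run_id: str,
    relative_file_path: str,
) -> str:
    """Build a remote path that includes the DAG run ID (hypothetical helper)."""
    # Including run_id means two runs of the same DAG never write to the same path.
    return f"{remote_target_path_str}/{dbt_dag_task_group_identifier}/{run_id}/run/{relative_file_path}"
```

For example, calling it with the hypothetical run IDs `run_a` and `run_b` and otherwise identical arguments yields two distinct paths.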
This PR introduces a new configuration flag `enable_memory_optimised_imports` under the `cosmos` Airflow config section (environment variable `AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS`) to optimise memory usage when Cosmos is installed but not actively used, or when only certain modules of Cosmos need to be used (achieved by importing them explicitly with their full module names).

## Changes made to accommodate the above

- Introduce `enable_memory_optimised_imports` in `cosmos/settings.py` and guard eager imports in `__init__.py`.
- Extract provider info into `cosmos/provider_info.py` and update entry-points.

## Problem

When Cosmos is installed, it eagerly imports many classes and modules (e.g., `DbtDag`, `operators`, etc.) in `__init__.py`, leading to increased memory usage, observed to be approximately 200MB per task per worker node even if Cosmos isn’t actively used.

## Proposed Solution

By default, `enable_memory_optimised_imports` is set to `False`, preserving the current behaviour and maintaining backward compatibility (i.e., all top-level exports remain available). When `enable_memory_optimised_imports` is set to `True`, top-level imports such as `DbtDag` are no longer automatically exposed via `cosmos.__init__.py`. This prevents the loading of large modules unless explicitly imported, resulting in reduced memory usage.

In Cosmos 2.0, this will become the default behaviour, and we'll remove the existing behaviour of allowing users to import everything from cosmos (`__init__.py`), as mentioned in #1213.

## Usage

To enable optimised imports:

```
export AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS=True
```

## Memory footprint analysis

### Non-Cosmos DAG

The following experiment was conducted on an Astro deployment that had only a single non-Cosmos DAG (a DAG with 2 simple BashOperator tasks echoing outputs) running, with the `astronomer-cosmos` package installed in the deployment.
**Memory usage with the default approach (`enable_memory_optimised_imports` config disabled): ~900MB**

<img width="1153" alt="Screenshot 2025-05-20 at 1 20 07 AM" src="https://github.com/user-attachments/assets/ffc8d99d-d953-45de-9209-479654523df0" />

**Memory usage with the `enable_memory_optimised_imports` config enabled: ~700MB**

<img width="1343" alt="Screenshot 2025-05-20 at 1 20 22 AM" src="https://github.com/user-attachments/assets/4ac6cb1b-ffb6-4c74-aa97-8db28dc60556" />

### Cosmos DAG

The following experiment was conducted on an Astro deployment running the Cosmos DAG below against a jaffle-shop dbt project.

DAG code:

```
from datetime import datetime

from cosmos.airflow.dag import DbtDag
from cosmos.config import ProjectConfig, RenderConfig
from cosmos.constants import LoadMode, InvocationMode, TestBehavior

from include.profiles import snowflake_db
from include.constants import jaffle_shop_path, venv_execution_config

simple_dag = DbtDag(
    project_config=ProjectConfig(jaffle_shop_path),
    profile_config=snowflake_db,
    execution_config=venv_execution_config,
    render_config=RenderConfig(
        test_behavior=TestBehavior.NONE,
    ),
    schedule=None,
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="simple_dag",
    tags=["simple"],
    default_args={
        "retries": 2,
    },
)
```

The constants imported in the DAG above have the following values:

```
from pathlib import Path

from cosmos.config import ExecutionConfig

jaffle_shop_path = Path("/usr/local/airflow/dbt/jaffle_shop")
dbt_executable = Path("/usr/local/airflow/dbt_venv/bin/dbt")
venv_execution_config = ExecutionConfig(dbt_executable_path=str(dbt_executable))
```

**Memory usage with the default approach (`enable_memory_optimised_imports` config disabled).** It was observed that **while DAGs are running, memory usage peaks at 1.8-2.0GB, and when no DAGs are running (idle worker), memory usage hovers around 990 MB.**

<img width="1482" alt="Screenshot 2025-05-21 at 5 20 13 PM" src="https://github.com/user-attachments/assets/867e162d-7a58-455a-a232-3716c5c03e31" />

**Memory usage with the `enable_memory_optimised_imports` config enabled.** It was observed that **for the first DAG run, memory usage peaked at up to 1.6 GB, but for subsequent DAG runs it hovered around 780 MB. This usage of around 780 MB remained consistent whether DAGs were running (I triggered about 5 DAG runs one after the other) or the worker was idle.**

<img width="1313" alt="Screenshot 2025-05-21 at 5 18 38 PM" src="https://github.com/user-attachments/assets/e3f425d7-137b-4441-a8ba-d4adb587a862" />

This change thus gives users more control over Cosmos’s memory footprint through the optional config.

closes: #1652
related: #1213
related: #1471

---------

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com> (cherry picked from commit 633fcf3)
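The guard around eager imports described in this PR can be sketched as a flag check that decides which names a package's `__init__.py` exposes. Everything below is an illustrative assumption rather than the Cosmos implementation: the helper names, the simplified boolean parsing, and the example export list are all hypothetical.

```python
import os
from typing import List, Mapping, Optional

ENV_VAR = "AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS"


def memory_optimised_imports_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read the flag from the environment (simplified parsing, an assumption)."""
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "False").strip().lower() in {"true", "t", "1"}


def top_level_exports(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names an __init__.py would eagerly expose (illustrative list only)."""
    if memory_optimised_imports_enabled(environ):
        # Nothing is imported eagerly; users import from full module paths instead,
        # e.g. `from cosmos.airflow.dag import DbtDag` as in the DAG code above.
        return []
    return ["DbtDag", "ProjectConfig", "RenderConfig"]
```

With the flag unset, the default behaviour (all illustrative exports available) is preserved, mirroring the backward-compatible default described above.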
pankajastro
approved these changes
May 21, 2025
tatiana
reviewed
May 22, 2025
Collaborator
tatiana
left a comment
Thanks a lot for leading this release and making sure everything worked smoothly, @pankajkoti , excellent work!
Bug Fixes
- Fix `full_refresh` parameter in `AIRFLOW_ASYNC` `ExecutionConfig` mode by @tuantran0910 in #1738
- Support custom `packages-install-path` by @tatiana in #1768
- Fix `ExecutionMode.LOCAL` to leverage `ProjectConfig.manifest_path` by @tatiana in #1772
- Refactor `AIRFLOW_ASYNC` so that the path in the remote object store is specific per DAG run by @tuantran0910 in #1741

Documentation
- Fix rendering for `use_dataset_airflow3_uri_standard` by @pankajastro in #1742

Others
- Log contents of `packages.yml` when `AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG` by @tatiana in #1764