
Release 1.10.1#1774

Merged
pankajkoti merged 19 commits into
release-1.10from
release-1.10.1
May 21, 2025

@pankajkoti

Bug Fixes

Documentation

Others

pankajastro and others added 19 commits May 21, 2025 17:51
Fix rendering for
[use_dataset_airflow3_uri_standard](https://astronomer.github.io/astronomer-cosmos/configuration/cosmos-conf.html#id2)

(cherry picked from commit ff55436)
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.7 →
v0.11.8](astral-sh/ruff-pre-commit@v0.11.7...v0.11.8)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(cherry picked from commit 705d74d)
…radation (#1735)

There is an apparent degradation in the speed when running our
integration tests in AF3 compared to AF2.

We're re-enabling these metrics so that we can analyse them.

As an example, during the run
https://github.com/astronomer/astronomer-cosmos/actions/runs/14774756080:
- Run-Integration-Tests(3.9,2.10,1.9) took 12m 40s
- Run-Integration-Tests(3.9,3.0,1.9) took 41m 6s

(cherry picked from commit a8795f3)
closes: #1703

Previously, we sent the dag_hash to Scarf,
but with the removal of DagRun.dag_hash in Airflow 3,
this PR modifies the implementation to send a portion
of the dag_id hash instead. This change is a reasonable
compromise, as we only need a unique identifier for the DAG.
Additionally, the PR updates the relevant test cases to reflect this
change.
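
As a rough illustration of the idea, a short and stable identifier can be derived from the `dag_id` alone. This is a minimal sketch, not the Cosmos implementation: the function name `dag_identifier`, the choice of SHA-256, and the 16-character length are all assumptions made for the example.

```python
import hashlib


def dag_identifier(dag_id: str, length: int = 16) -> str:
    """Return a short, stable identifier derived from the DAG id.

    Hypothetical helper: the hash algorithm and the truncation length
    are illustrative choices, not what Cosmos necessarily uses.
    """
    digest = hashlib.sha256(dag_id.encode("utf-8")).hexdigest()
    # Only a portion of the hash is kept; it just needs to tell DAGs apart.
    return digest[:length]
```

The same `dag_id` always maps to the same identifier, so it can stand in for the removed `DagRun.dag_hash` wherever only uniqueness per DAG matters.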

---------

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
(cherry picked from commit a532a20)
…#1738)

## Description

This PR fixes an issue where using `operator_args={'full_refresh':
True}` with `AIRFLOW_ASYNC` execution mode would cause an error. The fix
ensures that:
1. The `full_refresh` parameter is properly passed from
`DbtRunAirflowAsyncOperator` to underlying operators.
2. The `--full-refresh` flag is added to the dbt command during the
setup async task.

This allows users to properly use the `full_refresh` parameter with
async execution mode, ensuring that models are rebuilt from scratch when
needed.
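
The second point can be sketched as follows. This is a simplified stand-in rather than the operator code: `build_run_command` is a hypothetical helper, and only the dbt flags `--select` and `--full-refresh` are taken from dbt itself.

```python
from typing import List


def build_run_command(select: str, full_refresh: bool = False) -> List[str]:
    """Build a `dbt run` command, adding --full-refresh only on request.

    Hypothetical helper for illustration; Cosmos assembles its dbt
    commands inside the operators, not through this function.
    """
    cmd = ["dbt", "run", "--select", select]
    if full_refresh:
        # Mirrors operator_args={"full_refresh": True}
        cmd.append("--full-refresh")
    return cmd
```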

## Related Issue(s)

Closes #1736
Closes #1610

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(cherry picked from commit 7510fd8)
A minor change to the custom callback example in the docs at
https://astronomer.github.io/astronomer-cosmos/configuration/callbacks.html#custom-callbacks

1. The `run_results.json` artifact is in the target directory, not the
project directory.
2. The results (at least with `dbt-core==1.9.4`) are a list under the
`results` key, not a value in the root object.
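
Both points can be seen in a small reader for the artifact. This is a minimal sketch, assuming the default `target` directory name; the function name `summarise_run_results` is hypothetical, while the `results`, `unique_id` and `status` keys come from dbt's `run_results.json` artifact.

```python
import json
from pathlib import Path
from typing import Dict


def summarise_run_results(project_dir: str, target_dir_name: str = "target") -> Dict[str, str]:
    """Map each dbt node's unique_id to its status.

    Hypothetical helper: reads run_results.json from the target directory
    (not the project root) and iterates over the `results` list.
    """
    path = Path(project_dir) / target_dir_name / "run_results.json"
    with path.open() as fp:
        run_results = json.load(fp)
    return {result["unique_id"]: result["status"] for result in run_results["results"]}
```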

(cherry picked from commit 21a2f13)
We were incorrectly logging that dbtRunner was in use even when
subprocess was, because the logic that selects the invocation mode was
duplicated: one copy for the log message and another for the actual
execution.

Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com>
(cherry picked from commit 73f123b)
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.8 →
v0.11.9](astral-sh/ruff-pre-commit@v0.11.8...v0.11.9)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(cherry picked from commit 2452f2e)
…ing local directory (#1740)

Ensure the remote target directories are created when copying files to
a local directory.

When configuring a remote target directory that points to a local path
while using `AIRFLOW_ASYNC`, like so:

```bash
AIRFLOW__COSMOS__REMOTE_TARGET_PATH=/usr/local/airflow/cosmos
AIRFLOW__COSMOS__REMOTE_TARGET_PATH_CONN_ID=file_default
```

We might face this issue:

```bash
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/airflow/cosmos/simple_dag_async__dbt_async/run/jaffle_shop/models/example/my_second_dbt_model.sql'
```

Closes #1739
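
The underlying pattern is to create the destination's parent directories before copying. The sketch below shows it on a plain local filesystem with the standard library only; `copy_with_parents` is a hypothetical name and this is not the code path Cosmos uses for remote object stores.

```python
import shutil
from pathlib import Path


def copy_with_parents(source: Path, destination: Path) -> Path:
    """Copy a file, creating any missing destination directories first.

    Hypothetical helper illustrating the fix on a local path.
    """
    # Without this, copying into a not-yet-existing tree raises FileNotFoundError.
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    return destination
```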

Co-authored-by: Giovanni Corsetti <155465603+corsettigyg@users.noreply.github.com>
(cherry picked from commit a712c2d)
The feature introduced in #1670 (Support running `dbt deps`
incrementally to pre-defined `dbt_packages` during task execution) did
not work as expected if users had defined a custom path for
`packages-install-path`. It only worked if the default (`dbt_packages`)
was being used. This PR aims to solve the issue.
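
A minimal sketch of the lookup involved, assuming the `dbt_project.yml` contents have already been parsed into a dictionary. The helper name `resolve_packages_dir` is hypothetical; the `packages-install-path` key and its `dbt_packages` default are dbt's own.

```python
from pathlib import Path
from typing import Any, Dict

DEFAULT_PACKAGES_FOLDER = "dbt_packages"


def resolve_packages_dir(project_dir: Path, project_config: Dict[str, Any]) -> Path:
    """Return the folder where dbt installs packages for this project.

    Hypothetical helper: honours a custom `packages-install-path` and
    falls back to dbt's default folder when the key is absent or empty.
    """
    folder = project_config.get("packages-install-path") or DEFAULT_PACKAGES_FOLDER
    return project_dir / folder
```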

(cherry picked from commit 62b6ddc)
<!--pre-commit.ci start-->
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.9 →
v0.11.10](astral-sh/ruff-pre-commit@v0.11.9...v0.11.10)
<!--pre-commit.ci end-->

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(cherry picked from commit 584f1f2)
…DEBUG` (#1764)

Recently, there have been some concerns that Cosmos may modify the
`packages.yml` content, leading to errors.

If users set `AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG`, they should now be
able to confirm the content of the file.

Example of log output:

```
[2025-05-12T10:52:38.183+0100] {local.py:481} DEBUG - Checking for the packages.yml dependencies file.
[2025-05-12T10:52:38.184+0100] {local.py:484} DEBUG - Contents of the </var/folders/td/522y78v91d1f5wgh67mj3p0m0000gn/T/tmp_4q53rv2/packages.yml> dependencies file:
packages:
  - package: dbt-labs/dbt_utils
    version: "1.1.1"

```
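
Log lines like the ones above could be produced along these lines. This is a sketch using the standard `logging` module only; `log_dependencies_file` is a hypothetical name and does not reproduce the code in `local.py`.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def log_dependencies_file(packages_file: Path) -> bool:
    """Log the dependencies file contents at DEBUG level, if it exists.

    Hypothetical helper; returns True when the file was found.
    """
    logger.debug("Checking for the %s dependencies file.", packages_file.name)
    if not packages_file.is_file():
        return False
    # Skip reading the file entirely unless DEBUG logging is active.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Contents of the <%s> dependencies file:\n%s", packages_file, packages_file.read_text())
    return True
```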

(cherry picked from commit 81e248a)
…ner (#1760)

This PR adds support for conditionally applying the `--no-static-parser`
dbt flag in Cosmos operators, ensuring it is included only when
InvocationMode.DBT_RUNNER is used during task execution.

**Static Parser Issue**: User reports and investigation revealed that,
starting with Cosmos 1.9.0 (see PR #1484), using dbtRunner for both DAG
parsing and task execution in Airflow 2.x can cause task hangs. This is
due to dbt's static parser interacting poorly with Cosmos's use of
temporary project directories, especially when the temp paths differ
between parsing and execution.
**Workaround**: Adding the `--no-static-parser` flag when invoking
dbtRunner during task execution avoids these hangs and ensures reliable
operation. This flag is not needed (and should not be added) when using
the subprocess invocation mode.
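
The conditional described above can be sketched as below. The `InvocationMode` enum here is a local stand-in defined for the example (not imported from Cosmos), and `add_parser_flags` is a hypothetical helper; only the `--no-static-parser` flag itself comes from the PR description.

```python
from enum import Enum
from typing import List


class InvocationMode(Enum):
    """Local stand-in for the Cosmos invocation modes (illustrative)."""

    SUBPROCESS = "subprocess"
    DBT_RUNNER = "dbt_runner"


def add_parser_flags(dbt_flags: List[str], invocation_mode: InvocationMode) -> List[str]:
    """Return a copy of the flags, adding --no-static-parser for dbtRunner only."""
    flags = list(dbt_flags)
    if invocation_mode is InvocationMode.DBT_RUNNER and "--no-static-parser" not in flags:
        flags.append("--no-static-parser")
    return flags
```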

closes: #1751
related: #1750

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
(cherry picked from commit 15a8d91)
Our CI tests have been quite unstable lately; this PR aims to fix the
most recent issues.

(cherry picked from commit eb8114a)
…1772)

Previously, Cosmos was ignoring the user-defined manifest file while
executing tasks. This PR solves the problem.

Closes: #1643
(cherry picked from commit 544aec9)
…is specific per DAG run (#1741)

Refactor `AIRFLOW_ASYNC` so that the path in the remote object store is
specific per DAG run. The format of the remote model path will be:

```python
# test_cosmos/simple_dag_async/run/jaffle_shop/models/example/my_first_dbt_model.sql
remote_model_path = f"{remote_target_path_str}/{dbt_dag_task_group_identifier}/{run_id}/run/{relative_file_path}"
```
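
To make the layout concrete, here is a small sketch that assembles a path in that format. The helper name `build_remote_model_path` and the sample values used with it are hypothetical; only the ordering of the path segments is taken from the format string above.

```python
def build_remote_model_path(
    remote_target_path: str,
    dbt_dag_task_group_identifier: str,
    run_id: str,
    relative_file_path: str,
) -> str:
    """Join the segments so that each DAG run gets its own folder.

    Hypothetical helper; slashes at the joins are normalised so the
    result never contains an accidental double separator.
    """
    parts = [
        remote_target_path.rstrip("/"),
        dbt_dag_task_group_identifier,
        run_id,
        "run",
        relative_file_path.lstrip("/"),
    ]
    return "/".join(parts)
```

Because `run_id` is part of the path, two runs of the same DAG no longer write their compiled SQL to the same location.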

Closes #1613

(cherry picked from commit 304e426)
This PR introduces a new configuration flag
`enable_memory_optimised_imports` under the `cosmos` Airflow config
section (environment variable
`AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS`) to optimise memory
usage when Cosmos is installed but not actively used or when only
certain modules of Cosmos need to be used (achieved by importing them
explicitly with their full module names).

## Changes made to accommodate the above

- Introduce `enable_memory_optimised_imports` in `cosmos/settings.py`
and guard eager imports in `__init__.py`.
- Extract provider info into `cosmos/provider_info.py` and update
entry-points.

## Problem

When Cosmos is installed, it eagerly imports many classes and modules
(e.g., `DbtDag`, `operators`, etc) in `__init__.py`, leading to
increased memory usage—observed to be approximately 200MB per task per
worker node even if Cosmos isn’t actively used.

## Proposed Solution

By default, `enable_memory_optimised_imports` is set to `False`,
preserving the current behaviour and maintaining backward compatibility
(i.e., all top-level exports remain available). When
`enable_memory_optimised_imports` is set to `True`, top-level imports
such as `DbtDag` are no longer automatically exposed via
`cosmos/__init__.py`. This prevents the loading of large modules unless
they are explicitly imported, resulting in reduced memory usage.
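
The guarding idea can be sketched generically as follows. This is an illustration under stated assumptions, not the contents of `cosmos/__init__.py`: it reads the environment variable directly instead of going through the Airflow configuration, and the helper names are hypothetical.

```python
import importlib
import os
from types import ModuleType
from typing import Dict, List, Mapping, Optional

ENV_VAR = "AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS"


def memory_optimised_imports_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the flag is set to a truthy value (defaults to False)."""
    env = os.environ if env is None else env
    return env.get(ENV_VAR, "False").strip().lower() in {"true", "t", "1"}


def load_top_level_exports(module_names: List[str], env: Optional[Mapping[str, str]] = None) -> Dict[str, ModuleType]:
    """Eagerly import the given modules unless the flag is enabled.

    Hypothetical stand-in for the guard around eager imports: with the
    flag on, nothing is imported and callers must import what they need.
    """
    if memory_optimised_imports_enabled(env):
        return {}
    return {name: importlib.import_module(name) for name in module_names}
```

With the flag enabled, users import from full module paths, for example `from cosmos.airflow.dag import DbtDag`, rather than from the top-level package.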

In Cosmos 2.0, this will become the default behaviour, and the existing
behaviour of allowing users to import everything from the top-level
`cosmos` package (`__init__.py`) will be removed, as mentioned in #1213.

## Usage

To enable optimised imports:
```bash
export AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS=True
```

## Memory footprint analysis

### Non-Cosmos DAG
The following experiment was conducted on an Astro deployment that had
only a single non-Cosmos DAG (DAG with 2 simple BashOperator tasks
echoing outputs) running, with the `astronomer-cosmos` package installed
in the deployment.

**Memory usage with the default (`enable_memory_optimised_imports`
disabled): ~900MB**

<img width="1153" alt="Screenshot 2025-05-20 at 1 20 07 AM"
src="https://github.com/user-attachments/assets/ffc8d99d-d953-45de-9209-479654523df0"
/>

**Memory usage with `enable_memory_optimised_imports` enabled:
~700MB**

<img width="1343" alt="Screenshot 2025-05-20 at 1 20 22 AM"
src="https://github.com/user-attachments/assets/4ac6cb1b-ffb6-4c74-aa97-8db28dc60556"
/>

### Cosmos DAG
The following experiment was conducted on an Astro deployment running
the Cosmos DAG below against a jaffle-shop dbt project.

DAG code:
```python
from datetime import datetime
from cosmos.airflow.dag import DbtDag
from cosmos.config import ProjectConfig, RenderConfig
from cosmos.constants import TestBehavior
from include.profiles import snowflake_db
from include.constants import jaffle_shop_path, venv_execution_config

simple_dag = DbtDag(
    project_config=ProjectConfig(jaffle_shop_path),
    profile_config=snowflake_db,
    execution_config=venv_execution_config,
    render_config=RenderConfig(
        test_behavior=TestBehavior.NONE,
    ),
    schedule=None,
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="simple_dag",
    tags=["simple"],
    default_args={
        "retries": 2,
    },
)
```
The imported constants used in the DAG above have the following values:
```python
from pathlib import Path
from cosmos.config import ExecutionConfig

jaffle_shop_path = Path("/usr/local/airflow/dbt/jaffle_shop")
dbt_executable = Path("/usr/local/airflow/dbt_venv/bin/dbt")
venv_execution_config = ExecutionConfig(dbt_executable_path=str(dbt_executable))
```

**Memory usage with default approach of
`enable_memory_optimised_imports` config disabled.**

**While DAGs were running, memory usage peaked at 1.8-2.0 GB; with no
DAGs running (idle worker), it hovered around 990 MB.**
<img width="1482" alt="Screenshot 2025-05-21 at 5 20 13 PM"
src="https://github.com/user-attachments/assets/867e162d-7a58-455a-a232-3716c5c03e31"
/>

**Memory usage with `enable_memory_optimised_imports` config enabled**

**For the first DAG run, memory usage peaked at up to 1.6 GB, but for
subsequent DAG runs it hovered around 780 MB. This level of about
780 MB stayed consistent whether DAGs were running (I triggered about 5
DAG runs one after the other) or the worker was idle.**

<img width="1313" alt="Screenshot 2025-05-21 at 5 18 38 PM"
src="https://github.com/user-attachments/assets/e3f425d7-137b-4441-a8ba-d4adb587a862"
/>

This change thus gives users more control over Cosmos’s memory
footprint through the optional config.

closes: #1652
related: #1213
related: #1471

---------

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
(cherry picked from commit 633fcf3)
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. area:ci Related to CI, Github Actions, or other continuous integration tools area:docs Relating to documentation, changes, fixes, improvement area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:performance Related to performance, like memory usage, CPU usage, speed, etc parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing labels May 21, 2025
@pankajkoti pankajkoti requested review from pankajastro and tatiana May 21, 2025 13:14
@pankajkoti pankajkoti merged commit 16adce0 into release-1.10 May 21, 2025
5 checks passed
@pankajkoti pankajkoti deleted the release-1.10.1 branch May 21, 2025 14:01

@tatiana tatiana left a comment

Thanks a lot for leading this release and making sure everything worked smoothly, @pankajkoti, excellent work!
