
Optimise memory usage with optional explicit imports#1769

Merged
pankajkoti merged 7 commits into
mainfrom
lazy-imports
May 21, 2025

Conversation

@pankajkoti
Contributor

@pankajkoti pankajkoti commented May 19, 2025

This PR introduces a new configuration flag enable_memory_optimised_imports under the cosmos Airflow config section (environment variable AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS) to optimise memory usage when Cosmos is installed but not actively used or when only certain modules of Cosmos need to be used (achieved by importing them explicitly with their full module names).

Changes made to accommodate the above

  • Introduce enable_memory_optimised_imports in cosmos/settings.py and guard eager imports in __init__.py.
  • Extract provider info into cosmos/provider_info.py and update entry-points.

Problem

When Cosmos is installed, it eagerly imports many classes and modules (e.g., DbtDag, operators, etc.) in __init__.py, which increases memory usage by approximately 200 MB per task per worker node, even if Cosmos isn’t actively used.
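To get a rough local feel for how much memory a single import costs, one option is to trace Python allocations around the import. This is only a sketch and not how the worker numbers below were measured: import_cost_kib is a hypothetical helper, and tracemalloc only sees memory allocated through Python's allocator, so native allocations made by C extensions are likely undercounted.

```python
import importlib
import sys
import tracemalloc


def import_cost_kib(module_name: str) -> float:
    """Approximate Python-level memory (KiB) retained after importing module_name.

    Returns 0.0 when the module is already in sys.modules, because a repeated
    import is just a lookup and allocates essentially nothing new.
    """
    if module_name in sys.modules:
        return 0.0
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        importlib.import_module(module_name)
        after, _ = tracemalloc.get_traced_memory()
    finally:
        # Only stop tracing if this function started it.
        if not was_tracing:
            tracemalloc.stop()
    return max(after - before, 0) / 1024


if __name__ == "__main__":
    # In an environment with Cosmos installed, "cosmos" could be added to this tuple.
    for name in ("json", "xml.dom.minidom"):
        print(f"{name}: {import_cost_kib(name):.1f} KiB")
```

Run in a fresh interpreter; anything already imported (directly or as a dependency of another module) reports 0.0.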

Proposed Solution

By default, enable_memory_optimised_imports is set to False, preserving the current behaviour and maintaining backward compatibility (i.e., all top-level exports remain available). When enable_memory_optimised_imports is set to True, top-level imports such as DbtDag are no longer automatically exposed via cosmos/__init__.py. This prevents large modules from being loaded unless they are explicitly imported, which reduces memory usage.
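The guard can be illustrated with a throwaway package written to a temporary directory. This is a hypothetical stand-in, not Cosmos's actual __init__.py: demo_pkg, heavy.py, HeavyDag and the DEMO__ENABLE_MEMORY_OPTIMISED_IMPORTS variable are invented for illustration. The sketch imports the package once with the flag off and once with it on, and reports whether the "heavy" submodule ended up in sys.modules.

```python
import importlib
import os
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in package (not Cosmos's real code):
#   demo_pkg/settings.py  reads the flag once, at import time
#   demo_pkg/heavy.py     plays the role of an expensive module such as cosmos.airflow.dag
#   demo_pkg/__init__.py  re-exports HeavyDag only while the flag is off
FLAG = "DEMO__ENABLE_MEMORY_OPTIMISED_IMPORTS"
FILES = {
    "settings.py": f"""
        import os
        enable_memory_optimised_imports = (
            os.environ.get("{FLAG}", "False").strip().lower() in ("t", "true", "1")
        )
    """,
    "heavy.py": """
        class HeavyDag:
            pass
    """,
    "__init__.py": """
        from demo_pkg import settings

        if not settings.enable_memory_optimised_imports:
            # Eager, backwards-compatible path: `from demo_pkg import HeavyDag` works.
            from demo_pkg.heavy import HeavyDag
    """,
}


def _forget_demo_pkg() -> None:
    """Drop demo_pkg and its submodules from the import cache."""
    for name in [m for m in sys.modules if m == "demo_pkg" or m.startswith("demo_pkg.")]:
        del sys.modules[name]


def heavy_module_loaded(flag_value: str) -> bool:
    """Import demo_pkg with the flag set to flag_value; report whether demo_pkg.heavy was loaded."""
    previous = os.environ.get(FLAG)
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg"
        pkg_dir.mkdir()
        for filename, body in FILES.items():
            (pkg_dir / filename).write_text(textwrap.dedent(body))
        os.environ[FLAG] = flag_value
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        _forget_demo_pkg()
        try:
            importlib.import_module("demo_pkg")
            return "demo_pkg.heavy" in sys.modules
        finally:
            # Restore sys.path, the import cache and the environment.
            sys.path.remove(tmp)
            _forget_demo_pkg()
            if previous is None:
                os.environ.pop(FLAG, None)
            else:
                os.environ[FLAG] = previous


if __name__ == "__main__":
    print(heavy_module_loaded("False"))  # True: heavy module imported eagerly
    print(heavy_module_loaded("True"))   # False: heavy module only loads on explicit import
```

With the flag off the package behaves like the default configuration; with it on, callers have to write from demo_pkg.heavy import HeavyDag, which mirrors the fully qualified imports described in this PR.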

In Cosmos 2.0, this will become the default behaviour, and we'll remove the existing behaviour of allowing users to import everything from cosmos (__init__.py), as mentioned in #1213.

Usage

To enable optimised imports:

export AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS=True
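Airflow maps environment variables named AIRFLOW__{SECTION}__{KEY} onto config options, so this export corresponds to enable_memory_optimised_imports under the [cosmos] section. The sketch below mimics that name mapping and a boolean parse using only the standard library; env_var_name and read_bool_option are hypothetical helpers rather than Airflow or Cosmos APIs, and the accepted spellings are assumed to match what Airflow's conf.getboolean accepts.

```python
import os
from typing import Mapping, Optional

# Spellings assumed to match Airflow's conf.getboolean (compared case-insensitively).
_TRUE = {"t", "true", "1"}
_FALSE = {"f", "false", "0"}


def env_var_name(section: str, key: str) -> str:
    """Build the AIRFLOW__{SECTION}__{KEY} environment variable name for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def read_bool_option(
    section: str,
    key: str,
    default: bool = False,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Read a boolean option from the environment, falling back to default when unset."""
    env = os.environ if environ is None else environ
    raw = env.get(env_var_name(section, key))
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{env_var_name(section, key)}={raw!r} is not a boolean")


if __name__ == "__main__":
    name = env_var_name("cosmos", "enable_memory_optimised_imports")
    print(name)  # AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS
    print(read_bool_option("cosmos", "enable_memory_optimised_imports", environ={name: "True"}))  # True
    print(read_bool_option("cosmos", "enable_memory_optimised_imports", environ={}))  # False (default)
```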

Memory footprint analysis

Non-Cosmos DAG

The following experiment was conducted on an Astro deployment that had only a single non-Cosmos DAG (DAG with 2 simple BashOperator tasks echoing outputs) running, with the astronomer-cosmos package installed in the deployment.

Memory usage with the default approach (enable_memory_optimised_imports config disabled): ~900 MB

[Screenshot (2025-05-20, 1:20:07 AM): memory usage with the config disabled]

Memory usage with enable_memory_optimised_imports config enabled: ~700 MB

[Screenshot (2025-05-20, 1:20:22 AM): memory usage with the config enabled]

Cosmos DAG

The following experiment was conducted on an Astro deployment running the Cosmos DAG below against a jaffle-shop dbt project.
DAG code:

from datetime import datetime
from cosmos.airflow.dag import DbtDag
from cosmos.config import ProjectConfig, RenderConfig
from cosmos.constants import TestBehavior
from include.profiles import snowflake_db
from include.constants import jaffle_shop_path, venv_execution_config

simple_dag = DbtDag(
    project_config=ProjectConfig(jaffle_shop_path),
    profile_config=snowflake_db,
    execution_config=venv_execution_config,
    render_config=RenderConfig(
        test_behavior=TestBehavior.NONE,
    ),
    schedule=None,
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="simple_dag",
    tags=["simple"],
    default_args={
        "retries": 2,
    },
)

where the constants imported by the DAG above are defined as follows:

from pathlib import Path
from cosmos.config import ExecutionConfig

jaffle_shop_path = Path("/usr/local/airflow/dbt/jaffle_shop")
dbt_executable = Path("/usr/local/airflow/dbt_venv/bin/dbt")
venv_execution_config = ExecutionConfig(dbt_executable_path=str(dbt_executable))

Memory usage with the default approach (enable_memory_optimised_imports config disabled)

It was observed that while DAGs were running, memory usage peaked at 1.8-2.0 GB, and when no DAGs were running (idle worker), it hovered around 990 MB.
[Screenshot (2025-05-21, 5:20:13 PM): memory usage with the config disabled]

Memory usage with enable_memory_optimised_imports config enabled

It was observed that during the first DAG run, memory usage peaked at up to 1.6 GB, but for subsequent DAG runs it hovered around 780 MB. This ~780 MB level remained consistent both while DAGs were running (I triggered about five consecutive DAG runs) and while the worker was idle.

[Screenshot (2025-05-21, 5:18:38 PM): memory usage with the config enabled]
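The reductions implied by these approximate figures can be worked out directly. The inputs are the rounded values quoted in this description (~900/~700 MB for the non-Cosmos DAG, ~990/~780 MB for an idle worker with the Cosmos DAG), not new measurements:

```python
from typing import Dict, Tuple

# Approximate worker memory in MB, as quoted above: (config disabled, config enabled).
MEASUREMENTS_MB: Dict[str, Tuple[float, float]] = {
    "non-Cosmos DAG": (900, 700),
    "Cosmos DAG, idle worker": (990, 780),
}


def reduction(disabled_mb: float, enabled_mb: float) -> Tuple[float, float]:
    """Return (MB saved, percentage saved relative to the disabled baseline)."""
    saved = disabled_mb - enabled_mb
    return saved, 100 * saved / disabled_mb


if __name__ == "__main__":
    for label, (disabled, enabled) in MEASUREMENTS_MB.items():
        saved, pct = reduction(disabled, enabled)
        print(f"{label}: {saved:.0f} MB saved ({pct:.1f}%)")
    # non-Cosmos DAG: 200 MB saved (22.2%)
    # Cosmos DAG, idle worker: 210 MB saved (21.2%)
```

Both cases land at roughly 200 MB, consistent with the per-worker overhead described in the Problem section.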

This change thus gives users more control over Cosmos’s memory footprint through an optional config.

closes: #1652
related: #1213
related: #1471


Co-authored-by: Tatiana Al-Chueyr tatiana.alchueyr@gmail.com

@netlify

netlify Bot commented May 19, 2025

Deploy Preview for sunny-pastelito-5ecb04 canceled.

🔨 Latest commit: 8b7bff6
🔍 Latest deploy log: https://app.netlify.com/projects/sunny-pastelito-5ecb04/deploys/682d919dfa24560008678355

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 19, 2025

Deploying astronomer-cosmos with Cloudflare Pages

Latest commit: 8b7bff6
Status: ✅  Deploy successful!
Preview URL: https://8930fb05.astronomer-cosmos.pages.dev
Branch Preview URL: https://lazy-imports.astronomer-cosmos.pages.dev

@codecov

codecov Bot commented May 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.03%. Comparing base (304e426) to head (8b7bff6).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1769      +/-   ##
==========================================
+ Coverage   97.72%   98.03%   +0.31%     
==========================================
  Files          84       85       +1     
  Lines        5274     5247      -27     
==========================================
- Hits         5154     5144      -10     
+ Misses        120      103      -17     

@pankajkoti pankajkoti changed the title from "Add config to disable imports exposed via cosmos init module" to "Optimise memory usage with optional explicit imports" May 20, 2025
@pankajkoti pankajkoti marked this pull request as ready for review May 20, 2025 10:03
Copilot AI review requested due to automatic review settings May 20, 2025 10:03
@dosubot dosubot Bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label May 20, 2025
@pankajkoti pankajkoti requested review from pankajastro and tatiana May 20, 2025 10:03
@dosubot dosubot Bot added area:config Related to configuration, like YAML files, environment variables, or executer configuration area:performance Related to performance, like memory usage, CPU usage, speed, etc labels May 20, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds an optional enable_memory_optimised_imports flag to defer Cosmos’s top-level imports for reduced memory usage, and reorganises version and provider_info into dedicated modules.

  • Introduce enable_memory_optimised_imports in cosmos/settings.py and guard eager imports in __init__.py.
  • Move __version__ to cosmos/version.py and update all references/tests.
  • Extract provider info into cosmos/provider_info.py and update entry-points.

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Summary per file:

  • tests/test_version.py: Update test to use cosmos.version.__version__.
  • tests/test_telemetry.py: Switch to patch.object for telemetry HTTP mocks.
  • tests/test_settings.py: Add tests for the new memory-optimised imports flag.
  • tests/test_log.py: Import get_provider_info from new module.
  • pyproject.toml: Adjust entry-points and version file path.
  • cosmos/version.py: New module exposing __version__.
  • cosmos/settings.py: Add enable_memory_optimised_imports config.
  • cosmos/telemetry.py: Update type hint and version reference.
  • cosmos/provider_info.py: New module for provider metadata.
  • cosmos/operators/local.py: Remove unused logging import.
  • cosmos/dbt/parser/output.py: Reference version from cosmos.version.
  • cosmos/__init__.py: Conditional lazy imports based on new setting.
Files not reviewed (1)
  • docs/configuration/cosmos-conf.rst: Language not supported
Comments suppressed due to low confidence (4)

tests/test_settings.py:12

  • The test calls reload(settings) but settings is not imported in this file. Add import cosmos.settings as settings at the top to avoid a NameError.
reload(settings)

tests/test_log.py:3

  • [nitpick] The import cosmos.log line appears unused since you import get_provider_info directly. Removing it will clean up the imports.
import cosmos.log

cosmos/__init__.py:8

  • Removing the top-level __version__ attribute breaks consumers expecting cosmos.__version__. Consider re-exporting it (e.g., from .version import __version__) for backwards compatibility.
__version__ = "1.10.0"

cosmos/settings.py:45

  • The comment refers to explicit_imports but the flag is named enable_memory_optimised_imports. Update the comment to match the actual config key to avoid confusion.
# When enabled, users must access Cosmos classes via their full module paths,
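On the first suppressed comment: a test needs reload(settings) because module-level settings are computed once, when the module is first imported, so changing the environment afterwards has no effect until the module body runs again. A self-contained sketch of that pattern follows; demo_settings and AIRFLOW__DEMO__ENABLE_MEMORY_OPTIMISED_IMPORTS are hypothetical stand-ins for cosmos.settings and the real variable, and the actual tests/test_settings.py may be structured differently.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path
from typing import Tuple
from unittest.mock import patch

# Hypothetical stand-in for cosmos.settings: the flag is computed once, at import time.
FLAG = "AIRFLOW__DEMO__ENABLE_MEMORY_OPTIMISED_IMPORTS"
SOURCE = (
    "import os\n"
    f"enable_memory_optimised_imports = os.environ.get({FLAG!r}, 'False').lower() == 'true'\n"
)


def flag_before_and_after_reload(value: str) -> Tuple[bool, bool]:
    """Import demo_settings with the flag unset, then set the flag and reload the module."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_settings.py").write_text(SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            with patch.dict(os.environ):
                os.environ.pop(FLAG, None)
                import demo_settings  # first import reads the flag as unset

            with patch.dict(os.environ, {FLAG: value}):
                stale = demo_settings.enable_memory_optimised_imports  # still the import-time value
                importlib.reload(demo_settings)  # re-executes the module body
                fresh = demo_settings.enable_memory_optimised_imports
            return stale, fresh
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_settings", None)


if __name__ == "__main__":
    print(flag_before_and_after_reload("True"))  # (False, True)
```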

Comment thread cosmos/provider_info.py
Comment thread cosmos/telemetry.py Outdated
Comment thread tests/test_telemetry.py Outdated
Comment thread cosmos/version.py Outdated
Collaborator

@tatiana tatiana left a comment

Hi @pankajkoti, this looks great. Thank you very much for working on this and finding a solution to the initial problem. I'm glad we're moving towards improving Cosmos' memory footprint.

Some minor feedback in-line, and some additional comments:

  • It would be great if we could have an example of how to use explicit import paths, complementing the documentation you added.
  • It's probably worth stating that in Cosmos 2.0, this will become the default behaviour, and we'll remove the existing behaviour of allowing people to import everything from cosmos, referencing #1213.

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 20, 2025
@tatiana tatiana added this to the Cosmos 1.10.1 milestone May 20, 2025
@pankajkoti pankajkoti merged commit 633fcf3 into main May 21, 2025
96 checks passed
@pankajkoti pankajkoti deleted the lazy-imports branch May 21, 2025 12:07
pankajkoti added a commit that referenced this pull request May 21, 2025
(cherry picked from commit 633fcf3)
pankajkoti added a commit that referenced this pull request May 21, 2025
Bug Fixes

* Fix ``full_refresh`` parameter in ``AIRFLOW_ASYNC``
``ExecutionConfig`` mode by @tuantran0910 in #1738
* Fix dbt ls invocation method log message by @tatiana and @dstandish in
#1749
* Ensure remote target directory is created when copying files when
using local directory by @tuantran0910 and @corsettigyg in #1740
* Support custom ``packages-install-path`` by @tatiana in #1768
* Disable dbt static parser during Airflow task execution using dbt
runner by @pankajkoti and @tatiana in #1760
* Fix ``ExecutionMode.LOCAL`` to leverage
``ProjectConfig.manifest_path`` by @tatiana in #1772
* Refactor ``AIRFLOW_ASYNC`` so that the path in the remote object store
is specific per DAG run by @tuantran0910 in #1741
* Optimise memory usage with optional explicit imports by @pankajkoti
and @tatiana in #1769

Documentation

* Fix documentation rendering for ``use_dataset_airflow3_uri_standard``
by @pankajastro in #1742
* Correct custom callback example by @walter9388 in #1747

Others

* Re-enable integration tests durations to troubleshoot performance
degradation by @tatiana in #1735
* Run listener tests for Airflow 3 by @pankajastro in #1743
* Add Airflow 3 db files to ignore from git tracking by @pankajkoti in
#1755
* Log contents of ``packages.yml`` when
``AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG`` by @tatiana in #1764
* Fix Airflow dependencies in the CI by @tatiana in #1773
* Pre-commit updates: #1744, #1765, #1770
pankajkoti added a commit that referenced this pull request May 21, 2025
(cherry picked from commit 430be00)
@DanMawdsleyBA
Contributor

I just tried this out, but after I made the changes, none of my DAGs could render; they all failed with the error below:

ImportError: cannot import name 'DbtTaskGroup' from 'cosmos' (/usr/local/airflow/.local/lib/python3.11/site-packages/cosmos/__init__.py)

using version 1.10.1

@pankajkoti
Contributor Author

pankajkoti commented Jul 21, 2025

Hey @DanMawdsleyBA, when you enable this config, you need to use fully qualified imports for all Cosmos classes and methods. For example, check out this sample DAG:
https://github.com/astronomer/astronomer-cosmos/blob/main/dev/dags/basic_cosmos_dag_full_module_path_imports.py

In your case, for DbtTaskGroup specifically, the import would need to be from cosmos.airflow.task_group import DbtTaskGroup instead.

Please also refer to docs regarding this config: https://astronomer.github.io/astronomer-cosmos/configuration/cosmos-conf.html#enable-memory-optimised-imports
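For DAG files that still rely on top-level imports, a small script can suggest the fully qualified replacements. This is a hypothetical helper built on Python's ast module, not a Cosmos tool; its mapping only uses module paths that appear in this thread (cosmos.airflow.dag, cosmos.airflow.task_group, cosmos.config, cosmos.constants), and any other name is flagged for a manual lookup in the sample DAG and docs linked above.

```python
import ast
from typing import List

# Full module paths taken from examples in this PR thread; extend as needed.
FULL_PATHS = {
    "DbtDag": "cosmos.airflow.dag",
    "DbtTaskGroup": "cosmos.airflow.task_group",
    "ExecutionConfig": "cosmos.config",
    "ProjectConfig": "cosmos.config",
    "RenderConfig": "cosmos.config",
    "InvocationMode": "cosmos.constants",
    "LoadMode": "cosmos.constants",
    "TestBehavior": "cosmos.constants",
}


def suggest_explicit_imports(source: str) -> List[str]:
    """Return fully qualified import lines for every `from cosmos import ...` in source."""
    suggestions: List[str] = []
    for node in ast.walk(ast.parse(source)):
        # Only absolute imports from the top-level cosmos package are rewritten.
        if isinstance(node, ast.ImportFrom) and node.module == "cosmos" and node.level == 0:
            for alias in node.names:
                target = FULL_PATHS.get(alias.name)
                if target is None:
                    suggestions.append(f"# TODO: look up the full module path for {alias.name}")
                    continue
                line = f"from {target} import {alias.name}"
                if alias.asname:
                    line += f" as {alias.asname}"
                suggestions.append(line)
    return suggestions


if __name__ == "__main__":
    print("\n".join(suggest_explicit_imports("from cosmos import DbtTaskGroup, ProjectConfig\n")))
    # from cosmos.airflow.task_group import DbtTaskGroup
    # from cosmos.config import ProjectConfig
```

Imports that are already fully qualified (e.g. from cosmos.config import ProjectConfig) are left alone, since they work whether or not the config is enabled.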

pankajkoti added a commit that referenced this pull request Apr 7, 2026
Replace eager imports with lazy loading via module-level __getattr__.
Imports are deferred until first access, reducing memory footprint by
only loading modules that are actually used.

Optional dependency handling (docker, kubernetes, etc.) is preserved by
catching ImportError in __getattr__ and returning MissingPackage
sentinels, matching the previous behavior.

The enable_memory_optimised_imports setting is now a no-op since all
imports are lazy by default. Remove the related tests and update
documentation accordingly.

related: #2403 
related: #1769

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

area:config Related to configuration, like YAML files, environment variables, or executer configuration area:performance Related to performance, like memory usage, CPU usage, speed, etc lgtm This PR has been approved by a maintainer size:XL This PR changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Bug] Increased airflow memory usage after installing cosmos

4 participants