Optimise memory usage with optional explicit imports #1769
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

| Coverage Diff | main | #1769 | +/- |
|---|---|---|---|
| Coverage | 97.72% | 98.03% | +0.31% |
| Files | 84 | 85 | +1 |
| Lines | 5274 | 5247 | -27 |
| Hits | 5154 | 5144 | -10 |
| Misses | 120 | 103 | -17 |
Pull Request Overview
This PR adds an optional enable_memory_optimised_imports flag to defer Cosmos’s top-level imports for reduced memory usage, and reorganises version and provider_info into dedicated modules.
- Introduce `enable_memory_optimised_imports` in `cosmos/settings.py` and guard eager imports in `__init__.py`.
- Move `__version__` to `cosmos/version.py` and update all references/tests.
- Extract provider info into `cosmos/provider_info.py` and update entry-points.
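To make the first bullet concrete, here is a minimal sketch of how a flag can guard eager imports. It is not the code from this PR: the real setting is read through Airflow's configuration, whereas this sketch reads the environment variable directly, and the helper names, the accepted truthy strings, and the stdlib stand-in export are all assumptions made for illustration.

```python
import importlib
import os

ENV_VAR = "AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS"
# Assumption: strings treated as "true" in this sketch.
_TRUE_VALUES = {"1", "t", "true"}


def memory_optimised_imports_enabled(environ=None) -> bool:
    """Return True when the flag is switched on (it defaults to off)."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "False").strip().lower() in _TRUE_VALUES


def load_top_level_exports(exports, environ=None):
    """Eagerly import every export unless the flag is enabled.

    `exports` maps an attribute name to the module that defines it.
    """
    if memory_optimised_imports_enabled(environ):
        # Nothing is imported up front; callers must use full module paths.
        return {}
    return {
        name: getattr(importlib.import_module(module_path), name)
        for name, module_path in exports.items()
    }


# Hypothetical stand-in for a heavy top-level export.
EXPORTS = {"JSONDecoder": "json.decoder"}
print(sorted(load_top_level_exports(EXPORTS, {})))
print(sorted(load_top_level_exports(EXPORTS, {ENV_VAR: "True"})))
```

With the flag off, the namespace is populated as before, so existing top-level imports keep working; with it on, nothing is imported until a module is requested by its full path.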
Reviewed Changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_version.py | Update test to use cosmos.version.__version__. |
| tests/test_telemetry.py | Switch to patch.object for telemetry HTTP mocks. |
| tests/test_settings.py | Add tests for the new memory-optimised imports flag. |
| tests/test_log.py | Import get_provider_info from new module. |
| pyproject.toml | Adjust entry-points and version file path. |
| cosmos/version.py | New module exposing __version__. |
| cosmos/settings.py | Add enable_memory_optimised_imports config. |
| cosmos/telemetry.py | Update type hint and version reference. |
| cosmos/provider_info.py | New module for provider metadata. |
| cosmos/operators/local.py | Remove unused logging import. |
| cosmos/dbt/parser/output.py | Reference version from cosmos.version. |
| cosmos/__init__.py | Conditional lazy imports based on new setting. |
Files not reviewed (1)
- docs/configuration/cosmos-conf.rst: Language not supported
Comments suppressed due to low confidence (4)
tests/test_settings.py:12
- The test calls `reload(settings)` but `settings` is not imported in this file. Add `import cosmos.settings as settings` at the top to avoid a NameError.
  `reload(settings)`

tests/test_log.py:3
- [nitpick] The `import cosmos.log` line appears unused since you import `get_provider_info` directly. Removing it will clean up the imports.
  `import cosmos.log`

cosmos/__init__.py:8
- Removing the top-level `__version__` attribute breaks consumers expecting `cosmos.__version__`. Consider re-exporting it (e.g., `from .version import __version__`) for backwards compatibility.
  `__version__ = "1.10.0"`

cosmos/settings.py:45
- The comment refers to `explicit_imports` but the flag is named `enable_memory_optimised_imports`. Update the comment to match the actual config key to avoid confusion.
  `# When enabled, users must access Cosmos classes via their full module paths,`
tatiana left a comment:
Hi @pankajkoti, this looks great. Thank you very much for working on this and finding a solution to the initial problem. I'm glad we're moving towards improving Cosmos' memory footprint.
Some minor feedback in-line, and some additional comments:
- It would be great if we could have an example of how to use explicit import paths, complementing the documentation you added.
- It's probably worth stating that in Cosmos 2.0, this will become the default behaviour, and we'll remove the existing behaviour of allowing people to import everything from `cosmos`, referencing #1213.
This PR introduces a new configuration flag `enable_memory_optimised_imports` under the `cosmos` Airflow config section (environment variable `AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS`) to optimise memory usage when Cosmos is installed but not actively used, or when only certain Cosmos modules are needed (achieved by importing them explicitly with their full module names).

## Changes made to accommodate the above

- Introduce `enable_memory_optimised_imports` in `cosmos/settings.py` and guard eager imports in `__init__.py`.
- Extract provider info into `cosmos/provider_info.py` and update entry-points.

## Problem

When Cosmos is installed, it eagerly imports many classes and modules (e.g., `DbtDag`, `operators`, etc.) in `__init__.py`. This increases memory usage by an observed ~200MB per task per worker node, even if Cosmos isn’t actively used.

## Proposed Solution

By default, `enable_memory_optimised_imports` is set to `False`, preserving the current behaviour and maintaining backward compatibility (i.e., all top-level exports remain available). When `enable_memory_optimised_imports` is set to `True`, top-level imports such as `DbtDag` are no longer automatically exposed via `cosmos/__init__.py`. This prevents the loading of large modules unless they are explicitly imported, resulting in reduced memory usage.

In Cosmos 2.0, this will become the default behaviour, and we'll remove the existing behaviour of allowing users to import everything from cosmos (`__init__.py`), as mentioned in #1213.

## Usage

To enable optimised imports:

```
export AIRFLOW__COSMOS__ENABLE_MEMORY_OPTIMISED_IMPORTS=True
```

## Memory footprint analysis

### Non-Cosmos DAG

The following experiment was conducted on an Astro deployment running only a single non-Cosmos DAG (2 simple BashOperator tasks echoing outputs), with the `astronomer-cosmos` package installed in the deployment.

**Memory usage with the default approach (`enable_memory_optimised_imports` disabled): ~900MB**

<img width="1153" alt="Screenshot 2025-05-20 at 1 20 07 AM" src="https://github.com/user-attachments/assets/ffc8d99d-d953-45de-9209-479654523df0" />

**Memory usage with `enable_memory_optimised_imports` enabled: ~700MB**

<img width="1343" alt="Screenshot 2025-05-20 at 1 20 22 AM" src="https://github.com/user-attachments/assets/4ac6cb1b-ffb6-4c74-aa97-8db28dc60556" />

### Cosmos DAG

The following experiment was conducted on an Astro deployment running the Cosmos DAG below against a jaffle-shop dbt project.

DAG code:

```
from datetime import datetime

# Fully qualified Cosmos imports, as required when the flag is enabled
from cosmos.airflow.dag import DbtDag
from cosmos.config import ProjectConfig, RenderConfig
from cosmos.constants import LoadMode, InvocationMode, TestBehavior

from include.profiles import snowflake_db
from include.constants import jaffle_shop_path, venv_execution_config

simple_dag = DbtDag(
    project_config=ProjectConfig(jaffle_shop_path),
    profile_config=snowflake_db,
    execution_config=venv_execution_config,
    render_config=RenderConfig(
        test_behavior=TestBehavior.NONE,
    ),
    schedule=None,
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="simple_dag",
    tags=["simple"],
    default_args={
        "retries": 2,
    },
)
```

The constants imported in the DAG above have the following values:

```
from pathlib import Path
from cosmos.config import ExecutionConfig

jaffle_shop_path = Path("/usr/local/airflow/dbt/jaffle_shop")
dbt_executable = Path("/usr/local/airflow/dbt_venv/bin/dbt")
venv_execution_config = ExecutionConfig(dbt_executable_path=str(dbt_executable))
```

**Memory usage with the default approach (`enable_memory_optimised_imports` disabled).** It was observed that **while DAGs were running, memory usage peaked at 1.8-2.0GB, and when no DAGs were running (idle worker), memory usage hovered around ~990 MB.**

<img width="1482" alt="Screenshot 2025-05-21 at 5 20 13 PM" src="https://github.com/user-attachments/assets/867e162d-7a58-455a-a232-3716c5c03e31" />

**Memory usage with `enable_memory_optimised_imports` enabled.** It was observed that **for the first DAG run, memory usage peaked at up to 1.6 GB, but for subsequent DAG runs it hovered around ~780 MB. This ~780 MB figure remained consistent whether DAGs were running (I triggered about 5 DAG runs one after the other) or the worker was idle.**

<img width="1313" alt="Screenshot 2025-05-21 at 5 18 38 PM" src="https://github.com/user-attachments/assets/e3f425d7-137b-4441-a8ba-d4adb587a862" />

This change thus gives users more control over Cosmos’s memory footprint through the optional config.

closes: #1652
related: #1213
related: #1471

---------

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
(cherry picked from commit 633fcf3)
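The figures in the memory footprint analysis above are deployment-level measurements. For a rough local complement, the sketch below estimates how much Python-level memory a single import retains, using the standard library's `tracemalloc`. It is an illustrative assumption, not the method used in this PR: the helper name is hypothetical, `tracemalloc` only sees allocations made through Python's allocator (not total process memory), and dependencies that are already loaded are not counted again.

```python
import importlib
import sys
import tracemalloc


def import_cost_bytes(module_name: str) -> int:
    """Estimate the bytes of Python allocations retained by importing a module."""
    # Drop the cached entry so the import actually executes again.
    # Dependencies already present in sys.modules are left untouched.
    sys.modules.pop(module_name, None)

    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    importlib.import_module(module_name)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return after - before


if __name__ == "__main__":
    # Any importable module name works; "colorsys" is just a small stdlib example.
    print(f"colorsys retains roughly {import_cost_bytes('colorsys')} bytes")
```

Running the helper against a package's top-level module and then against one of its submodules gives a rough sense of how much an eager `__init__.py` contributes, although the absolute numbers will differ from worker-level measurements.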
Bug Fixes

* Fix ``full_refresh`` parameter in ``AIRFLOW_ASYNC`` ``ExecutionConfig`` mode by @tuantran0910 in #1738
* Fix dbt ls invocation method log message by @tatiana and @dstandish in #1749
* Ensure remote target directory is created when copying files when using local directory by @tuantran0910 and @corsettigyg in #1740
* Support custom ``packages-install-path`` by @tatiana in #1768
* Disable dbt static parser during Airflow task execution using dbt runner by @pankajkoti and @tatiana in #1760
* Fix ``ExecutionMode.LOCAL`` to leverage ``ProjectConfig.manifest_path`` by @tatiana in #1772
* Refactor ``AIRFLOW_ASYNC`` so that the path in the remote object store is specific per DAG run by @tuantran0910 in #1741
* Optimise memory usage with optional explicit imports by @pankajkoti and @tatiana in #1769

Documentation

* Fix documentation rendering for ``use_dataset_airflow3_uri_standard`` by @pankajastro in #1742
* Correct custom callback example by @walter9388 in #1747

Others

* Re-enable integration tests durations to troubleshoot performance degradation by @tatiana in #1735
* Run listener tests for Airflow 3 by @pankajastro in #1743
* Add Airflow 3 db files to ignore from git tracking by @pankajkoti in #1755
* Log contents of ``packages.yml`` when ``AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG`` by @tatiana in #1764
* Fix Airflow dependencies in the CI by @tatiana in #1773
* Pre-commit updates: #1744, #1765, #1770
(cherry picked from commit 430be00)
Hey @DanMawdsleyBA, when you enable this config, you need to use fully qualified imports for all Cosmos classes and methods. For example, check out this sample DAG:

In your case, for the specific case of

Please also refer to the docs regarding this config: https://astronomer.github.io/astronomer-cosmos/configuration/cosmos-conf.html#enable-memory-optimised-imports
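As a rough illustration of what "fully qualified imports" means in practice, the sketch below rewrites a top-level `from cosmos import ...` line into its per-module form. The helper and its name are hypothetical and not part of Cosmos; the name-to-module mapping only covers the names and paths that appear in the DAG code quoted in this PR, and the parsing is deliberately naive (no parentheses, aliases, or multi-line imports).

```python
import re

# Names and module paths taken from the DAG code quoted in this PR.
FULL_PATHS = {
    "DbtDag": "cosmos.airflow.dag",
    "ProjectConfig": "cosmos.config",
    "RenderConfig": "cosmos.config",
    "LoadMode": "cosmos.constants",
    "InvocationMode": "cosmos.constants",
    "TestBehavior": "cosmos.constants",
}

_TOP_LEVEL_IMPORT = re.compile(r"^from cosmos import (.+)$")


def qualify_import(line: str) -> list[str]:
    """Rewrite `from cosmos import A, B` into fully qualified import lines.

    Lines that are not top-level Cosmos imports are returned unchanged.
    """
    match = _TOP_LEVEL_IMPORT.match(line.strip())
    if match is None:
        return [line]

    # Group the imported names by the module that defines them.
    names_by_module: dict[str, list[str]] = {}
    for name in (part.strip() for part in match.group(1).split(",")):
        module = FULL_PATHS.get(name)
        if module is None:
            raise KeyError(f"no known module path for {name!r}")
        names_by_module.setdefault(module, []).append(name)

    return [
        f"from {module} import {', '.join(names)}"
        for module, names in names_by_module.items()
    ]


for rewritten in qualify_import("from cosmos import DbtDag, ProjectConfig, RenderConfig"):
    print(rewritten)
```

The two printed lines are the form that keeps working once the flag is enabled, because each one names the module that actually defines the class.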
Replace eager imports with lazy loading via module-level `__getattr__`. Imports are deferred until first access, reducing memory footprint by only loading modules that are actually used.

Optional dependency handling (docker, kubernetes, etc.) is preserved by catching `ImportError` in `__getattr__` and returning `MissingPackage` sentinels, matching the previous behavior.

The `enable_memory_optimised_imports` setting is now a no-op since all imports are lazy by default. Remove the related tests and update documentation accordingly.

related: #2403
related: #1769

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
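The lazy-loading approach described in this commit message can be sketched with a module-level `__getattr__` (PEP 562). The following is a minimal, self-contained illustration rather than the Cosmos implementation: the package is built in memory, the `MissingPackage` class here is a simplified stand-in, and the export table points at a stdlib class plus a deliberately non-existent module to simulate a missing optional dependency.

```python
import importlib
import types


class MissingPackage:
    """Simplified stand-in for a sentinel returned when an optional extra is absent."""

    def __init__(self, name: str, extra: str) -> None:
        self.name = name
        self.extra = extra

    def __call__(self, *args, **kwargs):
        raise RuntimeError(f"{self.name} needs the optional dependency {self.extra!r}")


# Attribute name -> (module that defines it, optional extra it depends on).
# Both entries are hypothetical and chosen only for the demonstration.
LAZY_EXPORTS = {
    "JSONDecoder": ("json.decoder", "json"),
    "DockerThing": ("package_that_is_not_installed.client", "docker"),
}


def make_lazy_package(name: str) -> types.ModuleType:
    """Build a module whose exports are imported on first attribute access."""
    package = types.ModuleType(name)

    def __getattr__(attr: str):
        # Called by Python only when normal attribute lookup fails.
        if attr not in LAZY_EXPORTS:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_path, extra = LAZY_EXPORTS[attr]
        try:
            value = getattr(importlib.import_module(module_path), attr)
        except ImportError:
            # Optional dependency missing: hand back a sentinel instead of failing.
            value = MissingPackage(f"{name}.{attr}", extra)
        setattr(package, attr, value)  # cache so __getattr__ is not hit again
        return value

    package.__getattr__ = __getattr__
    return package


demo = make_lazy_package("demo_package")
print(demo.JSONDecoder)                   # imported on first access
print(type(demo.DockerThing).__name__)    # sentinel for the missing extra
```

Caching the resolved value on the module means the import cost is paid once per name, and names that are never touched never trigger their imports.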
