Fix import handling by lazy loading hooks introduced in PR #1109#1132
Conversation
✅ Deploy Preview for sunny-pastelito-5ecb04 canceled.
|
pankajkoti
left a comment
There was a problem hiding this comment.
Thanks @dwreeves for trying out from main and finding an issue. Many thanks for going ahead and creating a fix ❤️
LGTM, just have a question regarding the wasb URI format we could use and then I believe we are good to merge it.
cc9e039 to
bc7a46a
Compare
|
@pankajkoti Thank you for the very speedy review. As per my comment above I looked into things a little bit more and incorporated your bit of feedback, plus learned a few more things (looks like |
pankajkoti
left a comment
There was a problem hiding this comment.
LGTM.
Thanks for identifying this issue and fixing it so promptly & proactively! Appreciate it a lot! I'm glad we've avoiding affecting users before the release 😌
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1132 +/- ##
=======================================
Coverage 96.50% 96.51%
=======================================
Files 64 64
Lines 3321 3325 +4
=======================================
+ Hits 3205 3209 +4
Misses 116 116 ☔ View full report in Codecov by Sentry. |
|
@dwreeves could we merge the PR? Or is there something more you'd like to add here? |
|
@pankajkoti It's fine to merge! |
…#1109 (astronomer#1132) Making an update to astronomer#1109, which introduced module-level imports of optional dependencies. This is inappropriate as it will break if the user does not have them installed, and indeed the user really does not need them installed if they are not relying on them directly. This PR lazy-loads the imports so that it does not impact users who do not need them. In the upath library, `az:`, `adl:`, `abfs:` and `abfss:` are also all valid schemes, albeit Airflow only references the latter 3 in the code: https://github.com/apache/airflow/blob/e3824eaaba7eada9a807f7a2f9f89d977a210e15/airflow/providers/microsoft/azure/fs/adls.py#L29, so `adl:`, `abfs:` and `abfss:` also have been added to the list of schemes supported.
Making an update to #1109, which introduced module-level imports of optional dependencies. This is inappropriate as it will break if the user does not have them installed, and indeed the user really does not need them installed if they are not relying on them directly. This PR lazy-loads the imports so that it does not impact users who do not need them. In the upath library, `az:`, `adl:`, `abfs:` and `abfss:` are also all valid schemes, albeit Airflow only references the latter 3 in the code: https://github.com/apache/airflow/blob/e3824eaaba7eada9a807f7a2f9f89d977a210e15/airflow/providers/microsoft/azure/fs/adls.py#L29, so `adl:`, `abfs:` and `abfss:` also have been added to the list of schemes supported.
New Features * Add support for loading manifest from cloud stores using Airflow Object Storage by @pankajkoti in #1109 * Cache ``package-lock.yml`` file by @pankajastro in #1086 * Support persisting the ``LoadMode.VIRTUALENV`` directory by @tatiana in #1079 * Add support to store and fetch ``dbt ls`` cache in remote stores by @pankajkoti in #1147 * Add default source nodes rendering by @arojasb3 in #1107 * Add Teradata ``ProfileMapping`` by @sc250072 in #1077 Enhancements * Add ``DatabricksOauthProfileMapping`` profile by @CorsettiS in #1091 * Use ``dbt ls`` as the default parser when ``profile_config`` is provided by @pankajastro in #1101 * Add task owner to dbt operators by @wornjs in #1082 * Extend Cosmos custom selector to support + when using paths and tags by @mvictoria in #1150 * Simplify logging by @dwreeves in #1108 Bug fixes * Fix Teradata ``ProfileMapping`` target invalid issue by @sc250072 in #1088 * Fix empty tag in case of custom parser by @pankajastro in #1100 * Fix ``dbt deps`` of ``LoadMode.DBT_LS`` should use ``ProjectConfig.dbt_vars`` by @tatiana in #1114 * Fix import handling by lazy loading hooks introduced in PR #1109 by @dwreeves in #1132 * Fix Airflow 2.10 regression and add Airflow 2.10 in test matrix by @pankajastro in #1162 Docs * Fix typo in azure-container-instance docs by @pankajastro in #1106 * Use Airflow trademark as it has been registered by @pankajastro in #1105 Others * Run some example DAGs in Kubernetes execution mode in CI by @pankajastro in #1127 * Install requirements.txt by default during dev env spin up by @@CorsettiS in #1099 * Remove ``DbtGraph.current_version`` dead code by @tatiana in #1111 * Disable test for Airflow-2.5 and Python-3.11 combination in CI by @pankajastro in #1124 * Pre-commit hook updates in #1074, #1113, #1125, #1144, #1154, #1167 --------- Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com> Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Cosmos exposes execution-mode operators (Kubernetes, AWS EKS/ECS, GCP Cloud Run Job, Azure Container Instance, Docker) and a wide set of profile mappings that depend on optional Airflow provider packages and cloud SDKs. Importing these at the module level turns each provider into a hard runtime dependency for every Cosmos user, even those who never use that feature - see PR #1109 / #1132 for the prior history of this concern. As of now, `cosmos/__init__.py` already defers the heavy operator modules via `_LAZY_IMPORTS` + `__getattr__`, but nothing previously prevented a future PR from adding a fresh eager provider import in a non-leaf module and quietly broadening the dependency surface. This PR enables ruff's `TID253` (`banned-module-level-imports`) with the provider/SDK packages we ship with optional-extra installs, and adds explicit `per-file-ignores` for the leaf opt-in modules that legitimately need those imports at the module level. **Banned at module level** - Airflow provider packages: `amazon`, `cncf.kubernetes`, `databricks`, `docker`, `google`, `microsoft.azure`, `snowflake`. - Cloud/orchestration SDKs: `azure`, `boto3`, `botocore`, `databricks.sql`, `docker`, `google.cloud`, `google.auth`, `kubernetes`, `snowflake.connector`. - Cloud filesystems: `adlfs`, `gcsfs`, `s3fs`. - Other heavy optional deps: `mlflow`, `sentry_sdk`. **Where they're allowed (per-file-ignores)** The leaf opt-in modules accessed lazily through `cosmos/__init__.py`: - `cosmos/operators/kubernetes.py` - `cosmos/operators/aws_eks.py` - `cosmos/operators/aws_ecs.py` - `cosmos/operators/azure_container_instance.py` - `cosmos/operators/gcp_cloud_run_job.py` - `cosmos/operators/docker.py` - `cosmos/operators/watcher_kubernetes.py` - `cosmos/airflow/_override.py` Plus `tests/**`, `dev/**`, and `scripts/**` (not shipped to users). **What's NOT affected** - `if TYPE_CHECKING:` blocks — not module-level execution. - `try: ... except ImportError:` compat shims (e.g. the Airflow 2/3 `BaseHook` split in `cosmos/profiles/base.py`) - different code shape, not matched by TID253. - Function/method-level imports — already deferred. ## Verification - `pre-commit run ruff-check --all-files` passes on `main` with the new config. - A deliberate violation injected into `cosmos/operators/local.py` (`import kubernetes` at module top) produced a clear error: ``` TID253 `kubernetes` is banned at the module level --> cosmos/operators/local.py:3:8 ``` Follow-up from the audit prompted by PR #1109 / PR #1132. 🤖 Generated with Claude Code (https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Description
Making an update to #1109, which introduced module-level imports of optional dependencies. This is inappropriate as it will break if the user does not have them installed, and indeed the user really does not need them installed if they are not relying on them directly.
This PR lazy-loads the imports so that it does not impact users who do not need them.
Additionally, the scheme added for Azure only supported the Azure Data Lake Storage V2 protocol and not the (legacy, but also I believe more common?)
wasb://protocol, which I added. Additionally, theconn_idwas being pulled from the wrong hook for theabfs://scheme. This is not just nitpicking, as the defaultconn_idfor each hook is actually different: theconn_idthat corresponds with theabfs://scheme isadls_default, whereas for thewasb://scheme it iswasb_default.This PR should be merged before releasing
1.6to prevent breaking anyone's environments. 😄 Sorry to sound the bold alarms, but this one is actually pretty important.