Gracefully handle Variable.set() failures on Astro Remote Execution#2573
Conversation
On Astro Remote Execution, the DAG processor runs on a customer's Kubernetes cluster backed by a local SQLite DB that lacks the full Airflow schema. Variable.set() during DAG parsing routes through the Airflow 3 Task SDK, which returns HTTP 500 (no variable table) and retries 4 times per DAG per cycle — filling pod storage with tracebacks. Wrap Variable.set() in save_dbt_ls_cache() and save_yaml_selectors_cache() with a try/except that catches AirflowRuntimeError (Airflow 3 Task SDK only). On failure, log a WARNING with guidance to set AIRFLOW__COSMOS__REMOTE_CACHE_DIR instead of propagating the error. On Airflow 2 the existing direct-DB path is unchanged. Closes astronomer#2551 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR prevents Airflow 3 Task SDK Variable.set() failures during DAG parsing from surfacing as repeated parse-cycle errors by catching AirflowRuntimeError in Cosmos cache write paths and logging a warning with guidance to use remote cache storage instead.
Changes:
- Wrap
Variable.set()insave_dbt_ls_cache()andsave_yaml_selectors_cache()to catch Airflow 3’sAirflowRuntimeErrorand log a warning rather than propagating. - Add Airflow 3-only tests asserting failures are handled gracefully and produce the expected warning output.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
cosmos/dbt/graph.py |
Adds Airflow 3-specific exception handling around variable-based cache writes, logging guidance to use AIRFLOW__COSMOS__REMOTE_CACHE_DIR. |
tests/dbt/test_graph.py |
Adds tests (skipped on Airflow <3) to validate warning logging when Variable.set() raises AirflowRuntimeError. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Extract _save_cache_to_variable() to eliminate duplicated try/except/import blocks in save_dbt_ls_cache() and save_yaml_selectors_cache(). Uses the cleaner try/except/else import pattern suggested in code review. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
pre-commit.ci run |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
@hkc-8010 thank you so much for working on this and proposing a way of improving the user experience! I left minor comments inline - let me know your thoughts, and I'm happy for us to merge this change once the checks pass. Let me know if you'd like to connect to review the errors raised by the CI - we're really close!
- Extend _save_cache_to_variable docstring to note this failure is expected with Astro Remote Execution but not Astro Executor - Flatten try/except/else nesting using early return in the Airflow 2 branch - Expand warning message to offer LoadMode.MANIFEST and AIRFLOW__COSMOS__ENABLE_CACHE_DBT_LS as additional user options - Fix CI: AirflowRuntimeError in Airflow 3 SDK requires an ErrorResponse object, not a plain string; use MagicMock() to satisfy the constructor Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2573 +/- ##
=======================================
Coverage 98.05% 98.05%
=======================================
Files 103 103
Lines 7601 7614 +13
=======================================
+ Hits 7453 7466 +13
Misses 148 148 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Variable.set() failures on Astro Remote Execution
- Fix LoadMode.MANIFEST → LoadMode.DBT_MANIFEST in warning message - Tailor disable-cache env var in warning based on cache type: use AIRFLOW__COSMOS__ENABLE_CACHE_DBT_YAML_SELECTORS for YAML selectors cache and AIRFLOW__COSMOS__ENABLE_CACHE_DBT_LS for dbt ls cache - Reword docstring to refer to DAG parsing environment rather than workers, accurately reflecting where the failure occurs - Remove redundant local MagicMock imports in tests (already imported at module scope) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The YAML selectors cache is independent of the DAG load mode, so suggesting LoadMode.DBT_MANIFEST as a workaround for YAML selector cache failures is inaccurate. Only include this hint for the dbt ls cache; both cache types still show the REMOTE_CACHE_DIR and the appropriate disable-cache env var. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Bug Fixes * Fix ``ExecutionMode.WATCHER`` producer retry behaviour by @tatiana in #2559 * Prevent watcher producer skip propagating to downstream tasks via gateway task by @johnhoran and @tatiana in #2597 * Keep watcher sensor polling when producer is still running by @pankajkoti in #2592 * Fix circular import error in Cosmos plugin discovery under Astro Runtime by @tatiana in #2538 * Fix ``CosmosRichLogger`` crash on ``None`` log message by @tatiana in #2540 * Enable inlets and outlets using dbt Fusion on Airflow 3 by @ichirotakami in #2561 * Fix incorrectly skipped source downstream tasks in ``ExecutionMode.WATCHER`` by @pankajastro in #2563 * Fix duplicate logs in ``dbt build`` when source freshness is enabled by @pankajastro in #2564 * Warn and normalize when ``source_rendering_behavior=None`` is passed by @pankajastro in #2570 * Gracefully handle ``Variable.set()`` failures on Astro Remote Execution by @hkc-8010 in #2573 * Skip malformed YAML selectors instead of failing entirely by @YourRoyalLinus in #2577 Docs * Update watcher test behavior docs for Cosmos 1.14.0 by @tatiana in #2549 * Add redirect for moved partial-parsing docs page by @tatiana in #2550 * Document ``ExecutionMode.WATCHER`` and ``depends_on_past`` limitation by @tatiana in #2602 * Restore memory-optimised imports docs for Cosmos < 1.14.0 by @pankajkoti in #2604 Others * Speed up Airflow 3.1+ integration tests by caching InProcessExecutionAPI by @pankajkoti in #2547 * Improve stability of cache hash unit tests by @tatiana in #2539 * Fix mypy 1.20.0 type check failures by @pankajkoti in #2546 * Fix CI failures caused by docs build memory exhaustion by @pankajkoti in #2580 * Fix dbt Fusion broken integration tests by @tatiana in #2581 * Fix flaky ``cosmos_manifest_selectors_example`` DAG in CI by @pankajkoti in #2593 * Reduce pre-commit autoupdate frequency PRs by @tatiana in #2544 * Bump ``reviewdog/action-actionlint`` from 1.71.0 to 1.72.0 by @dependabot in #2542 * Skip watcher gateway test on Airflow 3.0 by @tatiana in #2607 closes: astronomer/oss-integrations-private#381
Summary
Variable.set()insave_dbt_ls_cache()andsave_yaml_selectors_cache()with atry/exceptthat catchesAirflowRuntimeError(Airflow 3 Task SDK only)WARNINGwith the variable key and guidance to setAIRFLOW__COSMOS__REMOTE_CACHE_DIRinstead of propagating the errorRelated Issue(s)
closes #2551
Breaking Change?
No.
Checklist
🤖 Generated with Claude Code