Skip to content

Gracefully handle Variable.set() failures on Astro Remote Execution#2573

Merged
tatiana merged 5 commits into
astronomer:mainfrom
hkc-8010:fix/variable-set-cache-error-handling
Apr 21, 2026
Merged

Gracefully handle Variable.set() failures on Astro Remote Execution#2573
tatiana merged 5 commits into
astronomer:mainfrom
hkc-8010:fix/variable-set-cache-error-handling

Conversation

@hkc-8010
Copy link
Copy Markdown
Contributor

Summary

  • Wraps Variable.set() in save_dbt_ls_cache() and save_yaml_selectors_cache() with a try/except that catches AirflowRuntimeError (Airflow 3 Task SDK only)
  • On failure, logs a WARNING with the variable key and guidance to set AIRFLOW__COSMOS__REMOTE_CACHE_DIR instead of propagating the error
  • On Airflow 2 the existing direct-DB path is unchanged (no exception expected)

Related Issue(s)

closes #2551

Breaking Change?

No.

Checklist

  • I have added tests that prove my fix is effective or that my feature works

🤖 Generated with Claude Code

On Astro Remote Execution, the DAG processor runs on a customer's
Kubernetes cluster backed by a local SQLite DB that lacks the full
Airflow schema. Variable.set() during DAG parsing routes through the
Airflow 3 Task SDK, which returns HTTP 500 (no variable table) and
retries 4 times per DAG per cycle — filling pod storage with tracebacks.

Wrap Variable.set() in save_dbt_ls_cache() and save_yaml_selectors_cache()
with a try/except that catches AirflowRuntimeError (Airflow 3 Task SDK
only). On failure, log a WARNING with guidance to set
AIRFLOW__COSMOS__REMOTE_CACHE_DIR instead of propagating the error.
On Airflow 2 the existing direct-DB path is unchanged.

Closes astronomer#2551

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 18, 2026 07:52
@hkc-8010 hkc-8010 requested review from a team, corsettigyg, dwreeves and jbandoro as code owners April 18, 2026 07:52
@hkc-8010 hkc-8010 requested review from pankajkoti and tatiana April 18, 2026 07:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents Airflow 3 Task SDK Variable.set() failures during DAG parsing from surfacing as repeated parse-cycle errors by catching AirflowRuntimeError in Cosmos cache write paths and logging a warning with guidance to use remote cache storage instead.

Changes:

  • Wrap Variable.set() in save_dbt_ls_cache() and save_yaml_selectors_cache() to catch Airflow 3’s AirflowRuntimeError and log a warning rather than propagating.
  • Add Airflow 3-only tests asserting failures are handled gracefully and produce the expected warning output.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
cosmos/dbt/graph.py Adds Airflow 3-specific exception handling around variable-based cache writes, logging guidance to use AIRFLOW__COSMOS__REMOTE_CACHE_DIR.
tests/dbt/test_graph.py Adds tests (skipped on Airflow <3) to validate warning logging when Variable.set() raises AirflowRuntimeError.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cosmos/dbt/graph.py Outdated
Comment thread cosmos/dbt/graph.py Outdated
Extract _save_cache_to_variable() to eliminate duplicated
try/except/import blocks in save_dbt_ls_cache() and
save_yaml_selectors_cache(). Uses the cleaner try/except/else
import pattern suggested in code review.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@hkc-8010
Copy link
Copy Markdown
Contributor Author

pre-commit.ci run

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hkc-8010 thank you so much for working on this and proposing a way of improving the user experience! I left minor comments inline - let me know your thoughts, and I'm happy for us to merge this change once the checks pass. Let me know if you'd like to connect to review the errors raised by the CI - we're really close!

Comment thread cosmos/dbt/graph.py Outdated
Comment thread cosmos/dbt/graph.py Outdated
- Extend _save_cache_to_variable docstring to note this failure is
  expected with Astro Remote Execution but not Astro Executor
- Flatten try/except/else nesting using early return in the Airflow 2
  branch
- Expand warning message to offer LoadMode.MANIFEST and
  AIRFLOW__COSMOS__ENABLE_CACHE_DBT_LS as additional user options
- Fix CI: AirflowRuntimeError in Airflow 3 SDK requires an ErrorResponse
  object, not a plain string; use MagicMock() to satisfy the constructor

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cosmos/dbt/graph.py
Comment thread cosmos/dbt/graph.py Outdated
Comment thread tests/dbt/test_graph.py Outdated
Comment thread tests/dbt/test_graph.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.05%. Comparing base (37fd06c) to head (8fe7d42).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2573   +/-   ##
=======================================
  Coverage   98.05%   98.05%           
=======================================
  Files         103      103           
  Lines        7601     7614   +13     
=======================================
+ Hits         7453     7466   +13     
  Misses        148      148           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tatiana tatiana changed the title Gracefully handle Variable.set() failures in cache save methods Gracefully handle Variable.set() failures on Astro Remote Execution Apr 20, 2026
Copy link
Copy Markdown
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the contribution, @hkc-8010 ! This will certainly give users a better experience. I also liked the alternatives you gave.

Please, could you address Copilot review, and we can merge?

- Fix LoadMode.MANIFEST → LoadMode.DBT_MANIFEST in warning message
- Tailor disable-cache env var in warning based on cache type: use
  AIRFLOW__COSMOS__ENABLE_CACHE_DBT_YAML_SELECTORS for YAML selectors
  cache and AIRFLOW__COSMOS__ENABLE_CACHE_DBT_LS for dbt ls cache
- Reword docstring to refer to DAG parsing environment rather than
  workers, accurately reflecting where the failure occurs
- Remove redundant local MagicMock imports in tests (already imported
  at module scope)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cosmos/dbt/graph.py
The YAML selectors cache is independent of the DAG load mode, so
suggesting LoadMode.DBT_MANIFEST as a workaround for YAML selector
cache failures is inaccurate. Only include this hint for the dbt ls
cache; both cache types still show the REMOTE_CACHE_DIR and the
appropriate disable-cache env var.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@tatiana tatiana merged commit fa735d2 into astronomer:main Apr 21, 2026
125 checks passed
@pankajkoti pankajkoti mentioned this pull request Apr 23, 2026
@tatiana tatiana added this to the Cosmos 1.14.1 milestone Apr 23, 2026
tatiana pushed a commit that referenced this pull request Apr 23, 2026
Bug Fixes

* Fix ``ExecutionMode.WATCHER`` producer retry behaviour by @tatiana in
#2559
* Prevent watcher producer skip propagating to downstream tasks via
gateway task by @johnhoran and @tatiana in #2597
* Keep watcher sensor polling when producer is still running by
@pankajkoti in #2592
* Fix circular import error in Cosmos plugin discovery under Astro
Runtime by @tatiana in #2538
* Fix ``CosmosRichLogger`` crash on ``None`` log message by @tatiana in
#2540
* Enable inlets and outlets using dbt Fusion on Airflow 3 by
@ichirotakami in #2561
* Fix incorrectly skipped source downstream tasks in
``ExecutionMode.WATCHER`` by @pankajastro in #2563
* Fix duplicate logs in ``dbt build`` when source freshness is enabled
by @pankajastro in #2564
* Warn and normalize when ``source_rendering_behavior=None`` is passed
by @pankajastro in #2570
* Gracefully handle ``Variable.set()`` failures on Astro Remote
Execution by @hkc-8010 in #2573
* Skip malformed YAML selectors instead of failing entirely by
@YourRoyalLinus in #2577

Docs

* Update watcher test behavior docs for Cosmos 1.14.0 by @tatiana in
#2549
* Add redirect for moved partial-parsing docs page by @tatiana in #2550
* Document ``ExecutionMode.WATCHER`` and ``depends_on_past`` limitation
by @tatiana in #2602
* Restore memory-optimised imports docs for Cosmos < 1.14.0 by
@pankajkoti in #2604

Others

* Speed up Airflow 3.1+ integration tests by caching
InProcessExecutionAPI by @pankajkoti in #2547
* Improve stability of cache hash unit tests by @tatiana in #2539
* Fix mypy 1.20.0 type check failures by @pankajkoti in #2546
* Fix CI failures caused by docs build memory exhaustion by @pankajkoti
in #2580
* Fix dbt Fusion broken integration tests by @tatiana in #2581
* Fix flaky ``cosmos_manifest_selectors_example`` DAG in CI by
@pankajkoti in #2593
* Reduce pre-commit autoupdate frequency PRs by @tatiana in #2544
* Bump ``reviewdog/action-actionlint`` from 1.71.0 to 1.72.0 by
@dependabot in #2542
* Skip watcher gateway test on Airflow 3.0 by @tatiana in #2607

closes: astronomer/oss-integrations-private#381
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Variable.set() during DAG parsing fails on Astro Remote Execution with sqlite3.OperationalError: no such table: variable

3 participants