Skip to content

Allow watcher producer retries without erroring#2283

Merged
tatiana merged 5 commits into
mainfrom
allow-watcher-producer-retries-without-erroring
Jan 23, 2026
Merged

Allow watcher producer retries without erroring#2283
tatiana merged 5 commits into
mainfrom
allow-watcher-producer-retries-without-erroring

Conversation

@tatiana
Copy link
Copy Markdown
Collaborator

@tatiana tatiana commented Jan 23, 2026

Aims to overcome retries at a TaskGroup level, is reported in: #2282

During retries, the watcher will not execute any operation, but will not fail. We are intentionally changing the behaviour previously implemented in #2114.

The behaviour will be improved once the following ticket is implemented: #1978

Closes: #2282

Aims to overcome retries at a TaskGroup level, is reported in:
#2282

The behaviour will be improved once the following ticket is implemented:
#1978
Copilot AI review requested due to automatic review settings January 23, 2026 12:58
@netlify
Copy link
Copy Markdown

netlify Bot commented Jan 23, 2026

Deploy Preview for astronomer-cosmos ready!

Name Link
🔨 Latest commit c7b6de8
🔍 Latest deploy log https://app.netlify.com/projects/astronomer-cosmos/deploys/6973846e088c2a0008468a0a
😎 Deploy Preview https://deploy-preview-2283--astronomer-cosmos.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes the retry behavior of watcher producer operators to skip execution on retries instead of raising an error. When a retry is detected (try_number > 1), the operators now log an informational message and return None rather than raising an AirflowException.

Changes:

  • Modified retry handling to skip execution instead of failing
  • Changed log level from ERROR to INFO for retry detection messages
  • Updated test cases to verify skipping behavior instead of exception raising

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
cosmos/operators/watcher.py Changed retry handling to log info and return None instead of raising AirflowException
cosmos/operators/watcher_kubernetes.py Changed retry handling to log info and return None instead of raising AirflowException
tests/operators/test_watcher.py Updated test to verify skipping behavior and INFO logging instead of exception and ERROR logging
tests/operators/test_watcher_kubernetes_unit.py Updated test to verify skipping behavior and INFO logging instead of exception and ERROR logging

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@pankajkoti pankajkoti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It could happen that users just clear the producer task and expect that it will be re-run. Considering that it would be important to add in our docs, that if they clear the producer task expecting everything to be re-run, they should also select all the downstream tasks + recursive as well while clearing the producer task.

@tatiana
Copy link
Copy Markdown
Collaborator Author

tatiana commented Jan 23, 2026

Thanks a lot, @pankajkoti , please, could you review the docs - I changed in 5db165e

Comment thread docs/getting_started/watcher-execution-mode.rst Outdated
Comment thread docs/getting_started/watcher-execution-mode.rst
@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.01%. Comparing base (1d608c7) to head (c7b6de8).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2283      +/-   ##
==========================================
- Coverage   98.01%   98.01%   -0.01%     
==========================================
  Files         100      100              
  Lines        6414     6412       -2     
==========================================
- Hits         6287     6285       -2     
  Misses        127      127              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Copy link
Copy Markdown
Contributor

@pankajkoti pankajkoti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks a lot for considering the comments!

@tatiana tatiana merged commit da39a56 into main Jan 23, 2026
90 checks passed
@tatiana tatiana deleted the allow-watcher-producer-retries-without-erroring branch January 23, 2026 15:21
@tatiana tatiana added this to the Cosmos 1.13.0 milestone Jan 29, 2026
@pankajastro pankajastro mentioned this pull request Jan 29, 2026
tatiana added a commit that referenced this pull request Jan 30, 2026
Features

* Support cross-referencing models across dbt projects using dbt-loom by
@pankajkoti in #2271
* Support use of YAML selectors when using ``LoadMode.DBT_MANIFEST`` by
@YourRoyalLinus in #2261
* Introduce ``ExecutionMode.WATCHER_KUBERNETES`` to use the watcher with
``KubernetesPodOperator`` by @tatiana in #2207
* Add support for StarRocks profile mapping by @kurkim0661 in #2256
* Allow pushing URIs as XComs for Cosmos tasks by @corsettigyg in #2275
* Support defining custom callbacks alongside the ``WATCHER_KUBERNETES``
callback by @johnhoran in #2307

Enhancements

* Refactor: remove duplicate ``_construct_dest_file_path`` by @jx2lee in
#2077
* Leverage Airflow ``::group::`` to group logs associated with DAG
parsing by @tatiana in #2235
* Refactor ``DbtConsumerWatcherSensor`` for reusability by @tatiana in
#2245
* Restore plain text output when using ``ExecutionMode.WATCHER`` by
@tiovader in #2241

Bug Fixes

* Fix running empty models or ephemeral nodes in
``ExecutionMode.WATCHER`` by @tatiana in #2279
* Improve watcher producer task priority in scheduling and the UI by
@tatiana in #2237
* Fix typos and formatting issues in documentation by @pankajkoti in
#2259
* Allow watcher producer retries without erroring by @tatiana in #2283
* Fix ``TestBehavior.AFTER_ALL`` is missing project_name information
when loading project using manifest file by @tuantran0910 in #2242
* Fix duplicate log lines in watcher subprocess execution and format
timestamps by @pankajkoti in #2301

Docs

* Add Watcher Kubernetes documentation by @tatiana in #2303
* Document newly added telemetry metrics in the privacy notice by
@pankajkoti in #2249
* Add compatibility policy document by @pankajastro in #2251
* Improve watcher documentation related to dbt threads by @tatiana in
#2273
* Fix link in watcher execution mode documentation by @jedcunningham in
#2277
* Update Apache Airflow minimum compatibility policy by @tatiana in
#2285
* Clarify Cosmos runtime support until "End of Basic Support" by
@jedcunningham in #2286
* Update watcher docs by @tatiana in #2298
* Update watcher kubernetes documentation by @tatiana in #2306

Others

* Add Airflow 3 DAG versioning tests for Cosmos by @michal-mrazek in
#2177
* Add dbt Core 1.11 to the test matrix by @tatiana in #2230
* Add integration tests using InvocationMode.SUBPROCESS and validate
output by @tatiana in #2287
* Fix main branch failing tests by @tatiana in #2296
* Update pre-commit hooks to the latest versions by @jedcunningham in
#2289
* Pre-commit autoupdates by @pre-commit in #2222, #2264, #2274 and #2290
* Dependabot updates by @dependabot in #2218, #2219, #2220, #2280 and
#2284
* Add Scarf metrics to understand Cosmos feature usage patterns
- Add telemetry tracking for dbt docs plugin usage by @pankajkoti in
#2240
- Add DAG run telemetry metrics for load mode, invocation, and
render_config parameters by @pankajkoti in #2223
  - Collect profile metrics for DAG runs by @pankajastro in #2228
- Compress telemetry metadata to reduce serialized DAG size by
@pankajkoti in #2252
- Skip storing telemetry metadata when emission is disabled by
@pankajkoti in #2278
- Hide telemetry metadata parameters from the Airflow trigger UI by
@pankajkoti in #2247

closes:
astronomer/oss-integrations-private#317

---------

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
@tatiana tatiana mentioned this pull request Mar 19, 2026
2 tasks
tatiana added a commit that referenced this pull request May 12, 2026
- Expand the intro to motivate why Watcher remains experimental:
  it relies on non-idempotent Airflow tasks and a complex retry
  mechanism where one task's status can affect another's.
- Apply the Apache Airflow® trademark style on the first prose
  mention.
- Lead every row with a one-word status so outcomes can be scanned
  vertically: Yes / Maybe / No for the Airflow-vs-dbt-state table;
  Unsafe / Failure / Incorrect status / Works for automatic retries;
  Unsafe / Works / Incorrect status / Works for full clear; Not met
  (or Met) for the no-duplicate-runs goal.
- Fix two stale baseline references: in the consumer table, 1.14.0
  and 1.14.1 now reference 1.12.0 (which introduced async sensors)
  rather than 1.11.0; in the producer table, 1.12.1 now references
  1.11.2 (which forced retries=0) rather than 1.12.0.
- Split the single .. versionchanged:: 1.14.1 directive in
  watcher-execution-mode.rst into two chronological entries —
  1.13.0 (producer stops re-running dbt build, #2283) and 1.14.1
  (skip via AirflowSkipException + XCom backup, #2559) — to match
  the history page's timeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement] Support retrying DbtTaskGroup at the task group level with Watcher

3 participants