Allow watcher producer retries without erroring#2283
Conversation
✅ Deploy Preview for astronomer-cosmos ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Pull request overview
This PR changes the retry behavior of watcher producer operators to skip execution on retries instead of raising an error. When a retry is detected (try_number > 1), the operators now log an informational message and return None rather than raising an AirflowException.
Changes:
- Modified retry handling to skip execution instead of failing
- Changed log level from ERROR to INFO for retry detection messages
- Updated test cases to verify skipping behavior instead of exception raising
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cosmos/operators/watcher.py | Changed retry handling to log info and return None instead of raising AirflowException |
| cosmos/operators/watcher_kubernetes.py | Changed retry handling to log info and return None instead of raising AirflowException |
| tests/operators/test_watcher.py | Updated test to verify skipping behavior and INFO logging instead of exception and ERROR logging |
| tests/operators/test_watcher_kubernetes_unit.py | Updated test to verify skipping behavior and INFO logging instead of exception and ERROR logging |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pankajkoti
left a comment
There was a problem hiding this comment.
LGTM. It could happen that users just clear the producer task and expect that it will be re-run. Considering that it would be important to add in our docs, that if they clear the producer task expecting everything to be re-run, they should also select all the downstream tasks + recursive as well while clearing the producer task.
|
Thanks a lot, @pankajkoti , please, could you review the docs - I changed in 5db165e |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2283 +/- ##
==========================================
- Coverage 98.01% 98.01% -0.01%
==========================================
Files 100 100
Lines 6414 6412 -2
==========================================
- Hits 6287 6285 -2
Misses 127 127 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
pankajkoti
left a comment
There was a problem hiding this comment.
LGTM. Thanks a lot for considering the comments!
Features * Support cross-referencing models across dbt projects using dbt-loom by @pankajkoti in #2271 * Support use of YAML selectors when using ``LoadMode.DBT_MANIFEST`` by @YourRoyalLinus in #2261 * Introduce ``ExecutionMode.WATCHER_KUBERNETES`` to use the watcher with ``KubernetesPodOperator`` by @tatiana in #2207 * Add support for StarRocks profile mapping by @kurkim0661 in #2256 * Allow pushing URIs as XComs for Cosmos tasks by @corsettigyg in #2275 * Support defining custom callbacks alongside the ``WATCHER_KUBERNETES`` callback by @johnhoran in #2307 Enhancements * Refactor: remove duplicate ``_construct_dest_file_path`` by @jx2lee in #2077 * Leverage Airflow ``::group::`` to group logs associated with DAG parsing by @tatiana in #2235 * Refactor ``DbtConsumerWatcherSensor`` for reusability by @tatiana in #2245 * Restore plain text output when using ``ExecutionMode.WATCHER`` by @tiovader in #2241 Bug Fixes * Fix running empty models or ephemeral nodes in ``ExecutionMode.WATCHER`` by @tatiana in #2279 * Improve watcher producer task priority in scheduling and the UI by @tatiana in #2237 * Fix typos and formatting issues in documentation by @pankajkoti in #2259 * Allow watcher producer retries without erroring by @tatiana in #2283 * Fix ``TestBehavior.AFTER_ALL`` is missing project_name information when loading project using manifest file by @tuantran0910 in #2242 * Fix duplicate log lines in watcher subprocess execution and format timestamps by @pankajkoti in #2301 Docs * Add Watcher Kubernetes documentation by @tatiana in #2303 * Document newly added telemetry metrics in the privacy notice by @pankajkoti in #2249 * Add compatibility policy document by @pankajastro in #2251 * Improve watcher documentation related to dbt threads by @tatiana in #2273 * Fix link in watcher execution mode documentation by @jedcunningham in #2277 * Update Apache Airflow minimum compatibility policy by @tatiana in #2285 * Clarify Cosmos runtime support until "End of Basic Support" by @jedcunningham in #2286 * Update watcher docs by @tatiana in #2298 * Update watcher kubernetes documentation by @tatiana in #2306 Others * Add Airflow 3 DAG versioning tests for Cosmos by @michal-mrazek in #2177 * Add dbt Core 1.11 to the test matrix by @tatiana in #2230 * Add integration tests using InvocationMode.SUBPROCESS and validate output by @tatiana in #2287 * Fix main branch failing tests by @tatiana in #2296 * Update pre-commit hooks to the latest versions by @jedcunningham in #2289 * Pre-commit autoupdates by @pre-commit in #2222, #2264, #2274 and #2290 * Dependabot updates by @dependabot in #2218, #2219, #2220, #2280 and #2284 * Add Scarf metrics to understand Cosmos feature usage patterns - Add telemetry tracking for dbt docs plugin usage by @pankajkoti in #2240 - Add DAG run telemetry metrics for load mode, invocation, and render_config parameters by @pankajkoti in #2223 - Collect profile metrics for DAG runs by @pankajastro in #2228 - Compress telemetry metadata to reduce serialized DAG size by @pankajkoti in #2252 - Skip storing telemetry metadata when emission is disabled by @pankajkoti in #2278 - Hide telemetry metadata parameters from the Airflow trigger UI by @pankajkoti in #2247 closes: astronomer/oss-integrations-private#317 --------- Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
- Expand the intro to motivate why Watcher remains experimental: it relies on non-idempotent Airflow tasks and a complex retry mechanism where one task's status can affect another's. - Apply the Apache Airflow® trademark style on the first prose mention. - Lead every row with a one-word status so outcomes can be scanned vertically: Yes / Maybe / No for the Airflow-vs-dbt-state table; Unsafe / Failure / Incorrect status / Works for automatic retries; Unsafe / Works / Incorrect status / Works for full clear; Not met (or Met) for the no-duplicate-runs goal. - Fix two stale baseline references: in the consumer table, 1.14.0 and 1.14.1 now reference 1.12.0 (which introduced async sensors) rather than 1.11.0; in the producer table, 1.12.1 now references 1.11.2 (which forced retries=0) rather than 1.12.0. - Split the single .. versionchanged:: 1.14.1 directive in watcher-execution-mode.rst into two chronological entries — 1.13.0 (producer stops re-running dbt build, #2283) and 1.14.1 (skip via AirflowSkipException + XCom backup, #2559) — to match the history page's timeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aims to overcome retries at a TaskGroup level, is reported in: #2282
During retries, the watcher will not execute any operation, but will not fail. We are intentionally changing the behaviour previously implemented in #2114.
The behaviour will be improved once the following ticket is implemented: #1978
Closes: #2282