Skip AF 3.0 integration test stuck on example_watcher_with_freshness #2692
Pull request overview
This PR prevents the Airflow 3.0 integration matrix from hanging on the watcher freshness example DAG, while adding timeout diagnostics for future stuck integration tests.
Changes:
- Skips `watcher_with_freshness_check.py` in example DAG integration runs for Airflow 3.0.x.
- Adds `pytest-timeout` and a 180-second per-test timeout to integration tests.
- Adds a 30-minute GitHub Actions timeout to the integration test job.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_example_dags.py | Adds Airflow 3.0-only .airflowignore entry for the hanging watcher freshness DAG. |
| scripts/test/integration.sh | Enables pytest-timeout for integration test execution. |
| pyproject.toml | Adds pytest-timeout to the test environment dependencies. |
| .github/workflows/test.yml | Adds integration job timeout and updates push branch triggers. |
Comments suppressed due to low confidence (1)
.github/workflows/test.yml:5
- This diagnostic branch name should not be kept in the push trigger. If merged, pushes to `diagnose-af3-split1-hang` in the upstream repository will continue running the full `test` workflow even though the comment says the push trigger is for the default branch, which can consume CI unexpectedly and leaves a temporary branch-specific hook in the permanent workflow.
branches: [main]
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2692   +/-   ##
=======================================
  Coverage   98.03%   98.03%
=======================================
  Files         105      105
  Lines        7843     7843
=======================================
  Hits         7689     7689
  Misses        154      154
@pankajastro I'm supportive of this change once we have a follow-up ticket to investigate and address the actual issue, and to assign it to the next sprint with high priority, starting on Monday, please. Share the link here once it is logged.
Created issue #2692 and added it to the next sprint with high priority.
tatiana left a comment
Thanks for addressing the feedback, @pankajastro!
The `Run-Integration-Tests` matrix has hung silently for the past week on the AF 3.0 + split 1 cell (all three Python variants) until GitHub Actions kills the job at its 6 h default.

Add two non-speculative diagnostic levers so the next CI run dumps the failing test name and a thread stack instead of timing out blind:

- `timeout-minutes: 30` on the integration job: a wall-clock cap so the job ends and uploads its log within 30 min instead of 6 h.
- `pytest-timeout` plugin with `--timeout=180 --timeout-method=thread`: pytest kills any single test stuck > 3 min and prints the offending thread's stack trace, which is what we need to pinpoint the hang.

Wire pushes to this diagnostic branch into the `test` workflow so CI runs without opening a PR. Revert this commit once the hanging test is identified.
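The idea behind the thread-based timeout lever can be illustrated with the standard library alone. The sketch below is not pytest-timeout's implementation: `format_all_thread_stacks` and `run_with_watchdog` are hypothetical names, and where the plugin (per its documentation) also terminates the process, this version only reports the stacks to a callback.

```python
import sys
import threading
import traceback


def format_all_thread_stacks() -> str:
    """Return a text dump of the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append(f"--- thread {names.get(ident, '?')} ({ident}) ---\n")
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def run_with_watchdog(func, timeout_s, on_timeout):
    """Run func(); if it is still running after timeout_s seconds,
    call on_timeout(stack_dump) from a helper thread (hypothetical API)."""
    timer = threading.Timer(
        timeout_s, lambda: on_timeout(format_all_thread_stacks())
    )
    timer.daemon = True
    timer.start()
    try:
        func()
    finally:
        # A function that finishes in time cancels the pending dump.
        timer.cancel()
```

Calling `run_with_watchdog(some_test, 180, print)` would print every thread's stack if `some_test` is still running after three minutes, which is the kind of evidence that identifies a stuck test by name.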
`Run-Integration-Tests` has hung silently on the AF 3.0 + split 1 cell (all three Python variants) for the past week, getting killed at GH's 6 h job timeout. The 30 min job timeout and `pytest-timeout` plugin added in eecb65d dumped the offending test and a thread stack:

    tests/test_example_dags.py::test_example_dag[example_watcher_with_freshness]
    File ".../cosmos/operators/_watcher/triggerer.py:82" in get_xcom_val_af3
    File ".../airflow/sdk/bases/xcom.py:242" in get_one
    ImportError: cannot import name 'SUPERVISOR_COMMS' from 'airflow.sdk.execution_time.task_runner'

`dag.test()` on AF 3.0 runs the WatcherTrigger inline (no triggerer process). The trigger calls `XCom.get_one`, which on AF 3.0 imports `SUPERVISOR_COMMS`, a symbol that only exists from AF 3.1 onward. The import raises, `dag.test()`'s inline-trigger loop redrives the deferred task forever (~30 ms cycle), and the matrix job runs out the clock.

The watcher path itself works fine on AF 3.0 in a real triggerer process; the only thing that breaks is the `dag.test()` exerciser used by the integration test.

Add the DAG file to `.airflowignore` for Airflow 3.0 only (matching the existing `example_cosmos_cleanup_dag` pattern) so the matrix unblocks. Drop the skip once we either bump the floor to AF >= 3.1 or rework the trigger's XCom fetch.
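To check locally whether a given environment exposes the symbol named in the traceback, a small probe along these lines may help. It is only a sketch: `has_symbol` is a hypothetical helper, not part of Cosmos or Airflow, and it says nothing about how the trigger should be reworked.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if module_name imports cleanly and exposes symbol."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module missing entirely (for example, Airflow is not installed).
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    # Module and symbol names are taken from the traceback above.
    print(has_symbol("airflow.sdk.execution_time.task_runner", "SUPERVISOR_COMMS"))
```

Going by the commit message, the probe would be expected to report `False` on an Airflow 3.0 install and `True` from 3.1 onward; that expectation comes from the description above and has not been verified independently here.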
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
0089109 to 7baa0cd
# Diagnostic: cap the job at 60 min so a hang does not run until GitHub's 6h default
# timeout. This is a job-level backstop and does not itself produce a pytest traceback.
timeout-minutes: 60
--timeout=300 \
--timeout-method=thread \
#2692)

## Summary

- The `Run-Integration-Tests` matrix has been hanging silently on the **Airflow 3.0 + split 1** cell (all three Python variants) for the past week, getting killed at GitHub Actions' 6h job timeout. Six main-branch runs in a row exhausted CI capacity that way.
- Pinpointed the offender with two non-speculative diagnostic levers: a 30-min wall-clock cap on the job (`.github/workflows/test.yml`) and `pytest-timeout` with `--timeout=180 --timeout-method=thread`. The next CI run failed in ~12 min with the test name + thread stack instead of a silent 6h cancel.
- Hanging test: ``tests/test_example_dags.py::test_example_dag[example_watcher_with_freshness]``. The thread stack pointed at ``cosmos/operators/_watcher/triggerer.py:82``:

  ```
  File ".../cosmos/operators/_watcher/triggerer.py", line 82, in get_xcom_val_af3
      return await sync_to_async(XCom.get_one)(...)
  File ".../airflow/sdk/bases/xcom.py", line 242, in get_one
      from airflow.sdk.execution_time.task_runner import SUPERVISOR_COMMS
  ImportError: cannot import name 'SUPERVISOR_COMMS' from 'airflow.sdk.execution_time.task_runner'
  ```

## Why this hangs only on AF 3.0 and only in `dag.test()`

- ``dag.test()`` runs the ``WatcherTrigger`` *inline* (no separate triggerer process). The trigger calls ``airflow.sdk.execution_time.xcom.XCom.get_one``, which on AF 3.0 unconditionally imports ``SUPERVISOR_COMMS`` from ``airflow.sdk.execution_time.task_runner``. That symbol does not exist until Airflow 3.1.
- The ``ImportError`` surfaces as a ``NameError`` warning (``state.py:177``), the deferred task is re-queued, and ``dag.test()``'s inline-trigger loop redrives the same task forever (~30ms cycle). On AF 3.1+ ``SUPERVISOR_COMMS`` exists, so the trigger resolves and the loop terminates.
- The watcher path still works in a real triggerer process on AF 3.0, so this only affects the ``dag.test()`` exerciser used by the integration matrix. Real users running the watcher in a deployed scheduler are not blocked.

## Fix

- Add ``watcher_with_freshness_check.py`` to ``.airflowignore`` when ``3.0.0 <= AIRFLOW_VERSION < 3.1.0`` (mirrors the existing ``example_cosmos_cleanup_dag.py`` pattern). The DAG is still parametrized and run on AF 2.9–2.11, 3.1, 3.2.
- Keep the 30-min job timeout and ``pytest-timeout`` diagnostic levers: both are cheap insurance against future silent hangs and they're what surfaced this one. Happy to drop them in review if preferred.

## Test plan

- [x] Reproduced the hang on this branch's first CI run (run ``26031384067``): all three AF 3.0 split-1 jobs failed in ~12 min with the ``SUPERVISOR_COMMS`` ImportError instead of running to the 6h cap.
- [ ] Verify on this PR's CI run that the AF 3.0 + split 1 cells now finish green, while AF 2.9/2.10/2.11/3.1/3.2 still parametrize and run the DAG.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
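The version gate described in the Fix section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `tests/test_example_dags.py`: `parse_version`, `is_affected`, and `ignore_entries` are hypothetical names, and versions are parsed by hand so the sketch needs no third-party package.

```python
import re

# File name taken from the PR description; everything else here is illustrative.
HANGING_DAG_FILE = "watcher_with_freshness_check.py"


def parse_version(text):
    """Turn '3.0.6' into (3, 0, 6). Pre-release suffixes such as 'rc1' are
    dropped, which differs from full PEP 440 ordering."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(airflow_version):
    """True for the 3.0.x line only: 3.0.0 <= version < 3.1.0."""
    return (3, 0, 0) <= parse_version(airflow_version) < (3, 1, 0)


def ignore_entries(airflow_version):
    """Extra .airflowignore lines to add for the given Airflow version."""
    return [HANGING_DAG_FILE] if is_affected(airflow_version) else []
```

With a gate of this shape, the DAG file is ignored only on the 3.0.x line and remains collected on 2.x and on 3.1 and later, matching the scope stated in the Fix section.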
Refreshes the 1.14.2 CHANGELOG draft to incorporate nine more PRs that @pankajastro added to the Cosmos 1.14.2 milestone after the initial release PR (#2708) was opened:

* Docs: #2652, #2653
* Others: #2646, #2661, #2669, #2673, #2679, #2690, #2692

PR #2618 ("Improve glossary") was excluded because it modifies ``docs/reference/glossary.rst``, which does not exist on ``release-1.14``: the glossary stub was added by #2461, which is not part of this release line. Picking #2618 here would require back-porting #2461 as well; deferring the glossary improvements to 1.15.0 instead.

Each of the nine included PRs has been cherry-picked onto ``release-1.14.2a3`` in its original merge order and applied without manual conflict resolution. CHANGELOG entries are appended to the existing Docs and Others sections in the canonical style.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🚀 Released in Cosmos 1.14.2 (PyPI).
## CHANGELOG entry

1.14.2 (2026-05-21)
-------------------

Behaviour Changes

These changes adjust observable behaviour of the ``ExecutionMode.WATCHER`` execution mode. None of them breaks the public Cosmos API, but users relying on undocumented internals (graph wiring assertions, XCom backup Variable names, retry-on-recovery semantics, or retry log format) should review before upgrading.

* ``ExecutionMode.WATCHER`` + ``depends_on_past=True``: when the producer task has ``depends_on_past=True`` (typically set via ``default_args``), the producer-done gateway task inside ``DbtTaskGroup`` is now wired downstream of every consumer task, in addition to the producer. This is required so that ``wait_for_downstream`` gating behaves correctly across DAG runs and the task group acts as a single unit that must fully succeed before the next run starts. Users with ``depends_on_past=False`` (the default) see no topology change. See #2615.
* ``ExecutionMode.WATCHER`` downstream retry on upstream recovery: dbt models that were skipped after an upstream-failure event are now retried in the same DAG run when the upstream task succeeds on retry. Previously these models remained skipped for the run. See #2684.
* ``ExecutionMode.WATCHER`` consumer-retry log format: the consumer's fallback ``dbt`` invocation no longer inherits the producer's internal ``--log-format json`` flag, so retry task logs now default to dbt's normal text format. Users who relied on JSON output in retry logs can opt in via ``operator_args={"dbt_cmd_flags": ["--log-format", "json"]}``. See #2713.
* ``ExecutionMode.WATCHER`` XCom-backup Variable key scheme: the per-model XCom backup Variable key now includes the full task-group path and sanitises disallowed characters (``+`` / ``:``) from ``run_id``. External monitoring or cleanup scripts that match the old key pattern will need updating. See #2629 and #2683.
Bug Fixes

* Sanitize disallowed characters from XCom backup variable key by @MichaelRBlack in #2629
* Prevent watcher producers from colliding on one XCom-backup key by @tatiana in #2683
* Retry watcher downstream models on upstream-failure recovery by @tatiana in #2684
* Fix ``ExecutionMode.WATCHER`` interaction with ``depends_on_past`` by @johnhoran in #2615
* Strip ``--log-format`` from producer flags on watcher consumer retry by @tatiana in #2713
* Fix duplicate ``deferrable`` kwarg in ``DbtRunAirflowAsyncBigqueryOperator`` by @pankajastro in #2616
* Fix dbt docs iframe ``src`` missing deployment path prefix by @pankajastro in #2640
* Defer ``TaskInstance`` import in cluster policy to fix Sentry init crash by @pankajastro in #2662
* Restore type hints broken by lazy imports in ``cosmos/__init__.py`` by @pankajastro in #2647
* Fix ``ExecutionMode.WATCHER`` non-dbt stdout being suppressed from logs by @pankajastro in #2654
* Fix test sensor retry behaviour in ``ExecutionMode.WATCHER`` by @pankajkoti in #2658
* Fix watcher fallback selector for versioned dbt models by @pankajkoti in #2659
* Break out of iframe from Airflow 2 dbt Docs 404 link by @pankajastro in #2685

Docs

* Document source freshness aware execution for ``ExecutionMode.WATCHER`` by @pankajastro in #2617
* Add reference docs for ``DbtRunLocalOperator``, ``DbtTestLocalOperator``, ``DbtSnapshotLocalOperator`` and ``DbtBuildLocalOperator`` by @pankajastro in #2643
* Add watcher retry behaviour history documentation by @tatiana in #2600
* Add Apache Airflow® trademark on first prominent mention by @pankajkoti in #2624
* Sentence-case section headings by @pankajkoti in #2630
* Use ``-`` for bullet points by @pankajkoti in #2631
* Drop decorative separator lines by @pankajkoti in #2632
* Normalize heading underlines in ``docs/guides/`` and ``docs/index.rst`` by @pankajkoti in #2664
* Fix broken cross-directory doc links by @pankajastro in #2694
* Fix broken external links in hand-written docs by @pankajastro in #2696
* Document support for Airflow 3.2 in the compatibility policy by @pankajastro in #2652
* Refresh the dbt/Airflow conflicts table to match the compatibility policy by @pankajastro in #2653
* Document incremental model limitation for ``ExecutionMode.AIRFLOW_ASYNC`` by @pankajastro in #2642

Others

* Import ``ParamValidationError`` from ``airflow.sdk`` to silence deprecation warning by @pankajastro in #2645
* Import ``DAG`` from ``airflow.sdk`` to silence deprecation warning by @pankajastro in #2644
* Enforce docs style guide via pre-commit hook by @pankajkoti and @tatiana in #2633
* Add Airflow 3.2 to the test matrix in ``CLAUDE.md`` by @pankajastro in #2646
* Document the lazy-logging standard in ``CLAUDE.md`` by @pankajastro in #2679
* Extract watcher XCom-key helpers and inline single-use bindings by @pankajastro in #2673
* Remove leftover ``scripts/airflow3`` directory by @pankajastro in #2661
* Fix ``altered_jaffle_shop`` seed-dep CTE references by @pankajastro in #2690
* Skip Airflow 3.0 integration test stuck on ``example_watcher_with_freshness`` by @pankajastro in #2692
* Fix typo "constrantis" → "constraints" in tests env comment by @pankajastro in #2669

## Summary

Drafts the **Cosmos 1.14.2** release. Latest alpha cut is **1.14.2a4**, refreshed from `1.14.2a3` after maintainers (@pankajastro, @pankajkoti) added 11 more PRs to the milestone:

- @pankajastro: #2646, #2652, #2653, #2661, #2669, #2673, #2679, #2690, #2692
- @pankajkoti: #2658, #2659

All 11 picks applied cleanly on top of the existing branch; no additional manual conflict resolution needed.

**Excluded:** #2618 ("Improve glossary"): modifies `docs/reference/glossary.rst`, which doesn't exist on `release-1.14` (added on main by #2461, never backported). Deferred to 1.15.0.

33 PRs cherry-picked total (22 in the initial a3 cut + 11 in this a4 refresh); two PRs (#2575, #2618) deliberately held back as 1.15.0 content.
## Milestone

[Cosmos 1.14.2](https://github.com/astronomer/astronomer-cosmos/milestone/48): 33 merged PRs across Bug Fixes, Docs, and Others.

## Inclusion provenance

| Path | PRs | Notes |
|---|---|---|
| **Originally in milestone, a3 cut (12)** | #2629, #2616, #2640, #2662, #2654, #2683, #2684, #2615, #2694, #2696, #2645, #2644 | Assigned by maintainers before the initial release-draft run |
| **Pulled in via closed-issue link, a3 cut (1)** | #2647 | Closes milestone issue #2634 ("Typehinting broken with lazy imports") but the PR itself was never assigned to the milestone; included via `closedByPullRequestsReferences` |
| **Added during cherry-pick conflict resolution, a3 cut (9)** | #2631, #2624, #2630, #2632, #2664, #2633, #2617, #2600, #2643 | Docs PRs whose absence caused `release-1.14` ↔ `main` textual drift. #2631 caused #2696's conflict; the rest were transitive dependencies (especially #2664, on top of the bullet/heading/trademark sweeps). #2643 was added to unblock #2664's operator-docs conflict |
| **Added to milestone after a3, included in a4 (11)** | #2646, #2652, #2653, #2658, #2659, #2661, #2669, #2673, #2679, #2690, #2692 | Added by @pankajastro and @pankajkoti after the initial draft. All applied cleanly on top of the a3 cherry-picks |
| **Deliberately excluded (2)** | #2575, #2618 | #2575: documents `DbtDocsS3KubernetesOperator` with `.. versionadded:: 1.15.0` (already in the `Cosmos 1.15.0` milestone). #2618: improves a glossary file that doesn't exist on `release-1.14` (the stub was added by #2461, not backported) |

### Manual conflict resolution

Cherry-picks that needed manual fix-up. Reviewers should double-check the files listed below:

| PR | File(s) | Resolution |
|---|---|---|
| **#2664** | `docs/guides/dbt_docs/generating-docs.rst` | **Substantive exclusion**: manually removed the entire `Upload to S3 from Kubernetes` section (lines ~46–77 of the incoming diff) that documents `DbtDocsS3KubernetesOperator` (1.15.0 feature, PR #2575). Kept HEAD (no S3-from-Kubernetes section). |
| **#2664** | `docs/guides/run_dbt/airflow-worker/watcher-execution-mode.rst` | Took incoming side for "Example 1" / "Example 2" heading underlines (`++++` style, which matches #2664's normalization across the rest of the file). Also took incoming for em-dash → colon ("Example 1 —" → "Example 1:"). |
| **#2664** | `docs/guides/run_dbt/operators/operators.rst` | Auto-resolved once #2643 was cherry-picked first (added missing Run/Test/Snapshot/Build operator reference docs that #2664 expected to be present). |
| **#2633** | `docs/guides/run_dbt/airflow-worker/watcher-execution-mode.rst` | Initially took `'''` (Example 1/2 underlines) per #2633's incoming side; this broke the file's heading hierarchy and prevented sphinx from registering the `_watcher-source-freshness:` label, causing the docs build to fail. **Fixed in a follow-up commit** by reverting to `+++` to match main and the rest of the file's level-3 sections. |
| **#2617** | `docs/guides/run_dbt/airflow-worker/watcher-execution-mode.rst` | **Substantive trim**: #2617 documented both the 1.14.0 source-freshness execution path AND the 1.15.0 `freshness_callback` override (which ships with #2586, not in this release line). Removed the `literalinclude` of `dev/dags/watcher_with_freshness_check.py` (1.15.0 example DAG, missing on release-1.14) and the surrounding override section so the 1.14.2 docs cover only what the 1.14.x line supports. Surfaced as a `-W` (warnings-as-errors) docs build failure on the first CI run. |
| **#2684** | `cosmos/operators/_watcher/state.py` | Took incoming side: added two new frozensets (`DBT_UPSTREAM_FAILURE_SKIP_EVENT_NAMES`, `DBT_SOURCE_FRESHNESS_STALE_STATUSES`) at lines 28–37. HEAD had nothing; incoming had the additions. |
| **#2615** | `tests/airflow/test_graph.py` | Took incoming side: added two new tests at the end of the file: `test_add_watcher_producer_task_passes_freshness_callback_via_setup_operator_args` and `test_watcher_dependency_wiring`. HEAD had nothing; incoming had the additions. |

All a4 cherry-picks (11) applied without manual intervention.

## Test plan

> Long-term goal: automate. For now, please pick the slice relevant to your environment and report deviations as a comment on this PR.

### Watcher mode (Postgres)

- [x] @pankajkoti `example_watcher` (`dev/dags/example_watcher.py`): default watcher run; exercises **#2629** (XCom backup key sanitization triggers on the `+` in any default Airflow run_id)
- [x] (@tatiana) **[NEW]** `example_watcher_xcom_collision` (`dev/failed_dags/example_watcher_xcom_collision.py`): validates **#2683**
- [x] @tatiana **[NEW]** `example_watcher_recovers_skipped_downstream` (`dev/failed_dags/example_watcher_recovers_skipped_downstream.py`): validates **#2684**
- [x] @tatiana `example_watcher` with `default_args={"depends_on_past": True}` and ≥2 consecutive runs: validates **#2615** (manual edit; no dedicated example DAG)
- [x] @pankajkoti watcher DAG with at least one model that has tests + Airflow retries enabled on the test sensor: validates **#2658** (test sensor retry path)
- [x] @pankajkoti watcher DAG referencing a versioned dbt model (e.g. `models/foo_v2.sql`) and triggering the fallback selector path: validates **#2659**

### BigQuery (async)

- [x] @tatiana `simple_dag_async` (`dev/dags/simple_dag_async.py`): validates **#2616** (duplicate `deferrable` kwarg fix)

### dbt docs plugin

- [x] (@pankajkoti) `docs_dag` (`dev/dags/dbt_docs.py`) + open the Cosmos dbt docs URL in the Airflow UI under a non-root deployment path: validates **#2640** (iframe `src` deployment prefix)

### Cross-cutting scenarios (no dedicated DAG)

- [x] @pankajastro **#2662**: boot Airflow with Cosmos cluster policy + Sentry init; verify no `TaskInstance`-import crash on startup
- [x] @pankajastro **#2654**: run a watcher DAG that prints non-dbt stdout (e.g., Snowflake `externalbrowser` auth URL); confirm output reaches task logs
- [x] (@pankajkoti) **#2647**: `mypy` / IDE inspection of `from cosmos import DbtDag, ProjectConfig, ProfileConfig, RenderConfig, ExecutionConfig` resolves attributes
- [x] @pankajastro **#2645, #2644**: boot Airflow 3 with `airflow.sdk` available; no `ParamValidationError` / `DAG` deprecation warnings from Cosmos imports
- [x] @pankajastro **#2673**: verify watcher XCom-key extraction left no behavioural drift (run the existing watcher integration suite end-to-end on Postgres + Airflow 2.10)

### Docs build (covers all docs PRs)

- [x] (@pankajkoti) `sphinx-build -W -b html docs docs/_build` succeeds with no warnings (#2617, #2624, #2630, #2631, #2632, #2643, #2600, #2664, #2694, #2696, #2652, #2653)

### Tooling

- [x] (@pankajkoti) `pre-commit run check-docs-style --all-files` passes (#2633)

## Reviewer checklist

- [x] CHANGELOG section assignments reviewed
- [x] Entry wording reviewed
- [x] `cosmos/__init__.py` bumped to `1.14.2a4`
- [x] Cosmetic docs PRs (#2624, #2630, #2631, #2632, #2633, #2664) confirmed acceptable in patch line
- [x] **Manual conflict resolutions** (table above) reviewed file-by-file
- [x] Test plan executed on at least Postgres + one warehouse
- [x] Ready to mark non-draft

---------

Co-authored-by: Michael Black <4128408+MichaelRBlack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: John Horan <jhoran@zendesk.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
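For anyone updating external monitoring or cleanup scripts after the XCom-backup key change noted in the changelog, the kind of sanitisation involved might look roughly like the sketch below. The changelog only states that `+` and `:` are sanitised from `run_id`; the replacement character and the `sanitise_run_id` name are assumptions made for illustration, so consult #2629 and #2683 for the actual key scheme.

```python
import re


def sanitise_run_id(run_id: str, replacement: str = "_") -> str:
    """Replace '+' and ':' in an Airflow run_id.

    The replacement character is an assumption for this sketch; the real
    Cosmos behaviour is defined in the PRs referenced in the changelog.
    """
    return re.sub(r"[+:]", replacement, run_id)


if __name__ == "__main__":
    # A run_id of the shape Airflow typically generates for manual runs.
    print(sanitise_run_id("manual__2026-05-21T10:00:00+00:00"))
```

A script that previously matched the raw `run_id` inside Variable keys would need to apply an equivalent transformation, or match on a looser pattern, before looking keys up.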