diff --git a/docs/guides/run_dbt/airflow-worker/watcher-retry-history.rst b/docs/guides/run_dbt/airflow-worker/watcher-retry-history.rst index 6db4b8fef9..a472863001 100644 --- a/docs/guides/run_dbt/airflow-worker/watcher-retry-history.rst +++ b/docs/guides/run_dbt/airflow-worker/watcher-retry-history.rst @@ -52,7 +52,16 @@ Does the Airflow state match dbt's? The consumer tasks for those models are marked successful instead of running their fallback retry, so the DAG appears successful even though dbt failed. * - **1.14.1** - - **Yes**. + - **Mostly**, except upstream-failure-skipped downstream nodes could produce a "false + green" — fixed in 1.14.2 by + `#2684 `_. + * - **1.14.2** + - **Yes**. Same as 1.14.1, plus a previously unreported "false green" gap is closed: + when a model fails on its first attempt and dbt skips downstream nodes via the + ``upstream_failure`` path, those downstream models are now re-run when their upstream + recovers via consumer fallback, instead of being silently marked SKIPPED while the DAG + completes ``success`` with the downstream tables un-materialized + (`#2684 `_). Task-level retry — consumer +++++++++++++++++++++++++++++ @@ -91,6 +100,19 @@ Task-level retry — consumer * - **1.14.1** - Similar to 1.12.0. Consumers always read correct model statuses thanks to the producer's XCom backup mechanism — see *Task-level retry — producer*. + * - **1.14.2** + - Similar to 1.14.1, with two additions. First, consumers for models that dbt marked as + skipped due to upstream failure (``upstream_failure``) now fail on attempt 1 rather than + skipping — Cosmos rewrites those ``"skipped"`` statuses to ``"failed"`` in the + producer-side log parser, so Airflow retries and the existing + ``_fallback_to_non_watcher_run`` path runs the model locally once its upstream has + recovered (`#2684 `_). + Second, the consumer's fallback ``dbt`` invocation no longer inherits the producer's + internal ``--log-format json`` flag, so retry task logs default to dbt's normal text + output instead of one structured event per line + (`#2713 `_); users who want + JSON output on retry can opt in via + ``operator_args={"dbt_cmd_flags": ["--log-format", "json"]}``. Task-level retry — producer +++++++++++++++++++++++++++++ @@ -132,7 +154,7 @@ Task-level retry — producer restores it on retry, so consumers always read correct model statuses. The backup Variable is cleaned up on success. - **Known issues with the XCom backup mechanism:** + **Known issues with the XCom backup mechanism (both fixed in 1.14.2):** - `#2619 `_ — backup Variable key is not sanitized for ``:`` and ``+`` in Airflow 3 default ``run_id`` formats; strict-naming @@ -141,6 +163,18 @@ Task-level retry — producer - `#2625 `_ — on Airflow 2, ``_get_task_group_id()`` returns ``None``, so multiple ``DbtTaskGroup`` producers in the same DAG run share one backup key and log ``UniqueViolation`` on every model completion. + * - **1.14.2** + - Same as 1.14.1, with both XCom-backup known issues from 1.14.1 fixed. The key generator now + sanitises every component to ``[A-Za-z0-9_-]`` — the safest subset across the secrets + backends Airflow ships with — so ``:`` / ``+`` from default Airflow 3 ``run_id`` formats no + longer break strict-naming backends + (`#2629 `_, closes + `#2619 `_). And + ``_get_task_group_id`` now reads ``task.task_group.group_id`` (the correct attribute) instead + of the non-existent ``task.task_group_id`` it was reading before, so each ``DbtTaskGroup`` + producer in a DAG writes to a distinct backup Variable + (`#2683 `_, closes + `#2625 `_). Automatic retries +++++++++++++++++ @@ -180,6 +214,13 @@ Automatic retries - **Works.** Producer auto-retries raise ``AirflowSkipException``; XCom is restored from the Variable backup so consumers read correct model statuses. Subject to the XCom backup known issues — see *Task-level retry — producer*. + * - **1.14.2** + - **Works.** Same as 1.14.1, with the XCom-backup known issues fixed + (`#2629 `_, + `#2683 `_ — see + *Task-level retry — producer*). Additionally, downstream models whose upstream failed on + the producer's first attempt are now re-run on Airflow retry instead of staying SKIPPED + (`#2684 `_). Full DAG / TaskGroup clear ++++++++++++++++++++++++++ @@ -221,6 +262,11 @@ Full DAG / TaskGroup clear so it does not propagate to tasks downstream of the group (`#2594 `_). The gateway is only added for ``DbtTaskGroup`` — ``DbtDag`` does not need it. + * - **1.14.2** + - **Works.** Same as 1.14.1, with the XCom-backup known issues fixed + (`#2629 `_, + `#2683 `_ — see + *Task-level retry — producer*). Avoid duplicate or concurrent runs of the same dbt transformation in the same DAG run ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ @@ -266,3 +312,13 @@ Avoid duplicate or concurrent runs of the same dbt transformation in the same DA Cosmos now checks the producer's state first: if it is still running, the sensor keeps polling instead of launching a duplicate ``dbt`` invocation; only after the producer reaches a terminal state does the consumer fall back to ``ExecutionMode.LOCAL``. + * - **1.14.2** + - **Met.** Same as 1.14.1. When ``depends_on_past=True`` is set on a ``DbtTaskGroup``, the + cross-run risk where the next DAG run's producer could start before the previous run's + consumers had completed + (`#2596 `_) is now mitigated: + leaf consumer tasks are wired upstream of the producer-done gateway, and the producer + plus watcher tasks carry ``wait_for_downstream=True`` + (`#2615 `_), so the task group + behaves as a single atomic unit across runs. Topology is unchanged for the default + ``depends_on_past=False``.