
Add DbtProducerWatcherOperator for the proposed ExecutionMode.WATCHER#1982

Merged
pankajkoti merged 20 commits into main from 1958-watcher-build-coordinator-task
Sep 25, 2025

Conversation

@pankajkoti
Contributor

@pankajkoti pankajkoti commented Sep 18, 2025

This PR introduces the DbtProducerWatcherOperator in
cosmos/operators/watcher.py for use with the proposed ExecutionMode.WATCHER.
The operator triggers a single dbt build and streams real-time per-model
run statuses via dbtRunner events, pushing keys such as nodefinished_<uid>,
aggregated dbt_startup_events, and falling back to pushing run_results
from the target directory to XCom when dbtRunner is unavailable.
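
As a rough illustration of the streaming path described above, the sketch below shows a callback that records per-node completion events under `nodefinished_<uid>` keys and aggregates startup events under `dbt_startup_events`. Everything else is a stand-in: `FakeXCom`, `make_callback`, the dict-shaped events, and the choice of which event names count as "startup" are assumptions for illustration. The real operator receives dbt event objects via dbtRunner callbacks and pushes through the Airflow task instance.

```python
from __future__ import annotations

from typing import Any, Callable


class FakeXCom:
    """In-memory stand-in for Airflow XCom (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self.store: dict[str, Any] = {}

    def push(self, key: str, value: Any) -> None:
        self.store[key] = value


def make_callback(xcom: FakeXCom) -> Callable[[dict[str, Any]], None]:
    """Build a callback that records startup events and per-node results."""
    startup_events: list[dict[str, Any]] = []

    def on_event(event: dict[str, Any]) -> None:
        name = event.get("name")
        if name == "NodeFinished":
            # One entry per dbt node, keyed by its unique id.
            xcom.push(f"nodefinished_{event['unique_id']}", event)
        elif name in {"MainReportVersion", "AdapterRegistered"}:
            # Assumption: these event names are treated as startup events.
            startup_events.append(event)
            xcom.push("dbt_startup_events", list(startup_events))

    return on_event


xcom = FakeXCom()
callback = make_callback(xcom)
callback({"name": "MainReportVersion", "version": "1.10.0"})
callback({"name": "NodeFinished", "unique_id": "model.jaffle_shop.orders", "status": "success"})
```

Keeping one key per node means a downstream task only needs to read the entry for its own model rather than parse a shared payload.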

Additionally, local execution has been updated with a
_push_run_results_to_xcom helper and a push_run_results_to_xcom flag,
enabling gzip+base64–compressed run-results to be stored in XCom for
fallback support.
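
A minimal sketch of the gzip+base64 round trip mentioned above, using only the standard library. The function names `compress_run_results` and `decompress_run_results` and the sample payload are hypothetical and are not taken from the Cosmos code base; the actual helper may differ in its details.

```python
from __future__ import annotations

import base64
import gzip
import json
from typing import Any


def compress_run_results(run_results: dict[str, Any]) -> str:
    """Serialise to JSON, gzip the bytes, then base64-encode to an ASCII string."""
    raw = json.dumps(run_results).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_run_results(payload: str) -> dict[str, Any]:
    """Reverse the steps: base64-decode, gunzip, then parse the JSON."""
    raw = gzip.decompress(base64.b64decode(payload))
    return json.loads(raw.decode("utf-8"))


sample = {"results": [{"unique_id": "model.jaffle_shop.orders", "status": "success"}]}
encoded = compress_run_results(sample)
restored = decompress_run_results(encoded)
```

The base64 step keeps the compressed bytes representable as plain text, which is convenient for JSON-serialised XCom backends.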

closes: #1958
closes: https://github.com/astronomer/oss-integrations-private/issues/238

@netlify

netlify Bot commented Sep 18, 2025

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 3f25593
🔍 Latest deploy log https://app.netlify.com/projects/sunny-pastelito-5ecb04/deploys/68d50c0ea796e40008047a44

Comment thread cosmos/operators/local.py Outdated
Comment thread cosmos/operators/local.py Outdated
@pankajkoti pankajkoti force-pushed the 1958-watcher-build-coordinator-task branch from 8ef18f8 to 2adcea1 Compare September 18, 2025 14:24
@codecov

codecov Bot commented Sep 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.77%. Comparing base (8627d63) to head (dd32126).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1982      +/-   ##
==========================================
+ Coverage   97.67%   97.77%   +0.10%     
==========================================
  Files          87       87              
  Lines        5371     5443      +72     
==========================================
+ Hits         5246     5322      +76     
+ Misses        125      121       -4     

@pankajkoti pankajkoti force-pushed the 1958-watcher-build-coordinator-task branch from 041bcfa to fb9e7cb Compare September 19, 2025 11:32
@pankajkoti pankajkoti changed the title from "Add DbtBuildCoordinatorOperator" to "Add DbtBuildCoordinatorOperator for the proposed ExecutionMode.WATCHER" Sep 19, 2025
@pankajkoti pankajkoti marked this pull request as ready for review September 19, 2025 11:53
Copilot AI review requested due to automatic review settings September 19, 2025 11:53
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces a new DbtBuildCoordinatorOperator that enables real-time monitoring of dbt model execution status through XCom for the proposed ExecutionMode.WATCHER. The operator provides two execution paths: streaming mode using dbtRunner events for real-time per-model status updates, and fallback mode that pushes compressed run results after completion.

Key changes:

  • New DbtBuildCoordinatorOperator class with streaming and fallback execution modes
  • Support for pushing compressed run results to XCom via new push_run_results_to_xcom parameter
  • Enhanced local execution operators to support XCom-based result sharing

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| cosmos/operators/watcher.py | Implements the main DbtBuildCoordinatorOperator with event streaming and XCom coordination |
| cosmos/operators/local.py | Adds _push_run_results_to_xcom helper method and push_run_results_to_xcom parameter support |
| cosmos/operators/virtualenv.py | Updates run_command method to pass through the new push_run_results_to_xcom parameter |
| tests/operators/test_watcher.py | Comprehensive test suite for the new operator functionality |

Comments suppressed due to low confidence (1)

cosmos/operators/local.py:1

  • Remove commented-out code. If this alternative implementation is needed for future reference, document the reasoning or move it to documentation.
from __future__ import annotations

Comment thread cosmos/operators/watcher.py
Comment thread cosmos/operators/watcher.py Outdated
Comment thread cosmos/operators/watcher.py Outdated
Comment thread tests/operators/test_watcher.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread cosmos/dbt/runner.py
Comment thread cosmos/operators/local.py Outdated
Collaborator

@tatiana tatiana left a comment

Looks great, @pankajkoti, thanks a lot for the great work and for addressing the feedback.
I'm happy for us to merge this PR once the follow-up ticket is logged.

Comment thread cosmos/operators/watcher.py Outdated
Comment thread cosmos/operators/watcher.py Outdated
pankajkoti and others added 2 commits September 25, 2025 15:01
Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
@pankajkoti pankajkoti merged commit ab88c29 into main Sep 25, 2025
93 checks passed
@pankajkoti pankajkoti deleted the 1958-watcher-build-coordinator-task branch September 25, 2025 09:36
pankajkoti added a commit that referenced this pull request Sep 25, 2025
…1997)

Now that PR #1982 is merged and the XCom is being populated, we
can clean up the example dbt event payload dictionaries kept in the
watcher.py module. If we need them again, we can refer to the
previous PR #1952, which added these examples.

related: #1952 
related: #1950
tatiana added a commit that referenced this pull request Oct 6, 2025
Introduce a new high-performance execution mode, named
`ExecutionMode.WATCHER`, following the implementation of the producer
and consumer operators in PRs #1982, #1993 and #1998.

Initial performance analysis indicates that this mode will reduce the
total DAG Run time to execute dbt pipelines in Airflow to 1/5 of the
original time. For example, if a Cosmos `DbtDag` takes 5 minutes to run
with the default `ExecutionMode.LOCAL`, it will now run in 1 minute with
the new `ExecutionMode.WATCHER`.

In the near future, there will also be benefits related to CPU and
memory utilisation, as users will be able to run the producer task on a
more powerful node with increased CPU and memory resources. In
comparison, the consumer nodes can have less CPU and memory. Further
development (#1972 and #1973), testing, and analysis are needed to
evaluate this.

# Context

As of Cosmos 1.10, when users leverage the default
`ExecutionMode.LOCAL`, each dbt model becomes an Airflow run task, and
dbt is invoked in each of those tasks.

We noticed that the cost of running the same pipeline with plain dbt Core
varies significantly between:
- running the whole pipeline with a single dbt command
- running one dbt command per model

For example, for the https://github.com/google/fhir-dbt-analytics
project, these numbers were, on average, 5 minutes and 30 seconds (by
running a single `dbt run` for the whole pipeline) versus 32 minutes
(when using 184 `dbt run` commands as illustrated in
https://gist.github.com/tatiana/c7831173ab09bf05d88839fb0b557920).
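
To make the comparison concrete, here is a short worked calculation using only the figures quoted above (5 minutes 30 seconds, 32 minutes, 184 commands). The per-invocation overhead is a derived average and assumes the extra time is spread evenly across invocations, which is a simplification.

```python
single_run_seconds = 5 * 60 + 30  # one `dbt run` for the whole pipeline
per_model_seconds = 32 * 60       # 184 separate `dbt run` commands
invocations = 184

# How many times slower the per-model approach is overall.
slowdown = per_model_seconds / single_run_seconds

# Average extra time attributable to each additional dbt invocation.
extra_per_invocation = (per_model_seconds - single_run_seconds) / invocations

print(f"per-model invocation is {slowdown:.1f}x slower")
print(f"about {extra_per_invocation:.1f}s of extra overhead per invocation")
```

This prints a slowdown of 5.8x and roughly 8.6 seconds of overhead per invocation.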

Similar to the
[`ExecutionMode.AIRFLOW_ASYNC`](https://astronomer.github.io/astronomer-cosmos/getting_started/async-execution-mode.html),
this mode aims to reduce the number of times the dbt command is invoked,
while still allowing users to have observability of the dbt workflow via
the Airflow UI and being able to retry individual tasks.

# Overall solution

* Use existing Cosmos DAG rendering techniques - implemented in this PR
* Have a single Airflow task to run "all the pipeline" (selected by the
user) - implemented in #1982
* Use dbt Core callbacks
https://docs.getdbt.com/reference/programmatic-invocations#registering-callbacks
to track how each model's execution is progressing and update different
XComs (one XCom per model) - implemented in #1982
* All the other tasks, by default, should watch their designated XCom -
implemented in #1998 and used in this PR
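
The consumer side of the steps above can be sketched as a simple polling loop. This is an illustrative sketch only: `wait_for_node`, its parameters, and the plain dict standing in for XCom are hypothetical, and the real consumer is an Airflow sensor rather than a blocking function. The `nodefinished_` key prefix follows the description in #1982.

```python
from __future__ import annotations

import time
from typing import Any, Mapping, Optional


def wait_for_node(
    xcom: Mapping[str, Any],
    unique_id: str,
    poke_interval: float = 0.01,
    timeout: float = 1.0,
) -> Optional[str]:
    """Poll an XCom-like mapping until the node's status appears or the timeout expires."""
    key = f"nodefinished_{unique_id}"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        event = xcom.get(key)
        if event is not None:
            return event["status"]
        time.sleep(poke_interval)
    # The producer never reported this node within the timeout.
    return None


# A dict plays the role of the XCom entries written by the producer task.
shared = {"nodefinished_model.jaffle_shop.orders": {"status": "success"}}
finished = wait_for_node(shared, "model.jaffle_shop.orders")
missing = wait_for_node(shared, "model.jaffle_shop.customers", timeout=0.05)
```

Because each consumer reads only its own key, tasks stay independent of one another while dbt itself runs once in the producer.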

This proposal follows up on a successful internal PoC
(astronomer/oss-integrations-private#185),
available in the branch
https://github.com/astronomer/astronomer-cosmos/tree/single-run-execution-mode.

# Benefits

An initial performance analysis by @pankajkoti showed promising results:

| Experiment | Number of threads | Execution time (s) |
|---------------------------------------------------------------|-------------------|--------------------|
| dbt build | 4 | 6 - 7 |
| dbt run for each model locally | | 30 |
| Cosmos default ExecutionMode.LOCAL in Astro CLI locally | | 10 - 15 |
| Cosmos proposed ExecutionMode.WATCHER in Astro CLI locally | 1 | 26 |
| | 2 | 14 |
| | 4 | 7 |
| | 8 | 4 |
| | 16 | 2 |
| The ExecutionMode.WATCHER in Airflow with an Astro deployment | 8 | 5 |
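
As a reading aid for the local `ExecutionMode.WATCHER` rows, the snippet below derives the speedup relative to a single thread and the per-thread efficiency. The timings are copied from the table; the derived numbers are plain arithmetic on those figures and should be treated as indicative only.

```python
# Local Astro CLI timings for ExecutionMode.WATCHER, taken from the table:
# number of threads -> execution time in seconds.
watcher_seconds = {1: 26, 2: 14, 4: 7, 8: 4, 16: 2}

baseline = watcher_seconds[1]
speedups = {threads: baseline / seconds for threads, seconds in watcher_seconds.items()}

for threads, speedup in speedups.items():
    efficiency = speedup / threads  # 1.0 would mean perfectly linear scaling
    print(f"{threads:>2} threads: speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

On these numbers, scaling stays close to linear up to 16 threads (13x speedup over one thread).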

# Example of usage

Example of DAG topology, with the producer task preceding the others.
<img width="1624" height="1056" alt="Screenshot 2025-10-06 at 17 34 53"
src="https://github.com/user-attachments/assets/54d3290a-297d-417b-a255-6bb376e7d055"
/>

The dbt root nodes are set with `trigger_rule` `always`, so they start
sensing once the producer begins.
<img width="1624" height="1056" alt="Screenshot 2025-10-06 at 17 44 32"
src="https://github.com/user-attachments/assets/81d8f27c-adb1-47e7-ba99-1a103a32b35e"
/>

Producer task runs dbt Core, as shown on the logs:
<img width="1624" height="1056" alt="Screenshot 2025-10-06 at 17 47 59"
src="https://github.com/user-attachments/assets/10588abf-77c9-4e71-9c6b-1161f18d4bcf"
/>

Consumer task senses XCom, waiting for producer to finish running dbt:
<img width="1624" height="1056" alt="Screenshot 2025-10-06 at 17 48 12"
src="https://github.com/user-attachments/assets/4e5dd334-1e68-4d25-a733-ba4460766eb0"
/>

Evidence that producer is running concurrently to the dbt root nodes
sensing:
<img width="1624" height="1056" alt="Screenshot 2025-10-06 at 17 46 55"
src="https://github.com/user-attachments/assets/1220046a-e686-4bbe-b88b-d020e0e6e2f6"
/>



# Related tickets

Closes #1964
Closes #1959 (*)

(*) I ended up implementing this while trying to force the producer
task to run before the consumer tasks when running `airflow dags test`.

Outside of the scope of this PR:
- Documentation (this will be added as part of #245)
- We are not implementing support for the following operators in
`ExecutionMode.WATCHER`, since it does not make sense to have them
in this mode (we can review them later):
  - LS
  - Run operation
  - Docs
  - Clone

There are many other tasks related to this execution mode that can be
tracked by searching issues using `label:execution:watcher`:

https://github.com/astronomer/astronomer-cosmos/issues?q=is%3Aissue%20state%3Aopen%20label%3Aexecution%3Awatcher

# Why is this PR still in draft?

Pending:
- Understand and fix why the watcher task hangs when running [some of our
integration tests](https://github.com/astronomer/astronomer-cosmos/actions/runs/18220340583/job/51878734681)
- Add more tests

# Credits

The idea for this approach appeared in a discussion with @ashb.

The implementation of this feature is the result of teamwork with
@pankajastro and @pankajkoti, who were involved both directly and
indirectly via the PoC and previous PRs:

- Co-authored-by: Pankaj Koti <pankaj.koti@astronomer.io>
- Co-authored-by: Pankaj Singh <pankaj.singh@astronomer.io>
@tatiana tatiana mentioned this pull request Oct 7, 2025
tatiana added a commit that referenced this pull request Oct 28, 2025
Introduce the documentation for the recently introduced high-performance
execution mode, named ExecutionMode.WATCHER, following the
implementation in PRs #1982, #1993, #1998 and #1999.

Closes: #1965
Closes: astronomer/oss-integrations-private#245
@tatiana tatiana added this to the Cosmos 1.11.0 milestone Oct 29, 2025
tatiana added a commit that referenced this pull request Oct 29, 2025
**Features**

* Introduce ``ExecutionMode.WATCHER`` to reduce DAG run time to 1/5 of
the original time. Learn more about it
[here](https://astronomer.github.io/astronomer-cosmos/getting_started/watcher-execution-mode.html#watcher-execution-mode).
This feature was implemented via multiple PRs, including:
* Expose new execution mode by @tatiana @pankajastro @pankajkoti in
#1999
* Add ``DbtProducerWatcherOperator`` for the proposed
``ExecutionMode.WATCHER`` by @pankajkoti in #1982
* Add ``DbtConsumerWatcherSensor`` for the proposed
``ExecutionMode.WATCHER`` by @pankajastro in #1998
* Push producer's task completion status to XCOM by @pankajkoti in #2000
* Add default priority_weight for ``DbtProducerWatcherOperator`` by
@pankajkoti in #1995
* Add sample dbt events for the dbt watcher execution mode by
@pankajkoti in #1952
* Add ``compiled_sql`` as a template field on
``ExecutionMode.WATCHER`` when using ``run_results.json`` by
@pankajastro in #2070
* Set ``push_run_results_to_xcom`` kwargs correctly for invocation mode
subprocess and Watcher mode by @pankajastro in #2067
* Store compiled SQL as template field for dbt callback events in
``ExecutionMode.WATCHER`` by @pankajkoti in #2068
* Add initial documentation for ``ExecutionMode.WATCHER`` by @tatiana in
#2046
* Support running ``State.UPSTREAM_FAILED`` tasks when WATCHER consumer
upstream tasks fail by @tatiana in #2062
* Fail sensor tasks immediately if the ``ExecutionMode.WATCHER``
producer task fails by @pankajastro in #2040
* Add ``WATCHER`` to GitHub issue template by @tatiana in #2056
* Add support for ``TestBehavior.AFTER_ALL`` with
``ExecutionMode.WATCHER`` by @pankajastro in #2049
* Add support for ``TestBehavior.NONE`` with ``ExecutionMode.WATCHER``
by @pankajastro in #2047
* Fix ``ExecutionMode.WATCHER`` behaviour with ``DbtTaskGroup`` by
@tatiana in #2044
* Fix Cosmos behaviour when using watcher with
``InvocationMode.DBT_RUNNER`` by @tatiana in #2048

* Add Airflow 3 plugin for dbt docs with multiple dbt projects support
by @pankajkoti in #2009, check the
[documentation](https://astronomer.github.io/astronomer-cosmos/configuration/hosting-docs.html).
* Initial support to ``dbt Fusion`` by @tatiana in #1803. More details
[here](https://astronomer.github.io/astronomer-cosmos/configuration/dbt-fusion).
* Support to prune sources without downstream references in dbt projects
by @corsettigyg in #1988
* Allow to set task display name as a user-defined function by
@corsettigyg in #1761
* Add dbt project's hash to dag docs to support dag versioning in
Airflow 3 by @pankajkoti in #1907
* feat: Add Jinja templating support for ``dbt_cmd_flags`` by
@skillicinski in #1899
* Add Scarf metric to collect the execution mode uses by @pankajastro in
#1981
* Support Airflow 3.1 by @tatiana in #1980
* Add MySQL profile mapping by @Lee2532 in #1977
* Add sqlserver profile mapping by @pankajastro in #1737

**Enhancement**

* Use XCom to store sql when using ``ExecutionMode.AIRFLOW_ASYNC`` by
@pankajastro in #1934
* Refactor ``AIRFLOW_ASYNC`` teardown so it doesn't install the
virtualenv by @pankajastro in #1938
* Reuse the virtual env for ``AIRFLOW_ASYNC`` setup task by @pankajastro
in #1939
* Improve dataset/asset experience in Cosmos by @tatiana in #2030
* Add ``downstreams`` to ``DbtNode`` by @wornjs in #2028

**Bug fixes**

* Fix tags extraction by @ms32035 in #1915
* Fix task flow operator args by @anyapriya in #2024

**Documentation**

* Add documentation for Airflow 3 Plugin supporting dbt docs for
multiple dbt projects by @pankajkoti in #2063
* Add Cosmos Deferrable Operator Guide by @pankajastro in #1922
* Add dbt Fusion documentation by @tatiana in #1824 #1830
* Update dbt-fusion.rst to explicitly highlight it is in alpha by
@tatiana in #1838
* Fix a bunch of docs build errors and warnings by @pankajkoti in
#1886
* Add docs note for param virtualenv_dir for async execution mode by
@pankajastro in #1969
* Use pepy.tech downloads badge in README by @pankajkoti in #1920
* Correct the default value of ``cache_dir`` by @seokyun.ha in #2027

**Others**

* Promote @corsettigyg to committer by @tatiana in #1985
* Add @pankajkoti and @pankajastro to ``contributors.rst`` by @tatiana
in #1983
* Update setup script for airflow3 script by @dwreeves in #2023
* Prevent pytest from trying to test classes that aren't actually tests
by @anyapriya in #2032
* Fix ``dag.test()`` for Airflow 3.1+ by syncing DAG to database by
@kaxil in #2037
* Disable Scarf in CI by @pankajastro in #2016
* Fix failing dbt Fusion tests when run in parallel in CI by @pankajkoti
in #1896
* Fix MyPy issues related to ``ObjectStoragePath`` in main branch by
@tatiana in #2012
* Cleanup example dbt event JSON dictionaries kept for XCOM reference by
@pankajkoti in #1997
* Bump min hatch version that includes fixes for click>=8.3.0 by
@pankajkoti in #1996
* Use official postgres image from Docker hub for kubernetes setup by
@pankajkoti in #1986
* Use click<8.3.0 for hatch as click 8.3 breaks hatch by @pankajkoti in
#1987
* Pin Airflow version in type check CI job by @pankajastro in #2003
* Improve comments after feedback on #1948 by @tatiana in #1963
* Fix running tests with dbt Fusion 2.0.0 preview versions by @tatiana
in #1948
* Test hardening of dbt node having tags as unset or missing by
@pankajkoti in #1918
* Fix Sphinx issue in the main branch by @tatiana in #2064
* pre-commit autoupdate in #2065, #2043, #2033, #2019, #1990, #2019,
#2008, #1941, #1935, #1924
* GitHub dependabot update in #2051, #2050, #2038, #2022, #1947, #1955,
#1946, #1944, #1945, #1928, #1921, #1917


Co-authored-by: Pankaj Koti <pankaj.koti@astronomer.io>
Co-authored-by: Pankaj Singh <pankaj.singh@astronomer.io>
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>

Development

Successfully merging this pull request may close these issues.

Add root dbt build task with event callbacks and XCOM Support [ExecutionMode.WATCHER]

4 participants