Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 18 additions & 30 deletions docs/getting_started/watcher-execution-mode.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,11 @@ Introducing ``ExecutionMode.WATCHER``: Experimental High-Performance dbt Executi

With the release of **Cosmos 1.11.0**, we are introducing a powerful new experimental execution mode — ``ExecutionMode.WATCHER`` — designed to drastically reduce dbt pipeline run times in Airflow.

Early benchmarks show that ``ExecutionMode.WATCHER`` can cut total DAG runtime **by up to 80%**, bringing performance **on par with running dbt CLI locally**. Since this execution mode improves the performance by leveraging `dbt threading <https://docs.getdbt.com/docs/running-a-dbt-project/using-threads>`_, the performance gains will depend on two major factors:
Early benchmarks show that ``ExecutionMode.WATCHER`` can cut total DAG runtime **by up to 80%**, bringing performance **on par with running dbt CLI locally**. Since this execution mode improves the performance by leveraging `dbt threading <https://docs.getdbt.com/docs/running-a-dbt-project/using-threads>`_ and Airflow deferrable sensors, the performance gains will depend on three major factors:

- The amount of ``threads`` set either via the dbt profile configuration or the dbt ``--threads`` flag
- The amount of dbt ``threads`` set either via the dbt profile configuration or the dbt ``--threads`` flag
- The topology of the dbt pipeline
- The ``poke_interval`` and ``timeout`` settings of the ``DbtConsumerWatcherSensor`` operator, which determine the frequency and duration of the sensor's polling.

-------------------------------------------------------------------------------

Expand Down Expand Up @@ -245,31 +246,30 @@ This behavior is designed to support TaskGroup-level retries, as reported in `#2

The overall retry behavior will be further improved once `#1978 <https://github.com/astronomer/astronomer-cosmos/issues/1978>`_ is implemented.

-------------------------------------------------------------------------------

Known Limitations
-------------------

These limitations will be revisited as the feature matures.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Installation of Airflow and dbt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``ExecutionMode.WATCHER`` works better when dbt and Airflow are installed in the same Python virtual environment, since it uses dbt `callback features <https://docs.getdbt.com/reference/programmatic-invocations#registering-callbacks>`_.
In case that is not possible, the producer task will only trigger the consumer tasks by the end of the execution, after it generated the ``run_results.json`` file.
Since Cosmos 1.12.0, the ``ExecutionMode.WATCHER`` works well regardless if dbt and Airflow are installed in the same Python virtual environment or not.

We plan to improve this behaviour in the future by leveraging `dbt structured logging <https://docs.getdbt.com/reference/events-logging#structured-logging>`_.
When dbt and Airflow are installed in the same Python virtual environment, the ``ExecutionMode.WATCHER`` uses dbt `callback features <https://docs.getdbt.com/reference/programmatic-invocations#registering-callbacks>`_.

In the meantime, assuming you have Cosmos installed in ``requirements.txt`` file, you would modify it to also include your dbt adapters.
When dbt and Airflow are not installed in the same Python virtual environment, the ``ExecutionMode.WATCHER`` consumes the dbt `structured logging <https://docs.getdbt.com/reference/events-logging#structured-logging>`_ to update the consumer tasks.

Example of ``requirements.txt`` file:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Synchronous versus Asynchronous sensor execution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text
In Cosmos 1.11.0, the ``DbtConsumerWatcherSensor`` operator is implemented as a synchronous XCom sensor, which continuously occupies the worker slot - even if they're just sleeping and checking periodically.

astronomer-cosmos==1.11.0
dbt-bigquery==1.10
Starting with Cosmos 1.12.0, the ``DbtConsumerWatcherSensor`` supports
`deferrable (asynchronous) execution <https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/deferring.html>`_. Deferrable execution frees up the Airflow worker slot, while task status monitoring is handled by the Airflow triggerer component,
which increases overall task throughput. By default, the sensor now runs in deferrable mode.

-------------------------------------------------------------------------------

Known Limitations
-------------------

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Producer task implementation
Expand All @@ -279,6 +279,7 @@ The producer task is implemented as a ``DbtProducerWatcherOperator`` operator an

There are discussions about allowing this node to be implemented as the ``ExecutionMode.VIRTUALENV`` and ``ExecutionMode.KUBERNETES`` execution modes, so that there is a higher isolation between dbt and Airflow dependencies.


~~~~~~~~~~~~~~~~~~~~~~~~
Individual dbt Operators
~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down Expand Up @@ -380,22 +381,11 @@ The consequence is that tasks may take longer to be updated if they are not sens

We plan to review this behaviour and alternative approaches in the future.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Synchronous sensor execution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In Cosmos 1.11.0, the ``DbtConsumerWatcherSensor`` operator is implemented as a synchronous XCom sensor, which continuously occupies the worker slot - even if they're just sleeping and checking periodically.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Asynchronous sensor execution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Starting with Cosmos 1.12.0, the ``DbtConsumerWatcherSensor`` supports
`deferrable (asynchronous) execution <https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/deferring.html>`_. Deferrable execution frees up the Airflow worker slot, while task status monitoring is handled by the Airflow triggerer component,
which increases overall task throughput. By default, the sensor now runs in deferrable mode.

**Limitations:**

- Deferrable execution is currently supported only for dbt models, seeds and snapshots.
- Deferrable execution applies only to the first task attempt (try number 1). For subsequent retries, the sensor falls back to synchronous execution.

Expand Down Expand Up @@ -433,8 +423,6 @@ Problem: "I changed from ``ExecutionMode.LOCAL`` to ``ExecutionMode.WATCHER``, b
Answer: Please, check the number of threads that are being used by searching the producer task logs for a message similar to ``Concurrency: 1 threads (target='DEV')``. To leverage the Watcher mode, you should have a high number of threads, at least dbt's default of 4. Check the `dbt threading docs <https://docs.getdbt.com/docs/running-a-dbt-project/using-threads>`_ for more information on how to set the number of threads.




Summary
-------

Expand Down