20 changes: 18 additions & 2 deletions docs/optimize_performance/index.rst
@@ -1,12 +1,28 @@
.. _optimize-performance:

Optimize the performance of your Cosmos Dags
--------------------------------------------
Optimize Performance
--------------------

Cosmos performance can be tuned across two dimensions: how fast DAGs are parsed (affecting how quickly they appear
and update in Airflow) and how fast tasks execute (affecting DAG run duration).

- :ref:`optimize-rendering` -- Speed up DAG parsing by choosing the right LoadMode, reducing DAG granularity, and skipping stale sources.
- :ref:`optimize-execution` -- Speed up DAG runs by choosing the right execution mode, sizing workers, and reducing per-task overhead.
- :ref:`perf-troubleshooting` -- Diagnose common performance issues such as slow parsing, missing DAGs, and Out of Memory (OOM) errors.

The following pages cover specific optimization mechanisms in more detail:

- :ref:`memory-optimization` -- Reduce memory consumption during DAG parsing and task execution.
- :ref:`caching` -- How Cosmos caches dbt ls output, partial parse files, profiles, and YAML selectors.
- :ref:`invocation-mode` -- Choose between running dbt as a library or as a subprocess.

.. toctree::
    :maxdepth: 1
    :caption: Optimize Performance

    optimize_rendering
    optimize_execution
    troubleshooting
    memory_optimization
    caching
    invocation_mode
171 changes: 171 additions & 0 deletions docs/optimize_performance/optimize_execution.rst
@@ -0,0 +1,171 @@
.. _optimize-execution:

Optimize Task Execution
-----------------------

Once your DAG is parsed, performance depends on how quickly tasks execute. Each Cosmos task runs one or more dbt
commands, and the overhead of those invocations adds up across a DAG run. This page covers the most impactful ways
to reduce DAG run time.


1. Use an efficient execution mode
+++++++++++++++++++++++++++++++++++

The execution mode determines how Cosmos runs dbt commands at task execution time. Choosing the right mode is the
single most impactful change for DAG run performance.

**Recommended: use** ``ExecutionMode.WATCHER``

In the default ``ExecutionMode.LOCAL``, every model runs as a separate ``dbt run`` invocation, which introduces
per-task overhead. ``ExecutionMode.WATCHER`` runs a single ``dbt build`` across the entire project and uses
dbt's native threading to parallelize models, while still giving you model-level visibility in Airflow.

Benchmarks show **up to 80% reduction in DAG run time** compared to ``ExecutionMode.LOCAL``.
See :ref:`watcher-execution-mode` for setup instructions and detailed benchmarks.

.. note::

    ``ExecutionMode.WATCHER`` is currently experimental. Review its
    `known limitations <https://astronomer.github.io/astronomer-cosmos/guides/run_dbt/airflow-worker/watcher-execution-mode.html#known-limitations>`_
    before adopting it in production.

.. code-block:: python

    from cosmos import DbtDag, ExecutionConfig
    from cosmos.constants import ExecutionMode

    DbtDag(
        dag_id="my_dbt_dag",
        execution_config=ExecutionConfig(
            execution_mode=ExecutionMode.WATCHER,
        ),
        # ...
    )

.. tip::

    ``ExecutionMode.WATCHER`` performance scales with dbt ``threads``. Start with a conservative value that matches
    the CPU capacity of your workers, then increase gradually. See
    :ref:`watcher-execution-mode` for how to configure threads.

**Alternative for BigQuery: use** ``ExecutionMode.AIRFLOW_ASYNC``

If you use BigQuery, ``ExecutionMode.AIRFLOW_ASYNC`` pre-compiles SQL transformations and executes them using
Airflow's deferrable ``BigQueryInsertJobOperator``. This frees worker slots while queries execute in BigQuery,
achieving roughly **35% faster execution** compared to ``ExecutionMode.LOCAL``. See :ref:`async-execution-mode`.

**Baseline: use** ``ExecutionMode.LOCAL`` **with** ``InvocationMode.DBT_RUNNER``

If neither ``WATCHER`` nor ``AIRFLOW_ASYNC`` suits your setup, configure ``ExecutionMode.LOCAL`` to use
``InvocationMode.DBT_RUNNER`` to run dbt as a library call rather than spawning a subprocess for each task.
Since Cosmos 1.4, ``DBT_RUNNER`` is the preferred invocation mode and is auto-selected when dbt is available in
the same Python environment; otherwise Cosmos falls back to ``InvocationMode.SUBPROCESS``. See :ref:`invocation-mode`.


2. Install dbt in the same Python environment as Airflow
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

When dbt is installed alongside Airflow, Cosmos uses dbt's programmatic API (``dbtRunner``) instead of spawning
subprocesses. This eliminates process creation overhead and reduces both CPU and memory usage during task execution
and DAG parsing.

Installing dbt alongside Airflow is required for ``InvocationMode.DBT_RUNNER`` and yields the best performance with
``ExecutionMode.WATCHER``.

For more details, see :ref:`invocation-mode`.
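
As a quick check of which invocation mode Cosmos is likely to select, the sketch below tests whether a top-level ``dbt`` package can be imported by the current interpreter. The helper name ``module_available`` is hypothetical and not part of Cosmos, and the check only approximates the condition described above.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run this with the same interpreter that Airflow uses.
    if module_available("dbt"):
        print("dbt is importable: InvocationMode.DBT_RUNNER can be used")
    else:
        print("dbt is not importable: expect InvocationMode.SUBPROCESS")
```

Run it inside the worker image rather than on a development machine, since the two environments may differ.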


3. Pre-install dbt dependencies
+++++++++++++++++++++++++++++++

By default, Cosmos runs ``dbt deps`` during both DAG parsing and task execution to ensure packages are available.
For large projects with many packages, this adds significant overhead to every task.

**Pre-install packages in your Docker image:**

.. code-block:: docker

    # In your Dockerfile
    COPY dbt_project/ /opt/dbt/project/
    RUN cd /opt/dbt/project && dbt deps

**Then disable runtime installation:**

.. code-block:: python

    from cosmos import ProjectConfig

    ProjectConfig(
        dbt_project_path="/opt/dbt/project",
        install_dbt_deps=False,
    )
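
Before turning off runtime installation, it can help to confirm that the image really contains the installed packages. The following is a minimal sketch under two assumptions: packages are declared in ``packages.yml`` or ``dependencies.yml``, and dbt installs them into its default ``dbt_packages`` directory. The helper ``deps_preinstalled`` is hypothetical and not part of Cosmos or dbt.

```python
from pathlib import Path


def deps_preinstalled(project_dir: str, install_dir: str = "dbt_packages") -> bool:
    """Return True if the project needs no packages or they are already installed."""
    project = Path(project_dir)
    declares_packages = any(
        (project / name).is_file() for name in ("packages.yml", "dependencies.yml")
    )
    if not declares_packages:
        return True  # nothing for `dbt deps` to install
    packages_dir = project / install_dir
    # A non-empty install directory suggests `dbt deps` already ran at build time.
    return packages_dir.is_dir() and any(packages_dir.iterdir())
```

If your project overrides the install location, pass that directory name as ``install_dir``.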


4. Use profiles.yml instead of profile mapping
+++++++++++++++++++++++++++++++++++++++++++++++

Cosmos can generate dbt profiles at runtime from Airflow connections using profile mapping classes. While convenient,
this adds overhead to each task invocation because Cosmos must read the Airflow connection and construct the profile.

If performance is a priority, provide a ``profiles.yml`` file directly. This avoids the runtime profile generation
entirely.

For how to configure this, see
`Using your profiles.yml <https://astronomer.github.io/astronomer-cosmos/guides/connect_database/use-your-profiles-yml.html>`_.


5. Worker node sizing
+++++++++++++++++++++

The resources needed to run Cosmos tasks depend on your dbt project and on the Cosmos configuration you choose.
Airflow workers configured for IO-intensive workloads (the default on Astro, which uses a ratio of 5 concurrent
processes per vCPU) may not have enough CPU capacity for Cosmos tasks, which involve parsing dbt projects and running
dbt commands.

The following table provides recommended concurrency ratios based on execution mode:

.. list-table:: Recommended worker concurrency (concurrent processes per vCPU)
    :header-rows: 1
    :widths: 50 20

    * - Execution mode
      - Ratio
    * - ``ExecutionMode.LOCAL`` with dbt in the same Python environment
      - 2:1
    * - ``ExecutionMode.LOCAL`` with dbt in a separate virtual environment
      - 1:1
    * - ``ExecutionMode.AIRFLOW_ASYNC`` (BigQuery)
      - 4:1
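
To turn a ratio from the table into a concrete setting, multiply it by the number of vCPUs on the worker. The sketch below does that arithmetic. The dictionary keys and the function name are hypothetical labels chosen for this example, and the ratios are copied from the table above.

```python
# Ratios copied from the table above (concurrent processes per vCPU).
RATIO_BY_SETUP = {
    "local_same_env": 2,
    "local_separate_venv": 1,
    "airflow_async": 4,
}


def worker_concurrency(vcpus: int, setup: str) -> int:
    """Return a suggested number of concurrent task slots for one worker."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return vcpus * RATIO_BY_SETUP[setup]


print(worker_concurrency(4, "local_same_env"))
```

Treat the result as a starting point for the worker concurrency setting of your executor, and adjust it after observing real CPU and memory usage.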

.. note::

    Keep in mind that Airflow re-parses the DAG file on the worker node every time a task runs. If you are using
    ``LoadMode.DBT_LS``, this means each task also triggers a dbt project parse. Consider using
    ``LoadMode.DBT_MANIFEST`` to reduce worker-side parsing overhead. See :ref:`optimize-rendering`.

If you are using ``ExecutionMode.WATCHER``, the producer task is CPU- and memory-intensive while the consumer sensor
tasks are lightweight. Use the ``watcher_dbt_execution_queue``
`configuration <https://astronomer.github.io/astronomer-cosmos/guides/run_dbt/airflow-worker/watcher-execution-mode.html#watcher-dbt-execution-queue>`_
to route the producer task and sensor retries to a worker queue with more resources.


6. Profile memory usage with debug mode
++++++++++++++++++++++++++++++++++++++++

To right-size your workers, enable Cosmos debug mode to measure actual memory consumption per task:

.. code-block:: bash

    export AIRFLOW__COSMOS__ENABLE_DEBUG_MODE=True

When enabled, Cosmos tracks peak memory usage during task execution and pushes it to XCom under the key
``cosmos_debug_max_memory_mb``. Use this data to:

- Identify which tasks consume the most memory
- Set appropriate memory limits and worker queue assignments
- Detect memory regressions over time

For high-memory tasks, consider using separate
`Airflow pools <https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/pools.html>`_
or the ``watcher_dbt_execution_queue`` configuration to route them to workers with more resources.
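
Once the peak values are collected, a small script can flag the tasks that need more headroom. This is a sketch only: it assumes you have already exported the ``cosmos_debug_max_memory_mb`` values into a plain dictionary keyed by task ID, and the function name, task IDs, and numbers are all hypothetical.

```python
def tasks_over_budget(peak_mb_by_task: dict, budget_mb: float) -> list:
    """Return (task_id, peak_mb) pairs above the budget, largest first."""
    over = [(task, mb) for task, mb in peak_mb_by_task.items() if mb > budget_mb]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Hypothetical values, as if read from XCom under `cosmos_debug_max_memory_mb`.
sample = {"stg_orders_run": 310.0, "fct_sales_run": 1240.5, "dim_users_run": 505.2}
print(tasks_over_budget(sample, budget_mb=500))
```

Tasks returned by the function are candidates for a dedicated pool or a larger worker queue.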

For more memory optimization strategies, see :ref:`memory-optimization`.
133 changes: 133 additions & 0 deletions docs/optimize_performance/optimize_rendering.rst
@@ -0,0 +1,133 @@
.. _optimize-rendering:

Optimize DAG Parsing
--------------------

Every time Airflow parses a DAG file that contains a ``DbtDag`` or ``DbtTaskGroup``, Cosmos must load and process the
dbt project to build the corresponding Airflow task graph. The time this takes directly affects how quickly your DAGs
appear and update in Airflow. This page covers the most impactful ways to reduce that parse time.

.. tip::

    Cosmos logs the time it takes to parse each dbt project at the ``INFO`` level:

    .. code-block:: text

        Cosmos performance (<cache_id>) - [<hostname>|<pid>]: It took 0.068s to parse the dbt project for DAG using LoadMode.DBT_LS_CACHE

    Search your Airflow scheduler or DAG processor logs for ``Cosmos performance`` to measure your current parse time.
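
To compare parse times before and after a change, the log lines can be extracted with a regular expression. The sketch below assumes the message format matches the sample line shown above, which may vary between Cosmos versions. The names ``PARSE_LINE`` and ``parse_times`` are hypothetical.

```python
import re

# Pattern derived from the sample log line; adjust it if your version logs differently.
PARSE_LINE = re.compile(
    r"Cosmos performance.*?It took (?P<seconds>\d+(?:\.\d+)?)s "
    r"to parse the dbt project for DAG using (?P<mode>\S+)"
)


def parse_times(log_lines):
    """Yield (load_mode, seconds) for each matching log line."""
    for line in log_lines:
        match = PARSE_LINE.search(line)
        if match:
            yield match.group("mode"), float(match.group("seconds"))
```

Pass it any iterable of log lines, for example an open log file, and aggregate the yielded durations per load mode.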


1. Choose the right LoadMode
++++++++++++++++++++++++++++

The ``LoadMode`` controls how Cosmos reads your dbt project. It is the single most impactful setting for parse-time
performance.

**Recommended: use** ``LoadMode.DBT_MANIFEST``

Parsing a pre-compiled ``manifest.json`` is the fastest option because it avoids running any dbt command at parse time.

.. code-block:: python

    from cosmos import DbtDag, ProjectConfig, RenderConfig
    from cosmos.constants import LoadMode

    DbtDag(
        dag_id="my_dbt_dag",
        project_config=ProjectConfig(
            dbt_project_path="/path/to/dbt/project",
            manifest_path="/path/to/dbt/project/target/manifest.json",
        ),
        render_config=RenderConfig(
            load_method=LoadMode.DBT_MANIFEST,
        ),
        # ...
    )

To generate the manifest, run the following from your dbt project directory (typically as part of CI/CD):

.. code-block:: bash

    dbt deps     # install packages first
    dbt compile  # generates target/manifest.json

Then make the resulting ``target/manifest.json`` available to Cosmos via a local path.
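
A CI step can verify the manifest before it is shipped, so that a missing or truncated file fails the build rather than the DAG parse. The sketch below relies only on the fact that a dbt ``manifest.json`` is a JSON object with a top-level ``nodes`` mapping. The function name ``manifest_node_count`` is hypothetical.

```python
import json
from pathlib import Path


def manifest_node_count(manifest_path: str) -> int:
    """Load manifest.json and return how many entries its "nodes" mapping has."""
    path = Path(manifest_path)
    if not path.is_file():
        raise FileNotFoundError(f"manifest not found: {path}")
    with path.open(encoding="utf-8") as handle:
        manifest = json.load(handle)
    if "nodes" not in manifest:
        raise ValueError(f"{path} has no 'nodes' key; is it a dbt manifest?")
    return len(manifest["nodes"])
```

A count of zero usually deserves a closer look, since it suggests the project was compiled without any models.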

For more details on all parsing methods, see :ref:`parsing-methods`.

**If you cannot pre-compute the manifest**

Use ``LoadMode.DBT_LS`` with the following optimizations to minimize parse-time overhead:

- **Enable caching** (on by default since Cosmos 1.5) so that ``dbt ls`` output is reused across parses. See :ref:`caching`.
- **Use** ``InvocationMode.DBT_RUNNER`` (default since Cosmos 1.9) to run ``dbt ls`` as a library call instead of a subprocess. See :ref:`invocation-mode`.
- **Keep partial parsing enabled** (on by default) so dbt skips re-parsing unchanged project files. See :ref:`partial-parsing`.
- **Pre-install dbt packages** in your Docker image or CI and disable runtime installation:

.. code-block:: python

    from cosmos import ProjectConfig

    ProjectConfig(
        dbt_project_path="/path/to/dbt/project",
        install_dbt_deps=False,  # skip dbt deps at parse time
    )

This avoids running ``dbt deps`` on every DAG parse, which can be slow when packages need to be fetched.


2. Reduce DAG granularity
+++++++++++++++++++++++++

Fewer nodes in the Airflow DAG means faster parsing. There are two ways to reduce the number of nodes Cosmos generates.

**Select only the nodes you need**

Use ``select``, ``exclude``, or ``selector`` in ``RenderConfig`` to limit which dbt nodes are included in the DAG.
For example, to include only models tagged ``daily``:

.. code-block:: python

    from cosmos import RenderConfig

    RenderConfig(
        select=["tag:daily"],
    )

For the full selection syntax, see :ref:`selecting-excluding`.

**Choose an efficient TestBehavior**

The default ``TestBehavior.AFTER_EACH`` creates a separate test task after every model, which can significantly
increase the number of tasks in the DAG. Consider these alternatives:

- ``TestBehavior.NONE`` -- no test tasks are created. Use this if tests are not needed or are run separately.
- ``TestBehavior.BUILD`` -- tests run as part of the model task itself (via ``dbt build``), so no additional tasks are created.
- ``TestBehavior.AFTER_ALL`` -- a single test task runs after all models complete.

.. code-block:: python

    from cosmos import RenderConfig
    from cosmos.constants import TestBehavior

    RenderConfig(
        test_behavior=TestBehavior.BUILD,
    )
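
The effect on DAG size can be estimated with simple arithmetic. The sketch below is a rough model, not a Cosmos API: it counts only models, ignores seeds, snapshots, and sources, and assumes ``AFTER_EACH`` adds a test task only for models that have tests. If that assumption does not hold for your version, pass the same value for both counts. The string labels and function name are hypothetical.

```python
def estimated_task_count(models: int, models_with_tests: int, test_behavior: str) -> int:
    """Rough number of Airflow tasks for a given test behavior."""
    if test_behavior == "after_each":
        return models + models_with_tests  # one extra test task per tested model
    if test_behavior == "after_all":
        return models + 1  # one test task for the whole project
    if test_behavior in ("none", "build"):
        return models  # no separate test tasks
    raise ValueError(f"unknown test behavior: {test_behavior}")


for behavior in ("after_each", "after_all", "build", "none"):
    print(behavior, estimated_task_count(100, 60, behavior))
```

For a project with 100 models, 60 of them tested, the estimate drops from 160 tasks to 100 when moving from per-model test tasks to ``BUILD``.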


3. Skip stale sources
+++++++++++++++++++++

If your DAG includes multiple data sources and some may not have fresh data, you can avoid running unnecessary
branches by rendering source nodes with freshness checks. When a source is not fresh, the downstream branch can be
skipped.

To enable this, configure ``source_rendering_behavior`` in ``RenderConfig`` and customize the source node behavior
using ``node_converters``:

.. code-block:: python

    from cosmos import RenderConfig
    from cosmos.constants import SourceRenderingBehavior

    RenderConfig(
        source_rendering_behavior=SourceRenderingBehavior.WITH_TESTS_OR_FRESHNESS,
    )

For details on source rendering and how to customize source node behavior, see :ref:`managing-sources` and
:ref:`dag_customization`.