astronomer · pankajkoti · Apr 1, 2026 · Mar 31, 2026 · Apr 1, 2026
@@ -1,12 +1,28 @@
 .. _optimize-performance:
 
-Optimize the performance of your Cosmos Dags
---------------------------------------------
+Optimize Performance
+--------------------
+
+Cosmos performance can be tuned across two dimensions: how fast DAGs are parsed (affecting how quickly they appear
+and update in Airflow) and how fast tasks execute (affecting DAG run duration).
+
+- :ref:`optimize-rendering` -- Speed up DAG parsing by choosing the right LoadMode, reducing DAG granularity, and skipping stale sources.
+- :ref:`optimize-execution` -- Speed up DAG runs by choosing the right execution mode, sizing workers, and reducing per-task overhead.
+- :ref:`perf-troubleshooting` -- Diagnose common performance issues such as slow parsing, missing DAGs, and Out of Memory (OOM) errors.
+
+The following pages cover specific optimization mechanisms in more detail:
+
+- :ref:`memory-optimization` -- Reduce memory consumption during DAG parsing and task execution.
+- :ref:`caching` -- How Cosmos caches dbt ls output, partial parse files, profiles, and YAML selectors.
+- :ref:`invocation-mode` -- Choose between running dbt as a library or as a subprocess.
 
 .. toctree::
    :maxdepth: 1
    :caption: Optimize Performance
 
+   optimize_rendering
+   optimize_execution
+   troubleshooting
    memory_optimization
    caching
    invocation_mode
@@ -0,0 +1,171 @@
+.. _optimize-execution:
+
+Optimize Task Execution
+-----------------------
+
+Once your DAG is parsed, performance depends on how quickly tasks execute. Each Cosmos task runs one or more dbt
+commands, and the overhead of those invocations adds up across a DAG run. This page covers the most impactful ways
+to reduce DAG run time.
+
+
+1. Use an efficient execution mode
++++++++++++++++++++++++++++++++++++
+
+The execution mode determines how Cosmos runs dbt commands at task execution time. Choosing the right mode is the
+single most impactful change for DAG run performance.
+
+**Recommended: use** ``ExecutionMode.WATCHER``
+
+In the default ``ExecutionMode.LOCAL``, every model runs as a separate ``dbt run`` invocation, which introduces
+per-task overhead. ``ExecutionMode.WATCHER`` runs a single ``dbt build`` across the entire project and uses
+dbt's native threading to parallelize models, while still giving you model-level visibility in Airflow.
+
+Benchmarks show **up to 80% reduction in DAG run time** compared to ``ExecutionMode.LOCAL``.
+See :ref:`watcher-execution-mode` for setup instructions and detailed benchmarks.
+
+.. note::
+
+   ``ExecutionMode.WATCHER`` is currently experimental. Review its
+   `known limitations <https://astronomer.github.io/astronomer-cosmos/guides/run_dbt/airflow-worker/watcher-execution-mode.html#known-limitations>`_
+   before adopting it in production.
+
+.. code-block:: python
+
+   from cosmos import DbtDag, ExecutionConfig
+   from cosmos.constants import ExecutionMode
+
+   DbtDag(
+       dag_id="my_dbt_dag",
+       execution_config=ExecutionConfig(
+           execution_mode=ExecutionMode.WATCHER,
+       ),
+       # ...
+   )
+
+.. tip::
+
+   ``ExecutionMode.WATCHER`` performance scales with dbt ``threads``. Start with a conservative value that matches
+   the CPU capacity of your workers, then increase gradually. See
+   :ref:`watcher-execution-mode` for how to configure threads.
+
+**Alternative for BigQuery: use** ``ExecutionMode.AIRFLOW_ASYNC``
+
+If you use BigQuery, ``ExecutionMode.AIRFLOW_ASYNC`` pre-compiles SQL transformations and executes them using
+Airflow's deferrable ``BigQueryInsertJobOperator``. This frees worker slots while queries execute in BigQuery,
+achieving roughly **35% faster execution** compared to ``ExecutionMode.LOCAL``. See :ref:`async-execution-mode`.
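+
+As a minimal sketch, switching an existing ``DbtDag`` to this mode only changes its ``ExecutionConfig``. Depending on
+your Cosmos version, the asynchronous mode may have further requirements (for example, a BigQuery profile), so treat
+this as a starting point and confirm the details in :ref:`async-execution-mode`:
+
+.. code-block:: python
+
+   from cosmos import ExecutionConfig
+   from cosmos.constants import ExecutionMode
+
+   # Only the execution mode changes; project, profile, and render settings stay as they are.
+   execution_config = ExecutionConfig(
+       execution_mode=ExecutionMode.AIRFLOW_ASYNC,
+   )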
+
+**Baseline: use** ``ExecutionMode.LOCAL`` **with** ``InvocationMode.DBT_RUNNER``
+
+If neither ``WATCHER`` nor ``AIRFLOW_ASYNC`` suits your setup, use ``ExecutionMode.LOCAL`` with
+``InvocationMode.DBT_RUNNER``, which runs dbt as a library call instead of spawning a subprocess for each task.
+Since Cosmos 1.4, ``DBT_RUNNER`` is the preferred invocation mode and is auto-selected when dbt is available in
+the same Python environment; otherwise Cosmos falls back to ``InvocationMode.SUBPROCESS``. See :ref:`invocation-mode`.
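+
+To pin the invocation mode explicitly instead of relying on auto-selection, set it on ``ExecutionConfig``. This is a
+minimal sketch that assumes dbt is installed in the same Python environment as Airflow:
+
+.. code-block:: python
+
+   from cosmos import ExecutionConfig
+   from cosmos.constants import ExecutionMode, InvocationMode
+
+   execution_config = ExecutionConfig(
+       execution_mode=ExecutionMode.LOCAL,
+       # Run dbt in-process via dbtRunner instead of one subprocess per task.
+       invocation_mode=InvocationMode.DBT_RUNNER,
+   )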
+
+
+2. Install dbt in the same Python environment as Airflow
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+When dbt is installed alongside Airflow, Cosmos uses dbt's programmatic API (``dbtRunner``) instead of spawning
+subprocesses. This eliminates process creation overhead and reduces both CPU and memory usage during task execution
+and DAG parsing.
+
+This is required for ``InvocationMode.DBT_RUNNER`` and yields the best performance with ``ExecutionMode.WATCHER``.
+
+For more details, see :ref:`invocation-mode`.
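+
+To confirm that dbt's programmatic API is importable from the interpreter Airflow uses, you can run a quick check on a
+worker or during your image build. This is a minimal standard-library sketch; ``module_available`` is a hypothetical
+helper, not part of Cosmos, and the check assumes ``dbtRunner`` lives in ``dbt.cli.main`` (dbt-core 1.5 and later):
+
+.. code-block:: python
+
+   import importlib.util
+
+
+   def module_available(dotted_name: str) -> bool:
+       """Return True if this interpreter's import system can locate ``dotted_name``."""
+       try:
+           return importlib.util.find_spec(dotted_name) is not None
+       except ModuleNotFoundError:
+           # find_spec imports parent packages first; a missing parent means "not available".
+           return False
+
+
+   if __name__ == "__main__":
+       # True suggests dbt-core is installed alongside Airflow in this interpreter.
+       print("dbtRunner importable:", module_available("dbt.cli.main"))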
+
+
+3. Pre-install dbt dependencies
++++++++++++++++++++++++++++++++
+
+By default, Cosmos runs ``dbt deps`` during both DAG parsing and task execution to ensure packages are available.
+For large projects with many packages, this adds significant overhead to every task.
+
+**Pre-install packages in your Docker image:**
+
+.. code-block:: docker
+
+   # In your Dockerfile
+   COPY dbt_project/ /opt/dbt/project/
+   RUN cd /opt/dbt/project && dbt deps
+
+**Then disable runtime installation:**
+
+.. code-block:: python
+
+   from cosmos import ProjectConfig
+
+   ProjectConfig(
+       dbt_project_path="/opt/dbt/project",
+       install_dbt_deps=False,
+   )
+
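+A build or CI step can fail fast if packages were not baked in before you set ``install_dbt_deps=False``. The sketch
+below is a hypothetical standard-library check (``deps_preinstalled`` is not a Cosmos API). It assumes dbt's default
+``packages-install-path`` of ``dbt_packages``; adjust it if your ``dbt_project.yml`` overrides that setting:
+
+.. code-block:: python
+
+   from pathlib import Path
+
+
+   def deps_preinstalled(project_dir: str, install_path: str = "dbt_packages") -> bool:
+       """Return True if the project declares no packages or its packages are already installed."""
+       project = Path(project_dir)
+       declares_packages = any(
+           (project / name).exists() for name in ("packages.yml", "dependencies.yml")
+       )
+       if not declares_packages:
+           return True  # nothing for dbt deps to install
+       packages_dir = project / install_path
+       # Installed packages appear as subdirectories of the install path.
+       return packages_dir.is_dir() and any(p.is_dir() for p in packages_dir.iterdir())
+
+
+   if __name__ == "__main__":
+       # Hypothetical path matching the Dockerfile example in this section.
+       if not deps_preinstalled("/opt/dbt/project"):
+           raise SystemExit("dbt packages missing: run `dbt deps` during the image build")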
+
+4. Use profiles.yml instead of profile mapping
++++++++++++++++++++++++++++++++++++++++++++++++
+
+Cosmos can generate dbt profiles at runtime from Airflow connections using profile mapping classes. While convenient,
+this adds overhead to each task invocation because Cosmos must read the Airflow connection and construct the profile.
+
+If performance is a priority, provide a ``profiles.yml`` file directly. This avoids the runtime profile generation
+entirely.
+
+For how to configure this, see
+`Using your profiles.yml <https://astronomer.github.io/astronomer-cosmos/guides/connect_database/use-your-profiles-yml.html>`_.
+
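+A minimal sketch of pointing Cosmos at an existing file with ``ProfileConfig``; the profile name, target name, and
+path below are placeholders and must match your own ``profiles.yml``:
+
+.. code-block:: python
+
+   from cosmos import ProfileConfig
+
+   profile_config = ProfileConfig(
+       profile_name="my_profile",  # a top-level profile in profiles.yml (placeholder)
+       target_name="dev",  # a target under that profile (placeholder)
+       profiles_yml_filepath="/opt/dbt/project/profiles.yml",
+   )
+
+Pass ``profile_config=profile_config`` to ``DbtDag`` or ``DbtTaskGroup`` as usual.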
+
+5. Worker node sizing
++++++++++++++++++++++
+
+The resources Cosmos tasks need depend on your dbt project and the Cosmos configuration you choose.
+Airflow workers configured for IO-intensive workloads (the default on Astro, which uses a ratio of 5 concurrent
+processes per vCPU) may not have enough CPU capacity for Cosmos tasks, which parse dbt projects and run
+dbt commands.
+
+The following table provides recommended concurrency ratios based on execution mode:
+
+.. list-table:: Recommended worker concurrency (concurrent processes per vCPU)
+   :header-rows: 1
+   :widths: 50 20
+
+   * - Execution mode
+     - Ratio
+   * - ``ExecutionMode.LOCAL`` with dbt in the same Python environment
+     - 2:1
+   * - ``ExecutionMode.LOCAL`` with dbt in a separate virtual environment
+     - 1:1
+   * - ``ExecutionMode.AIRFLOW_ASYNC`` (BigQuery)
+     - 4:1
+
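+To turn the table into a worker setting, multiply the ratio by the worker's vCPU count. The sketch below encodes the
+ratios from the table; the dictionary keys are illustrative labels, and how you apply the result (for example, an Astro
+worker queue setting or Airflow's Celery ``worker_concurrency``) depends on your deployment:
+
+.. code-block:: python
+
+   # Concurrent processes per vCPU, taken from the table above.
+   RATIOS = {
+       "local_same_env": 2,  # ExecutionMode.LOCAL, dbt in the Airflow Python environment
+       "local_separate_venv": 1,  # ExecutionMode.LOCAL, dbt in a separate virtual environment
+       "airflow_async": 4,  # ExecutionMode.AIRFLOW_ASYNC (BigQuery)
+   }
+
+
+   def worker_concurrency(vcpus: int, mode: str) -> int:
+       """Return the recommended number of concurrent task processes for one worker."""
+       return vcpus * RATIOS[mode]
+
+
+   # A 4-vCPU worker running LOCAL with dbt installed alongside Airflow:
+   print(worker_concurrency(4, "local_same_env"))  # 8
+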
+.. note::
+
+   Keep in mind that Airflow re-parses the DAG file on the worker node every time a task runs. If you are using
+   ``LoadMode.DBT_LS``, this means each task also triggers a dbt project parse. Consider using
+   ``LoadMode.DBT_MANIFEST`` to reduce worker-side parsing overhead. See :ref:`optimize-rendering`.
+
+If you are using ``ExecutionMode.WATCHER``, the producer task is CPU- and memory-intensive, while the consumer sensor
+tasks are lightweight. Use the ``watcher_dbt_execution_queue``
+`configuration <https://astronomer.github.io/astronomer-cosmos/guides/run_dbt/airflow-worker/watcher-execution-mode.html#watcher-dbt-execution-queue>`_
+to route the producer task and sensor retries to a worker queue with more resources.
+
+
+6. Profile memory usage with debug mode
+++++++++++++++++++++++++++++++++++++++++
+
+To right-size your workers, enable Cosmos debug mode to measure actual memory consumption per task:
+
+.. code-block:: bash
+
+   export AIRFLOW__COSMOS__ENABLE_DEBUG_MODE=True
+
+When enabled, Cosmos tracks peak memory usage during task execution and pushes it to XCom under the key
+``cosmos_debug_max_memory_mb``. Use this data to:
+
+- Identify which tasks consume the most memory
+- Set appropriate memory limits and worker queue assignments
+- Detect memory regressions over time
+
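+Once you have collected ``cosmos_debug_max_memory_mb`` values from XCom (for example, through the Airflow UI or REST
+API), a few lines are enough to rank tasks and flag candidates for a larger queue. This is a hypothetical helper, not
+part of Cosmos; the task IDs, values, and threshold below are placeholders:
+
+.. code-block:: python
+
+   def high_memory_tasks(peak_mb: dict[str, float], threshold_mb: float) -> list[tuple[str, float]]:
+       """Return (task_id, peak MB) pairs at or above the threshold, highest first."""
+       flagged = [(task, mb) for task, mb in peak_mb.items() if mb >= threshold_mb]
+       return sorted(flagged, key=lambda item: item[1], reverse=True)
+
+
+   # Placeholder values for illustration only.
+   samples = {"stg_orders.run": 310.0, "fct_revenue.run": 1450.0, "dim_customers.run": 980.0}
+   for task_id, mb in high_memory_tasks(samples, threshold_mb=900):
+       print(f"{task_id}: {mb:.0f} MB -> consider a high-memory queue or pool")
+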
+For high-memory tasks, consider using separate
+`Airflow pools <https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/pools.html>`_
+or the ``watcher_dbt_execution_queue`` configuration to route them to workers with more resources.
+
+For more memory optimization strategies, see :ref:`memory-optimization`.
@@ -0,0 +1,133 @@
+.. _optimize-rendering:
+
+Optimize DAG Parsing
+--------------------
+
+Every time Airflow parses a DAG file that contains a ``DbtDag`` or ``DbtTaskGroup``, Cosmos must load and process the
+dbt project to build the corresponding Airflow task graph. The time this takes directly affects how quickly your DAGs
+appear and update in Airflow. This page covers the most impactful ways to reduce that parse time.
+
+.. tip::
+
+   Cosmos logs the time it takes to parse each dbt project at the ``INFO`` level:
+
+   .. code-block:: text
+
+      Cosmos performance (<cache_id>) - [<hostname>|<pid>]: It took 0.068s to parse the dbt project for DAG using LoadMode.DBT_LS_CACHE
+
+   Search your Airflow scheduler or DAG processor logs for ``Cosmos performance`` to measure your current parse time.
+
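+If you collect these log lines over time, a short script can pull out the timings. The sketch below is a hypothetical
+standard-library parser (not part of Cosmos) keyed to the message format shown above. It assumes the surrounding log
+prefix varies, so it searches each line instead of matching from the start, and it accepts both plain and
+scientific-notation durations:
+
+.. code-block:: python
+
+   import re
+   from typing import Iterable, NamedTuple
+
+   # Mirrors the "Cosmos performance" INFO message shown above.
+   PERF_PATTERN = re.compile(
+       r"Cosmos performance \((?P<cache_id>[^)]*)\) - \[(?P<host>[^|\]]*)\|(?P<pid>[^\]]*)\]: "
+       r"It took (?P<seconds>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)s to parse the dbt project"
+       r".*? using (?P<load_mode>\S+)"
+   )
+
+
+   class ParseTiming(NamedTuple):
+       cache_id: str
+       seconds: float
+       load_mode: str
+
+
+   def parse_cosmos_timings(lines: Iterable[str]) -> list[ParseTiming]:
+       """Extract Cosmos parse durations from scheduler or DAG processor log lines."""
+       timings = []
+       for line in lines:
+           match = PERF_PATTERN.search(line)
+           if match:
+               timings.append(
+                   ParseTiming(match["cache_id"], float(match["seconds"]), match["load_mode"])
+               )
+       return timings
+
+
+   # Usage: report the slowest parse in a log file (hypothetical path).
+   # with open("dag_processor.log") as log:
+   #     print(max(parse_cosmos_timings(log), key=lambda t: t.seconds, default=None))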
+
+1. Choose the right LoadMode
+++++++++++++++++++++++++++++
+
+The ``LoadMode`` controls how Cosmos reads your dbt project. It is the single most impactful setting for parse-time
+performance.
+
+**Recommended: use** ``LoadMode.DBT_MANIFEST``
+
+Parsing a pre-compiled ``manifest.json`` is the fastest option because it avoids running any dbt command at parse time.
+
+.. code-block:: python
+
+   from cosmos import DbtDag, ProjectConfig, RenderConfig
+   from cosmos.constants import LoadMode
+
+   DbtDag(
+       dag_id="my_dbt_dag",
+       project_config=ProjectConfig(
+           dbt_project_path="/path/to/dbt/project",
+           manifest_path="/path/to/dbt/project/target/manifest.json",
+       ),
+       render_config=RenderConfig(
+           load_method=LoadMode.DBT_MANIFEST,
+       ),
+       # ...
+   )
+
+To generate the manifest, run the following from your dbt project directory (typically as part of CI/CD):
+
+.. code-block:: bash
+
+   dbt deps    # install packages first
+   dbt compile # generates target/manifest.json
+
+Then make the resulting ``target/manifest.json`` available to Cosmos via a local path.
+
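+If ``manifest.json`` lags behind the project, Cosmos renders an outdated graph. A build step can compare modification
+times before publishing the file. The sketch below is a hypothetical standard-library check (``manifest_is_stale`` is
+not a Cosmos API); it only looks at ``.sql``, ``.yml``, ``.yaml``, and ``.csv`` files, ignores the default ``target``
+and ``dbt_packages`` directories, and relies on file mtimes, which can be unreliable right after a fresh git checkout:
+
+.. code-block:: python
+
+   from pathlib import Path
+
+   WATCHED_SUFFIXES = {".sql", ".yml", ".yaml", ".csv"}
+   IGNORED_DIRS = {"target", "dbt_packages"}
+
+
+   def manifest_is_stale(project_dir: str, manifest_path: str) -> bool:
+       """Return True if the manifest is missing or older than any watched project file."""
+       manifest = Path(manifest_path)
+       if not manifest.exists():
+           return True
+       built_at = manifest.stat().st_mtime
+       project = Path(project_dir)
+       for path in project.rglob("*"):
+           if IGNORED_DIRS.intersection(path.relative_to(project).parts):
+               continue  # generated or vendored files do not invalidate the manifest
+           if path.is_file() and path.suffix in WATCHED_SUFFIXES and path.stat().st_mtime > built_at:
+               return True
+       return False
+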
+For more details on all parsing methods, see :ref:`parsing-methods`.
+
+**If you cannot pre-compute the manifest**
+
+Use ``LoadMode.DBT_LS`` with the following optimizations to minimize parse-time overhead:
+
+- **Enable caching** (on by default since Cosmos 1.5) so that ``dbt ls`` output is reused across parses. See :ref:`caching`.
+- **Use** ``InvocationMode.DBT_RUNNER`` (the default for ``dbt ls`` since Cosmos 1.9) to run ``dbt ls`` as a library call instead of a subprocess. See :ref:`invocation-mode`.
+- **Keep partial parsing enabled** (on by default) so dbt skips re-parsing unchanged project files. See :ref:`partial-parsing`.
+- **Pre-install dbt packages** in your Docker image or CI and disable runtime installation:
+
+  .. code-block:: python
+
+     ProjectConfig(
+         dbt_project_path="/path/to/dbt/project",
+         install_dbt_deps=False,  # skip dbt deps at parse time
+     )
+
+  This avoids running ``dbt deps`` on every DAG parse, which can be slow when packages need to be fetched.
+
+
+2. Reduce DAG granularity
++++++++++++++++++++++++++
+
+Fewer nodes in the Airflow DAG mean faster parsing. There are two ways to reduce the number of nodes Cosmos generates.
+
+**Select only the nodes you need**
+
+Use ``select``, ``exclude``, or ``selector`` in ``RenderConfig`` to limit which dbt nodes are included in the DAG.
+For example, to run only models tagged ``daily``:
+
+.. code-block:: python
+
+   RenderConfig(
+       select=["tag:daily"],
+   )
+
+For the full selection syntax, see :ref:`selecting-excluding`.
+
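+To check how many models a tag selector keeps before changing the DAG, you can count them in the manifest. This is a
+simplified, hypothetical sketch: it handles a single ``tag:<name>`` selector on models only and does not reproduce
+dbt's full selection syntax (graph operators, unions, intersections, or exclusions):
+
+.. code-block:: python
+
+   import json
+
+
+   def load_manifest(path: str) -> dict:
+       """Read a dbt manifest.json into a dictionary."""
+       with open(path, encoding="utf-8") as f:
+           return json.load(f)
+
+
+   def count_models_with_tag(manifest: dict, tag: str) -> int:
+       """Count model nodes in a parsed manifest that carry ``tag``."""
+       return sum(
+           1
+           for node in manifest.get("nodes", {}).values()
+           if node.get("resource_type") == "model" and tag in node.get("tags", [])
+       )
+
+
+   # Usage (path is a placeholder):
+   # print(count_models_with_tag(load_manifest("/path/to/dbt/project/target/manifest.json"), "daily"))
+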
+**Choose an efficient TestBehavior**
+
+The default ``TestBehavior.AFTER_EACH`` creates a separate test task after every model that has tests, which can
+significantly increase the number of tasks in the DAG. Consider these alternatives:
+
+- ``TestBehavior.NONE`` -- no test tasks are created. Use this if tests are not needed or are run separately.
+- ``TestBehavior.BUILD`` -- tests run as part of the model task itself (via ``dbt build``), so no additional tasks are created.
+- ``TestBehavior.AFTER_ALL`` -- a single test task runs after all models complete.
+
+.. code-block:: python
+
+   from cosmos.constants import TestBehavior
+
+   RenderConfig(
+       test_behavior=TestBehavior.BUILD,
+   )
+
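+To compare the options for your own project, a rough count of Airflow tasks per behavior is simple arithmetic. The
+sketch below is an approximation for models only: it assumes ``AFTER_EACH`` adds one test task per model that has
+tests and ``AFTER_ALL`` adds one task overall, and it ignores seeds, snapshots, sources, and tests spanning several
+models. The lowercase behavior names are illustrative labels:
+
+.. code-block:: python
+
+   def estimated_task_count(models: int, models_with_tests: int, behavior: str) -> int:
+       """Approximate the number of model and test tasks rendered for a test behavior."""
+       if behavior == "after_each":
+           return models + models_with_tests  # one extra test task per tested model
+       if behavior == "after_all":
+           return models + (1 if models_with_tests else 0)  # one test task at the end
+       if behavior in ("build", "none"):
+           return models  # tests run inside model tasks, or not at all
+       raise ValueError(f"unknown behavior: {behavior}")
+
+
+   # 200 models, 150 of them with tests:
+   for name in ("after_each", "after_all", "build", "none"):
+       print(name, estimated_task_count(200, 150, name))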
+
+3. Skip stale sources
++++++++++++++++++++++
+
+If your DAG includes multiple data sources and some may not have fresh data, you can avoid running unnecessary
+branches by rendering source nodes with freshness checks. When a source is not fresh, the downstream branch can be
+skipped.
+
+To enable this, configure ``source_rendering_behavior`` in ``RenderConfig`` and customize the source node behavior
+using ``node_converters``:
+
+.. code-block:: python
+
+   from cosmos.constants import SourceRenderingBehavior
+
+   RenderConfig(
+       source_rendering_behavior=SourceRenderingBehavior.WITH_TESTS_OR_FRESHNESS,
+   )
+
+For details on source rendering and how to customize source node behavior, see :ref:`managing-sources` and
+:ref:`dag_customization`.
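+
+Whether a source counts as "not fresh" comes from dbt's source freshness thresholds: the source's age (now minus the
+latest ``loaded_at_field`` value) is compared against ``warn_after`` and ``error_after``. The sketch below illustrates
+that rule with the standard library; it is a simplification, not Cosmos or dbt code, and ``freshness_status`` is a
+hypothetical name:
+
+.. code-block:: python
+
+   from datetime import datetime, timedelta, timezone
+   from typing import Optional
+
+   # dbt freshness thresholds use a count plus a period of minute, hour, or day.
+   PERIODS = {"minute": "minutes", "hour": "hours", "day": "days"}
+
+
+   def freshness_status(
+       max_loaded_at: datetime,
+       now: datetime,
+       warn_after: Optional[tuple[int, str]] = None,
+       error_after: Optional[tuple[int, str]] = None,
+   ) -> str:
+       """Return "error", "warn", or "pass" for a source, checking error_after first."""
+       age = now - max_loaded_at
+       for status, threshold in (("error", error_after), ("warn", warn_after)):
+           if threshold is not None:
+               count, period = threshold
+               if age > timedelta(**{PERIODS[period]: count}):
+                   return status
+       return "pass"
+
+
+   now = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
+   loaded = now - timedelta(hours=30)
+   print(freshness_status(loaded, now, warn_after=(12, "hour"), error_after=(24, "hour")))  # error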