Add performance optimization and troubleshooting docs #2521
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,12 +1,28 @@ | ||
| .. _optimize-performance: | ||
|
|
||
| Optimize the performance of your Cosmos Dags | ||
| -------------------------------------------- | ||
| Optimize Performance | ||
| -------------------- | ||
|
|
||
| Cosmos performance can be tuned across two dimensions: how fast DAGs are parsed (affecting how quickly they appear | ||
| and update in Airflow) and how fast tasks execute (affecting DAG run duration). | ||
|
|
||
| - :ref:`optimize-rendering` -- Speed up DAG parsing by choosing the right LoadMode, reducing DAG granularity, and skipping stale sources. | ||
| - :ref:`optimize-execution` -- Speed up DAG runs by choosing the right execution mode, sizing workers, and reducing per-task overhead. | ||
| - :ref:`perf-troubleshooting` -- Diagnose common performance issues such as slow parsing, missing DAGs, and Out of Memory (OOM) errors. | ||
|
|
||
| The following pages cover specific optimization mechanisms in more detail: | ||
|
|
||
| - :ref:`memory-optimization` -- Reduce memory consumption during DAG parsing and task execution. | ||
| - :ref:`caching` -- How Cosmos caches dbt ls output, partial parse files, profiles, and YAML selectors. | ||
| - :ref:`invocation-mode` -- Choose between running dbt as a library or as a subprocess. | ||
|
|
||
| .. toctree:: | ||
| :maxdepth: 1 | ||
| :caption: Optimize Performance | ||
|
|
||
| optimize_rendering | ||
| optimize_execution | ||
| troubleshooting | ||
| memory_optimization | ||
| caching | ||
| invocation_mode |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,171 @@ | ||
| .. _optimize-execution: | ||
|
|
||
| Optimize Task Execution | ||
| ----------------------- | ||
|
|
||
| Once your DAG is parsed, performance depends on how quickly tasks execute. Each Cosmos task runs one or more dbt | ||
| commands, and the overhead of those invocations adds up across a DAG run. This page covers the most impactful ways | ||
| to reduce DAG run time. | ||
|
|
||
|
|
||
| 1. Use an efficient execution mode | ||
| +++++++++++++++++++++++++++++++++++ | ||
|
|
||
| The execution mode determines how Cosmos runs dbt commands at task execution time. Choosing the right mode is the | ||
| single most impactful change for DAG run performance. | ||
|
|
||
| **Recommended: use** ``ExecutionMode.WATCHER`` | ||
|
|
||
| In the default ``ExecutionMode.LOCAL``, every model runs as a separate ``dbt run`` invocation, which introduces | ||
| per-task overhead. ``ExecutionMode.WATCHER`` runs a single ``dbt build`` across the entire project and uses | ||
| dbt's native threading to parallelize models, while still giving you model-level visibility in Airflow. | ||
|
|
||
| Benchmarks show **up to an 80% reduction in DAG run time** compared to ``ExecutionMode.LOCAL``. | ||
| See :ref:`watcher-execution-mode` for setup instructions and detailed benchmarks. | ||
|
|
||
| .. note:: | ||
|
|
||
| ``ExecutionMode.WATCHER`` is currently experimental. Review its | ||
| `known limitations <https://astronomer.github.io/astronomer-cosmos/guides/run_dbt/airflow-worker/watcher-execution-mode.html#known-limitations>`_ | ||
| before adopting it in production. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from cosmos import DbtDag, ExecutionConfig | ||
| from cosmos.constants import ExecutionMode | ||
|
|
||
| DbtDag( | ||
| dag_id="my_dbt_dag", | ||
| execution_config=ExecutionConfig( | ||
| execution_mode=ExecutionMode.WATCHER, | ||
| ), | ||
| # ... | ||
| ) | ||
|
|
||
| .. tip:: | ||
|
|
||
| ``ExecutionMode.WATCHER`` performance scales with dbt ``threads``. Start with a conservative value that matches | ||
| the CPU capacity of your workers, then increase gradually. See | ||
| :ref:`watcher-execution-mode` for how to configure threads. | ||
|
|
||
| **Alternative for BigQuery: use** ``ExecutionMode.AIRFLOW_ASYNC`` | ||
|
|
||
| If you use BigQuery, ``ExecutionMode.AIRFLOW_ASYNC`` pre-compiles SQL transformations and executes them using | ||
| Airflow's deferrable ``BigQueryInsertJobOperator``. This frees worker slots while queries execute in BigQuery, | ||
| achieving roughly **35% faster execution** compared to ``ExecutionMode.LOCAL``. See :ref:`async-execution-mode`. | ||
|
|
||
| **Baseline: use** ``ExecutionMode.LOCAL`` **with** ``InvocationMode.DBT_RUNNER`` | ||
|
|
||
| If neither ``WATCHER`` nor ``AIRFLOW_ASYNC`` suits your setup, configure ``ExecutionMode.LOCAL`` to use | ||
| ``InvocationMode.DBT_RUNNER`` to run dbt as a library call rather than spawning a subprocess for each task. | ||
| Since Cosmos 1.4, ``DBT_RUNNER`` is the preferred invocation mode and is auto-selected when dbt is available in | ||
| the same Python environment; otherwise Cosmos falls back to ``InvocationMode.SUBPROCESS``. See :ref:`invocation-mode`. | ||
|
|
||
|
|
||
| 2. Install dbt in the same Python environment as Airflow | ||
| ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | ||
|
|
||
| When dbt is installed alongside Airflow, Cosmos uses dbt's programmatic API (``dbtRunner``) instead of spawning | ||
| subprocesses. This eliminates process creation overhead and reduces both CPU and memory usage during task execution | ||
| and DAG parsing. | ||
|
|
||
| This is required for ``InvocationMode.DBT_RUNNER`` and yields the best performance with ``ExecutionMode.WATCHER``. | ||
|
|
||
| For more details, see :ref:`invocation-mode`. | ||
|
|
||
|
|
||
| 3. Pre-install dbt dependencies | ||
| +++++++++++++++++++++++++++++++ | ||
|
|
||
| By default, Cosmos runs ``dbt deps`` during both DAG parsing and task execution to ensure packages are available. | ||
| For large projects with many packages, this adds significant overhead to every task. | ||
|
|
||
| **Pre-install packages in your Docker image:** | ||
|
|
||
| .. code-block:: docker | ||
|
|
||
| # In your Dockerfile | ||
| COPY dbt_project/ /opt/dbt/project/ | ||
| RUN cd /opt/dbt/project && dbt deps | ||
|
|
||
| **Then disable runtime installation:** | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from cosmos import ProjectConfig | ||
|
|
||
| ProjectConfig( | ||
| dbt_project_path="/opt/dbt/project", | ||
| install_dbt_deps=False, | ||
| ) | ||
|
|
||
|
|
||
| 4. Use profiles.yml instead of profile mapping | ||
| +++++++++++++++++++++++++++++++++++++++++++++++ | ||
|
|
||
| Cosmos can generate dbt profiles at runtime from Airflow connections using profile mapping classes. While convenient, | ||
| this adds overhead to each task invocation because Cosmos must read the Airflow connection and construct the profile. | ||
|
|
||
| If performance is a priority, provide a ``profiles.yml`` file directly. This avoids the runtime profile generation | ||
| entirely. | ||
|
|
||
| For how to configure this, see | ||
| `Using your profiles.yml <https://astronomer.github.io/astronomer-cosmos/guides/connect_database/use-your-profiles-yml.html>`_. | ||
|
|
||
|
|
||
| 5. Worker node sizing | ||
| +++++++++++++++++++++ | ||
|
|
||
| The resources needed to run Cosmos tasks depend on your dbt project and your Cosmos configuration. | ||
| Airflow workers configured for IO-intensive workloads (the default on Astro, which uses a ratio of 5 concurrent | ||
| processes per vCPU) may not have enough CPU capacity for Cosmos tasks, which involve parsing dbt projects and running | ||
| dbt commands. | ||
|
|
||
| The following table provides recommended concurrency ratios based on execution mode: | ||
|
|
||
| .. list-table:: Recommended worker concurrency (concurrent processes per vCPU) | ||
| :header-rows: 1 | ||
| :widths: 50 20 | ||
|
|
||
| * - Execution mode | ||
| - Ratio | ||
| * - ``ExecutionMode.LOCAL`` with dbt in the same Python environment | ||
| - 2:1 | ||
| * - ``ExecutionMode.LOCAL`` with dbt in a separate virtual environment | ||
| - 1:1 | ||
| * - ``ExecutionMode.AIRFLOW_ASYNC`` (BigQuery) | ||
| - 4:1 | ||
|
|
||
| .. note:: | ||
|
|
||
| Keep in mind that Airflow re-parses the DAG file on the worker node every time a task runs. If you are using | ||
| ``LoadMode.DBT_LS``, this means each task also triggers a dbt project parse. Consider using | ||
| ``LoadMode.DBT_MANIFEST`` to reduce worker-side parsing overhead. See :ref:`optimize-rendering`. | ||
|
|
||
| If you are using ``ExecutionMode.WATCHER``, the producer task is CPU- and memory-intensive while the consumer sensor | ||
| tasks are lightweight. Use the ``watcher_dbt_execution_queue`` | ||
| `configuration <https://astronomer.github.io/astronomer-cosmos/guides/run_dbt/airflow-worker/watcher-execution-mode.html#watcher-dbt-execution-queue>`_ | ||
| to route the producer task and sensor retries to a worker queue with more resources. | ||
|
|
||
|
|
||
| 6. Profile memory usage with debug mode | ||
| ++++++++++++++++++++++++++++++++++++++++ | ||
|
|
||
| To right-size your workers, enable Cosmos debug mode to measure actual memory consumption per task: | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| export AIRFLOW__COSMOS__ENABLE_DEBUG_MODE=True | ||
|
|
||
| When enabled, Cosmos tracks peak memory usage during task execution and pushes it to XCom under the key | ||
| ``cosmos_debug_max_memory_mb``. Use this data to: | ||
|
|
||
| - Identify which tasks consume the most memory | ||
| - Set appropriate memory limits and worker queue assignments | ||
| - Detect memory regressions over time | ||
|
|
||
| For high-memory tasks, consider using separate | ||
| `Airflow pools <https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/pools.html>`_ | ||
| or the ``watcher_dbt_execution_queue`` configuration to route them to workers with more resources. | ||
|
|
||
| For more memory optimization strategies, see :ref:`memory-optimization`. | ||
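
The concurrency ratios in the worker sizing table (section 5) can be turned into a per-worker concurrency value with simple arithmetic. The following is a minimal sketch, not part of Cosmos: the helper name ``recommended_concurrency`` and the mode keys are hypothetical, and only the ratios themselves come from the table above. It rounds down so a worker is not oversubscribed, and it always keeps at least one slot.

```python
import math

# Concurrent processes per vCPU, copied from the table in section 5.
# The dictionary keys are illustrative labels, not Cosmos identifiers.
RATIO_PER_VCPU = {
    "local_same_env": 2,    # ExecutionMode.LOCAL, dbt installed alongside Airflow
    "local_virtualenv": 1,  # ExecutionMode.LOCAL, dbt in a separate virtual environment
    "airflow_async": 4,     # ExecutionMode.AIRFLOW_ASYNC (BigQuery)
}


def recommended_concurrency(vcpus: float, mode: str) -> int:
    """Return a suggested number of concurrent task processes for one worker."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    ratio = RATIO_PER_VCPU[mode]
    # Round down to avoid oversubscribing the worker, but keep at least one slot.
    return max(1, math.floor(vcpus * ratio))


if __name__ == "__main__":
    for mode in RATIO_PER_VCPU:
        print(mode, recommended_concurrency(4, mode))
```

Treat the result as a starting point for the worker concurrency setting, then adjust it using the memory figures collected with debug mode (section 6).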
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,133 @@ | ||
| .. _optimize-rendering: | ||
|
|
||
| Optimize DAG Parsing | ||
| -------------------- | ||
|
|
||
| Every time Airflow parses a DAG file that contains a ``DbtDag`` or ``DbtTaskGroup``, Cosmos must load and process the | ||
| dbt project to build the corresponding Airflow task graph. The time this takes directly affects how quickly your DAGs | ||
| appear and update in Airflow. This page covers the most impactful ways to reduce that parse time. | ||
|
|
||
| .. tip:: | ||
|
|
||
| Cosmos logs the time it takes to parse each dbt project at the ``INFO`` level: | ||
|
|
||
| .. code-block:: text | ||
|
|
||
| Cosmos performance (<cache_id>) - [<hostname>|<pid>]: It took 0.068s to parse the dbt project for DAG using LoadMode.DBT_LS_CACHE | ||
|
|
||
| Search your Airflow scheduler or DAG processor logs for ``Cosmos performance`` to measure your current parse time. | ||
|
|
||
|
|
||
| 1. Choose the right LoadMode | ||
| ++++++++++++++++++++++++++++ | ||
|
|
||
| The ``LoadMode`` controls how Cosmos reads your dbt project. It is the single most impactful setting for parse-time | ||
| performance. | ||
|
|
||
| **Recommended: use** ``LoadMode.DBT_MANIFEST`` | ||
|
|
||
| Parsing a pre-compiled ``manifest.json`` is the fastest option because it avoids running any dbt command at parse time. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from cosmos import DbtDag, ProjectConfig, RenderConfig | ||
| from cosmos.constants import LoadMode | ||
|
|
||
| DbtDag( | ||
| dag_id="my_dbt_dag", | ||
| project_config=ProjectConfig( | ||
| dbt_project_path="/path/to/dbt/project", | ||
| manifest_path="/path/to/dbt/project/target/manifest.json", | ||
| ), | ||
| render_config=RenderConfig( | ||
| load_method=LoadMode.DBT_MANIFEST, | ||
| ), | ||
| # ... | ||
| ) | ||
|
|
||
| To generate the manifest, run the following from your dbt project directory (typically as part of CI/CD): | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| dbt deps # install packages first | ||
| dbt compile # generates target/manifest.json | ||
|
|
||
| Then make the resulting ``target/manifest.json`` available to Cosmos via a local path. | ||
|
|
||
| For more details on all parsing methods, see :ref:`parsing-methods`. | ||
|
|
||
| **If you cannot pre-compute the manifest** | ||
|
|
||
| Use ``LoadMode.DBT_LS`` with the following optimizations to minimize parse-time overhead: | ||
|
|
||
| - **Enable caching** (on by default since Cosmos 1.5) so that ``dbt ls`` output is reused across parses. See :ref:`caching`. | ||
| - **Use** ``InvocationMode.DBT_RUNNER`` (default since Cosmos 1.9) to run ``dbt ls`` as a library call instead of a subprocess. See :ref:`invocation-mode`. | ||
| - **Keep partial parsing enabled** (on by default) so dbt skips re-parsing unchanged project files. See :ref:`partial-parsing`. | ||
| - **Pre-install dbt packages** in your Docker image or CI and disable runtime installation: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from cosmos import ProjectConfig | ||
|
|
||
| ProjectConfig( | ||
| dbt_project_path="/path/to/dbt/project", | ||
| install_dbt_deps=False, # skip dbt deps at parse time | ||
| ) | ||
|
|
||
| This avoids running ``dbt deps`` on every DAG parse, which can be slow when packages need to be fetched. | ||
|
|
||
|
|
||
| 2. Reduce DAG granularity | ||
| +++++++++++++++++++++++++ | ||
|
|
||
| Fewer nodes in the Airflow DAG mean faster parsing. There are two ways to reduce the number of nodes Cosmos generates. | ||
|
|
||
| **Select only the nodes you need** | ||
|
|
||
| Use ``select``, ``exclude``, or ``selector`` in ``RenderConfig`` to limit which dbt nodes are included in the DAG. | ||
| For example, to include only nodes tagged ``daily``: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from cosmos import RenderConfig | ||
|
|
||
| RenderConfig( | ||
| select=["tag:daily"], | ||
| ) | ||
|
|
||
| For the full selection syntax, see :ref:`selecting-excluding`. | ||
|
|
||
| **Choose an efficient TestBehavior** | ||
|
|
||
| The default ``TestBehavior.AFTER_EACH`` creates a separate test task after every model, which can significantly | ||
| increase the number of tasks in the DAG. Consider these alternatives: | ||
|
|
||
| - ``TestBehavior.NONE`` -- no test tasks are created. Use this if tests are not needed or are run separately. | ||
| - ``TestBehavior.BUILD`` -- tests run as part of the model task itself (via ``dbt build``), so no additional tasks are created. | ||
| - ``TestBehavior.AFTER_ALL`` -- a single test task runs after all models complete. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from cosmos import RenderConfig | ||
| from cosmos.constants import TestBehavior | ||
|
|
||
| RenderConfig( | ||
| test_behavior=TestBehavior.BUILD, | ||
| ) | ||
|
|
||
|
|
||
| 3. Skip stale sources | ||
| +++++++++++++++++++++ | ||
|
|
||
| If your DAG includes multiple data sources and some may not have fresh data, you can avoid running unnecessary | ||
| branches by rendering source nodes with freshness checks. When a source is not fresh, the downstream branch can be | ||
| skipped. | ||
|
|
||
| To enable this, configure ``source_rendering_behavior`` in ``RenderConfig`` and customize the source node behavior | ||
| using ``node_converters``: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from cosmos import RenderConfig | ||
| from cosmos.constants import SourceRenderingBehavior | ||
|
|
||
| RenderConfig( | ||
| source_rendering_behavior=SourceRenderingBehavior.WITH_TESTS_OR_FRESHNESS, | ||
| ) | ||
|
|
||
| For details on source rendering and how to customize source node behavior, see :ref:`managing-sources` and | ||
| :ref:`dag_customization`. |
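
To compare parse times before and after applying the changes on this page, the ``Cosmos performance`` log line shown in the tip at the top of the page can be extracted from scheduler or DAG processor logs. The following is a minimal sketch that assumes the log line has exactly the shape of that example; the regular expression and the helper name ``parse_cosmos_performance`` are illustrative, and the message format may differ between Cosmos versions.

```python
import re
from typing import NamedTuple, Optional

# Matches the example line from the tip, e.g.
# "Cosmos performance (<cache_id>) - [<hostname>|<pid>]: It took 0.068s to parse
#  the dbt project for DAG using LoadMode.DBT_LS_CACHE"
# Assumption: the message keeps this wording in the Cosmos version in use.
PATTERN = re.compile(
    r"Cosmos performance.*?It took (?P<seconds>\d+(?:\.\d+)?)s to parse the dbt project"
    r".*?using (?P<load_mode>LoadMode\.\w+)"
)


class ParseTiming(NamedTuple):
    seconds: float
    load_mode: str


def parse_cosmos_performance(line: str) -> Optional[ParseTiming]:
    """Return the parse duration and LoadMode from one log line, or None if absent."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return ParseTiming(float(match["seconds"]), match["load_mode"])


if __name__ == "__main__":
    sample = (
        "Cosmos performance (my_cache) - [worker-1|4242]: It took 0.068s to parse "
        "the dbt project for DAG using LoadMode.DBT_LS_CACHE"
    )
    print(parse_cosmos_performance(sample))
```

Collecting these values over several parse cycles, for example by piping log lines through the helper, gives a baseline to check whether a change of ``LoadMode`` or ``TestBehavior`` actually reduced parse time.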