diff --git a/docs/conf.py b/docs/conf.py index 268a246c76..4a33d9c288 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -74,7 +74,7 @@ "configuration/logging": "../guides/cosmos_devex/logging.html", "configuration/memory_optimization": "../optimize_performance/memory_optimization.html", "configuration/multi-project": "../run/multi_project/multi-project.html", - "configuration/operator-args": "../guides/run_dbt/customization/operator-args.html", + "configuration/operator-args": "../guides/run_dbt/operators/operator-args.html", "configuration/parsing-methods": "../guides/translate_dbt_to_airflow/parsing-methods.html", "configuration/partial-parsing": "../guides/run_dbt/customization/partial-parsing.html", "configuration/profile-config": "../reference/configs/profile-config.html", diff --git a/docs/guides/index.rst b/docs/guides/index.rst index 353d37bd97..4462ce227e 100644 --- a/docs/guides/index.rst +++ b/docs/guides/index.rst @@ -29,11 +29,12 @@ Cosmos offers a number of configuration options to customize its behavior. For m :hidden: :caption: How Cosmos runs dbt + run_dbt/index run_dbt/execution-modes run_dbt/airflow-worker/index run_dbt/container/index run_dbt/callbacks/callbacks - run_dbt/operators/operators + run_dbt/operators/index run_dbt/customization/index .. toctree:: diff --git a/docs/guides/run_dbt/customization/index.rst b/docs/guides/run_dbt/customization/index.rst index 47c23ebe2a..5ddf50ede6 100644 --- a/docs/guides/run_dbt/customization/index.rst +++ b/docs/guides/run_dbt/customization/index.rst @@ -6,6 +6,5 @@ Additional Customization :caption: Additional Customization scheduling - operator-args partial-parsing custom-airflow-properties diff --git a/docs/guides/run_dbt/index.rst b/docs/guides/run_dbt/index.rst new file mode 100644 index 0000000000..06d405c079 --- /dev/null +++ b/docs/guides/run_dbt/index.rst @@ -0,0 +1,34 @@ +.. 
_how-cosmos-runs-dbt: + +How Cosmos runs dbt +=================== + +Cosmos can run dbt commands directly using operators, or it can parse a dbt project and turn it into an Airflow Dag or task group that you then execute. + +In many execution modes, Cosmos’ ``DbtDag`` and ``DbtTaskGroup`` create a separate task for each dbt node (model, seed, snapshot). +This leads to improved visibility and the +possibility of fine-grained control over your dbt commands. For example, you can set task parameters like pool +or retries on individual Cosmos tasks. Or, you can make downstream tasks run as soon as a specific Cosmos task has finished successfully. +Running one dbt command per task can bring performance challenges, since each invocation of a dbt command incurs overhead. To improve performance, newer versions of Cosmos have introduced alternatives that offer the same level of granularity while centralising the execution of the dbt command in a single task. Check :ref:`watcher-execution-mode` and :ref:`async-execution-mode` for more information. + +Cosmos uses different kinds of configurations to control how the dbt nodes are executed within the Airflow Dag or task group, which you can customize based on your project and needs. + +Execution modes +~~~~~~~~~~~~~~~~ + +Execution modes are defined by the ``ExecutionConfig`` class in your Cosmos Dag. +The right mode depends on your specific dbt project architecture and on whether you want to run your dbt commands in the cloud or in a container separate from your Airflow environment. + +Check out the available :ref:`execution-modes` and the detailed :ref:`execution-config` for more information about how to set up your Cosmos execution. + + +Running dbt commands +~~~~~~~~~~~~~~~~~~~~ + +In addition to specifying where you want Cosmos to run dbt commands, you can also configure the following: + +- :ref:`callbacks`: Tell Cosmos how to handle artifacts produced by dbt while executing dbt code. 
+- ``interceptor``: (new in v1.14) Optional list of callables run before building the dbt command. See :ref:`operator-args` for more information. +- :ref:`operator-args`: Pass specific operator arguments, ``operator_args``, in your Dag that correspond to dbt commands, control Cosmos operations, or define Airflow behavior. +- :ref:`scheduling`: Leverage Airflow to schedule your dbt workflows with cron-based scheduling, timetables, and data-aware scheduling. +- :ref:`partial-parsing`: Configure Cosmos to use dbt's partial parsing capabilities, improving dbt and Dag parsing, which speeds up execution times. diff --git a/docs/guides/run_dbt/operators/index.rst b/docs/guides/run_dbt/operators/index.rst new file mode 100644 index 0000000000..159beb4061 --- /dev/null +++ b/docs/guides/run_dbt/operators/index.rst @@ -0,0 +1,15 @@ + +.. _operator-index: + +Operators +========= + +Learn how to use operators with Cosmos. + +.. toctree:: + :maxdepth: 1 + :caption: Operators + + operators + operator-args + overriding-operator-args diff --git a/docs/guides/run_dbt/customization/operator-args.rst b/docs/guides/run_dbt/operators/operator-args.rst similarity index 82% rename from docs/guides/run_dbt/customization/operator-args.rst rename to docs/guides/run_dbt/operators/operator-args.rst index 8fb06e5a7b..87aaa4c38b 100644 --- a/docs/guides/run_dbt/customization/operator-args.rst +++ b/docs/guides/run_dbt/operators/operator-args.rst @@ -36,47 +36,6 @@ Example of setting a Cosmos-specific operator argument: ) -.. _operator-args-per-node: - -Overriding operator arguments per dbt node (or group of nodes) --------------------------------------------------------------- - -.. versionadded:: 1.8.0 - -Cosmos 1.8 introduced the capability for users to customise the operator arguments per dbt node, or per group of dbt nodes. -This can be done by defining the arguments via a dbt meta property alongside other dbt project configurations. 
- -Let's say there is a DbtTaskGroup that sets a default pool to run all the dbt tasks, but a user would like the model expensive -to run a separate pool. - -Users could either use ``operator_args`` or ``default args`` for defining the default behavior: - -.. code-block:: python - - dbt_task_group = DbtTaskGroup( - # ... - profile_config=ProfileConfig, - default_args={"pool": "default_pool"}, - ) - -While configuring in the ``dbt_project.yml`` a different behaviour for the model "expensive", that should use the "expensive-pool": - -.. code-block:: - - version: 2 - models: - - name: expensive - description: description - meta: - cosmos: - operator_kwargs: - pool: expensive-pool - - -More information about this feature can be found in :ref:`custom-airflow-properties`. - -To learn how to customise the profile per dbt model or Cosmos task, check :ref:`profile-customise-per-node`. - Summary of Cosmos-specific arguments ------------------------------------ @@ -193,15 +152,15 @@ Example usage of templated ``dbt_cmd_flags`` for microbatch models with event-ti }, ) -The following template fields are only selectable when using the operators in a standalone context (starting in Cosmos 1.4): +The following template fields are only selectable when using the operators in a standalone context via the ``operator_args`` parameter (starting in Cosmos 1.4): - ``select`` - ``exclude`` - ``selector`` - ``models`` -Since Airflow resolves template fields during Airflow DAG execution and not DAG parsing, the args above cannot be templated via ``DbtDag`` and ``DbtTaskGroup`` because both need to select dbt nodes during DAG parsing. +Since Airflow resolves template fields during Airflow DAG execution and not DAG parsing, the args above cannot be templated via ``DbtDag`` and ``DbtTaskGroup`` because both need to select dbt nodes during DAG parsing. Additionally, the SQL for compiled dbt models is stored in the template fields, which is viewable in the Airflow UI for each task run. 
This is provided for telemetry on task execution, and is not an operator arg. -For more information about this, see the `Compiled SQL `_ docs. +For more information about this, see the `Compiled SQL <../../cosmos_devex/compiled-sql.html>`_ docs. diff --git a/docs/guides/run_dbt/operators/operators.rst b/docs/guides/run_dbt/operators/operators.rst index 448e037e77..2168a30b19 100644 --- a/docs/guides/run_dbt/operators/operators.rst +++ b/docs/guides/run_dbt/operators/operators.rst @@ -1,10 +1,12 @@ .. _operators: -Operators -========= +dbt command operators +===================== Cosmos exposes individual operators that correspond to specific dbt commands, which can be used just like traditional -`Apache Airflow® `_ operators. Cosmos names these operators using the format ``DbtOperator``. For example, ``DbtBuildLocalOperator``. +`Apache Airflow® `_ operators. Cosmos names these operators using the format ``DbtOperator``. + +The following examples show ``DbtCloneLocalOperator`` and ``DbtSeedLocalOperator``. You can see the full ``example_operator`` Dag in the `dev/dags directory `_. Clone ----- diff --git a/docs/guides/run_dbt/operators/overriding-operator-args.rst b/docs/guides/run_dbt/operators/overriding-operator-args.rst new file mode 100644 index 0000000000..921a7c3b8a --- /dev/null +++ b/docs/guides/run_dbt/operators/overriding-operator-args.rst @@ -0,0 +1,40 @@ +.. _operator-args-per-node: + +Overriding operator arguments per dbt node (or group of nodes) +============================================================== + +.. versionadded:: 1.8.0 + +Cosmos 1.8 introduced the capability for users to customise the operator arguments per dbt node, or per group of dbt nodes. +This can be done by defining the arguments via a dbt meta property alongside other dbt project configurations. + +Let's say there is a ``DbtTaskGroup`` that sets a default pool to run all the dbt tasks, but a user would like the model ``expensive`` +to run in a separate pool. 
+ +Users can use either ``operator_args`` or ``default_args`` to define the default behavior: + +.. code-block:: python + + dbt_task_group = DbtTaskGroup( + # ... + profile_config=ProfileConfig, + default_args={"pool": "default_pool"}, + ) + +Then configure, in the ``dbt_project.yml``, a different behaviour for the model "expensive", which should use the "expensive-pool": + +.. code-block:: yaml + + version: 2 + models: + - name: expensive + description: description + meta: + cosmos: + operator_kwargs: + pool: expensive-pool + + +More information about this feature can be found in :ref:`custom-airflow-properties`. + +To learn how to customise the profile per dbt model or Cosmos task, check :ref:`profile-customise-per-node`. diff --git a/docs/reference/configs/execution-config.rst b/docs/reference/configs/execution-config.rst index 0ee5199fd9..85cf14f06a 100644 --- a/docs/reference/configs/execution-config.rst +++ b/docs/reference/configs/execution-config.rst @@ -1,3 +1,5 @@ +.. _execution-config: + Execution Config ==================
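Note on ``overriding-operator-args.rst``: the page implies a precedence in which a node-level ``meta.cosmos.operator_kwargs`` value replaces the task group default for that node only. The sketch below illustrates that precedence with plain dictionaries. It is not Cosmos's implementation: the helper ``resolve_operator_kwargs`` is hypothetical, and the ``retries`` entry is an added assumption, included only to show that defaults a node does not override are kept.

```python
from typing import Any


def resolve_operator_kwargs(
    default_args: dict[str, Any],
    node_meta: dict[str, Any],
) -> dict[str, Any]:
    """Hypothetical helper: apply per-node overrides on top of the defaults."""
    # Mirrors the nesting shown in the docs: meta -> cosmos -> operator_kwargs.
    overrides = node_meta.get("cosmos", {}).get("operator_kwargs", {})
    # Later keys win in a dict merge, so a node-level value replaces the default.
    return {**default_args, **overrides}


# Defaults for every task; "retries" is an illustrative assumption.
default_args = {"pool": "default_pool", "retries": 2}

# Meta for the model "expensive", as in the YAML example of the new page.
expensive_meta = {"cosmos": {"operator_kwargs": {"pool": "expensive-pool"}}}
# A model that declares no Cosmos overrides.
plain_meta: dict[str, Any] = {}

expensive_kwargs = resolve_operator_kwargs(default_args, expensive_meta)
plain_kwargs = resolve_operator_kwargs(default_args, plain_meta)

print(expensive_kwargs)
print(plain_kwargs)
```

Under these assumptions only the ``expensive`` model changes pool, and every other model keeps the defaults untouched. If the page is meant to promise this behaviour, stating the precedence in one sentence next to the YAML example would make it explicit.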