Merged
8 changes: 4 additions & 4 deletions docs/guides/connect_database/profile-customise-per-node.rst
@@ -26,10 +26,10 @@ Let's say the user configures the profile at a ``DbtDag`` or ``DbtTaskGroup`` le

But that for a specific node or group of nodes, the user would like to replace:

* ``profile_name`` to be "non_default_profile" as opposed to "default_profile"
* ``target_name`` to be "stage" as opposed to "default_target"
* ``conn_id`` to be "non_default_connection" as opposed to "default_conn"
* ``schema`` to be "non_default_schema" as opposed to "default_schema"
- ``profile_name`` to be "non_default_profile" as opposed to "default_profile"
- ``target_name`` to be "stage" as opposed to "default_target"
- ``conn_id`` to be "non_default_connection" as opposed to "default_conn"
- ``schema`` to be "non_default_schema" as opposed to "default_schema"

They could apply this different configuration to all the project seeds by doing:

4 changes: 2 additions & 2 deletions docs/guides/connect_database/use-profile-mapping.rst
@@ -10,8 +10,8 @@ a class in Cosmos for each Airflow connection to dbt profile mapping.
You can find the available profile mappings on the left-hand side of this page. Each profile mapping is imported from
``cosmos.profiles`` and takes two arguments:

* ``conn_id``: the Airflow connection ID to use.
* ``profile_args``: a dictionary of additional arguments to pass to the dbt profile. This is useful for specifying
- ``conn_id``: the Airflow connection ID to use.
- ``profile_args``: a dictionary of additional arguments to pass to the dbt profile. This is useful for specifying
values that are not in the Airflow connection. This also acts as an override for any values that are in the Airflow
connection but should be overridden.
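
The override behaviour described for ``profile_args`` can be pictured as a dictionary merge. The sketch below is plain Python, not Cosmos code: ``build_profile`` and the sample keys and values are hypothetical, and the only assumption taken from the text is that values in ``profile_args`` take precedence over values derived from the Airflow connection.

```python
def build_profile(connection_values: dict, profile_args: dict) -> dict:
    """Merge two mappings, letting profile_args win on conflicts.

    Hypothetical helper for illustration only; not part of Cosmos.
    """
    merged = dict(connection_values)  # start from the connection-derived values
    merged.update(profile_args)  # add missing keys and override existing ones
    return merged


# Sample values are made up for the illustration.
connection_values = {"host": "db.internal", "schema": "default_schema"}
profile_args = {"schema": "non_default_schema", "threads": 4}
profile = build_profile(connection_values, profile_args)
```

Here ``schema`` comes from ``profile_args`` because it appears in both mappings, while ``host`` is kept from the connection and ``threads`` is added.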

20 changes: 10 additions & 10 deletions docs/guides/cosmos_devex/lineage.rst
@@ -22,16 +22,16 @@ and virtualenv execution methods (read `execution modes <../run_dbt/execution-mo

Additionally, since Cosmos uses the open-source `openlineage-integration-common <https://github.com/OpenLineage/OpenLineage/tree/main/integration/common>`_, it relies on this library to support specific dbt adapters. As of 27 December 2024, version 1.26.0 of this package supports:

* Athena
* BigQuery
* Databricks
* DuckDB
* Dremio
* Postgres
* Redshift
* Snowflake
* Spark
* SQLServer
- Athena
- BigQuery
- Databricks
- DuckDB
- Dremio
- Postgres
- Redshift
- Snowflake
- Spark
- SQLServer

Contributions are also welcome in the `OpenLineage project <https://github.com/OpenLineage/OpenLineage/blob/main/integration/common/openlineage/common/provider/dbt/processor.py#L36C1-L47C22>`_ to support more adapters.

56 changes: 28 additions & 28 deletions docs/guides/run_dbt/airflow-worker/watcher-execution-mode.rst
@@ -41,27 +41,27 @@ Concept: ``ExecutionMode.WATCHER``

It is built on two operator types:

* ``DbtProducerWatcherOperator`` (`#1982 <https://github.com/astronomer/astronomer-cosmos/pull/1982>`_)
- ``DbtProducerWatcherOperator`` (`#1982 <https://github.com/astronomer/astronomer-cosmos/pull/1982>`_)
Runs dbt **once** across the entire pipeline, registers `dbt event callbacks <https://docs.getdbt.com/reference/programmatic-invocations#registering-callbacks>`_, and sends model progress updates via Airflow **XComs**.

* ``DbtConsumerWatcherSensor`` (`#1998 <https://github.com/astronomer/astronomer-cosmos/pull/1998>`_)
- ``DbtConsumerWatcherSensor`` (`#1998 <https://github.com/astronomer/astronomer-cosmos/pull/1998>`_)
Watches those XComs and marks individual Airflow tasks as complete when their corresponding dbt models finish.

Together, these operators let you:

* Run dbt as a single command (for speed)
* Retain model-level observability (for clarity)
* Retry specific models (for resilience)
- Run dbt as a single command (for speed)
- Retain model-level observability (for clarity)
- Retry specific models (for resilience)
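
The producer/consumer split described above can be sketched in plain Python. This is an illustration only, not the Cosmos implementation: a dictionary stands in for Airflow XComs, and ``run_producer``, ``consumer_is_done``, and the model names are hypothetical.

```python
from typing import Dict, Iterable

# Stand-in for Airflow XComs: a shared mapping from model name to status.
XComStore = Dict[str, str]


def run_producer(models: Iterable[str], xcom: XComStore) -> None:
    """Simulate one dbt invocation that reports each model as it finishes."""
    for model in models:
        # In the real operator this would be driven by a dbt event callback.
        xcom[model] = "success"


def consumer_is_done(model: str, xcom: XComStore) -> bool:
    """Simulate one sensor check: done once its model has been reported."""
    return xcom.get(model) == "success"


xcom: XComStore = {}
before = consumer_is_done("orders", xcom)  # nothing reported yet
run_producer(["customers", "orders"], xcom)
after = consumer_is_done("orders", xcom)  # the producer has reported "orders"
```

The point of the design is that dbt runs once, while each model still maps to its own Airflow task whose state follows the status published by the producer.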

-------------------------------------------------------------------------------

Performance Gains
+++++++++++++++++

We used a dbt project developed by Google, the `google/fhir-dbt-analytics <https://github.com/google/fhir-dbt-analytics>`_ project, that interfaces with BigQuery. It contains:
* 2 seeds
* 52 sources
* 185 models
- 2 seeds
- 52 sources
- 185 models

Initial benchmarks illustrate significant improvements:

@@ -152,9 +152,9 @@ As it can be observed, the only difference with the default ``ExecutionMode.LOCA

**How it works:**

* Cosmos executes your dbt project once via a producer task.
* Model-level Airflow tasks act as watchers or sensors, updating their state as dbt completes each model.
* The DAG remains fully observable and retryable, with **dramatically improved runtime performance** (often 5× faster than ``ExecutionMode.LOCAL``).
- Cosmos executes your dbt project once via a producer task.
- Model-level Airflow tasks act as watchers or sensors, updating their state as dbt completes each model.
- The DAG remains fully observable and retryable, with **dramatically improved runtime performance** (often 5× faster than ``ExecutionMode.LOCAL``).

**What it looks like:**

@@ -200,9 +200,9 @@ If your Airflow DAG includes multiple stages or integrations (e.g., data ingesti

**Key advantages:**

* Integrates seamlessly into complex Airflow DAGs.
* Uses the same high-performance producer/consumer execution model.
* Each ``DbtTaskGroup`` behaves independently — allowing modular dbt runs within larger workflows.
- Integrates seamlessly into complex Airflow DAGs.
- Uses the same high-performance producer/consumer execution model.
- Each ``DbtTaskGroup`` behaves independently — allowing modular dbt runs within larger workflows.

.. image:: /_static/jaffle_shop_watcher_dbt_taskgroup_dag_run.png
:alt: Cosmos DbtDag with `ExecutionMode.WATCHER`
@@ -361,9 +361,9 @@ Individual dbt Operators
''''''''''''''''''''''''

The ``ExecutionMode.WATCHER`` efficiently implements the following operators:
* ``DbtSeedWatcherOperator``
* ``DbtSnapshotWatcherOperator``
* ``DbtRunWatcherOperator``
- ``DbtSeedWatcherOperator``
- ``DbtSnapshotWatcherOperator``
- ``DbtRunWatcherOperator``

However, other operators that are available in the ``ExecutionMode.LOCAL`` mode are not implemented.

@@ -373,10 +373,10 @@ Additionally, since the ``dbt build`` command does not run ``source`` nodes, the

Finally, the following features are not implemented as operators under ``ExecutionMode.WATCHER``:

* ``dbt ls``
* ``dbt run-operation``
* ``dbt docs``
* ``dbt clone``
- ``dbt ls``
- ``dbt run-operation``
- ``dbt docs``
- ``dbt clone``

You can still invoke these operators using the default ``ExecutionMode.LOCAL`` mode.

@@ -493,9 +493,9 @@ To override the default logic, pass a ``freshness_callback`` via ``setup_operato

**Known limitations:**

* Incompatible with ``selector`` in ``RenderConfig`` — ``--exclude`` is ignored by dbt when a YAML selector is active.
* ``dbt source freshness`` is always re-executed at runtime; ``LoadMode.DBT_MANIFEST`` freshness data is not consulted.
* Not supported for ``ExecutionMode.WATCHER_KUBERNETES``.
- Incompatible with ``selector`` in ``RenderConfig`` — ``--exclude`` is ignored by dbt when a YAML selector is active.
- ``dbt source freshness`` is always re-executed at runtime; ``LoadMode.DBT_MANIFEST`` freshness data is not consulted.
- Not supported for ``ExecutionMode.WATCHER_KUBERNETES``.

-------------------------------------------------------------------------------

@@ -605,10 +605,10 @@ Summary

``ExecutionMode.WATCHER`` represents a significant leap forward for running dbt in Airflow via Cosmos:

* ✅ Up to **5× faster** dbt DAG runs
* ✅ Maintains **model-level visibility** in Airflow
* ✅ Enables **smarter resource allocation**
* ✅ Built on proven Cosmos rendering techniques
- ✅ Up to **5× faster** dbt DAG runs
- ✅ Maintains **model-level visibility** in Airflow
- ✅ Enables **smarter resource allocation**
- ✅ Built on proven Cosmos rendering techniques

This is an experimental feature, and we are looking for feedback from the community.

22 changes: 11 additions & 11 deletions docs/guides/run_dbt/callbacks/callbacks.rst
@@ -13,9 +13,9 @@ alongside the artifacts created by dbt.

Many users care about those artifacts and want to perform additional actions after running the dbt command. Some examples of usage:

* Upload the artifacts to an object storage;
* Run a command after the dbt command runs, such as `montecarlo <https://docs.getmontecarlo.com/docs/dbt-core>`_; or
* Define other custom behaviours based on a specific artifact.
- Upload the artifacts to an object storage;
- Run a command after the dbt command runs, such as `montecarlo <https://docs.getmontecarlo.com/docs/dbt-core>`_; or
- Define other custom behaviours based on a specific artifact.

To support these use cases, Cosmos allows users to define functions called callbacks that can run as part of the task execution before deleting the target's folder.

@@ -24,8 +24,8 @@ These functions illustrate how to upload the generated dbt artifacts to remote c

There are two ways users can leverage Cosmos auxiliary callback functions:

* When instantiating a Cosmos operator;
* When using ``DbtDag`` or ``DbtTaskGroup`` (users can define a callback that will apply to all tasks).
- When instantiating a Cosmos operator;
- When using ``DbtDag`` or ``DbtTaskGroup`` (users can define a callback that will apply to all tasks).


Example: Using Callbacks with a Single Operator
@@ -57,12 +57,12 @@ An example of how the data uploaded to GCS looks like when using ``upload_to_gcp

The path naming convention is:

* Bucket configured by the user
* Name of the DAG
* DAG Run identifier
* Task ID
* Task retry identifier
* Target folder with its contents
- Bucket configured by the user
- Name of the DAG
- DAG Run identifier
- Task ID
- Task retry identifier
- Target folder with its contents
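
The naming convention above can be illustrated with a small path builder. This is a sketch under stated assumptions, not the Cosmos upload code: ``artifact_prefix`` is a hypothetical helper, and the ``/`` separator, the literal ``target`` folder name, and the plain integer used for the retry identifier are assumptions made for the example.

```python
import posixpath


def artifact_prefix(bucket: str, dag_id: str, run_id: str, task_id: str, try_number: int) -> str:
    """Join the listed components, in order, ending with the target folder.

    Hypothetical helper for illustration only; not part of Cosmos.
    """
    return posixpath.join(bucket, dag_id, run_id, task_id, str(try_number), "target")


# All argument values below are made up for the illustration.
example = artifact_prefix("my-bucket", "my_dag", "manual__2024-01-01", "run_models", 1)
```

Keeping the run and retry identifiers in the path means that artifacts from different runs and retries of the same task do not overwrite each other.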

If users are unhappy with this structure or format, they can implement similar methods, which can be based (or not) on the Cosmos standard ones.

10 changes: 5 additions & 5 deletions docs/guides/run_dbt/container/gcp-cloud-run-job.rst
@@ -33,12 +33,12 @@ Prerequisites
5. GCP account with:
1. A GCP project (`setup guide <https://cloud.google.com/resource-manager/docs/creating-managing-projects#console>`_)
2. IAM roles:
* Basic Role: `Owner <https://cloud.google.com/iam/docs/understanding-roles#owner>`_ (control over whole project) or
* Predefined Roles: `Artifact Registry Administrator <https://cloud.google.com/iam/docs/understanding-roles#artifactregistry.admin>`_, `Cloud Run Developer <https://cloud.google.com/iam/docs/understanding-roles#run.developer>`_ (control over specific services)
- Basic Role: `Owner <https://cloud.google.com/iam/docs/understanding-roles#owner>`_ (control over whole project) or
- Predefined Roles: `Artifact Registry Administrator <https://cloud.google.com/iam/docs/understanding-roles#artifactregistry.admin>`_, `Cloud Run Developer <https://cloud.google.com/iam/docs/understanding-roles#run.developer>`_ (control over specific services)
3. Enabled service APIs:
* Artifact Registry API
* Cloud Run Admin API
* BigQuery API
- Artifact Registry API
- Cloud Run Admin API
- BigQuery API
4. A service account with BigQuery roles: `JobUser <https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser>`_ and `DataEditor <https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataEditor>`_
6. Docker image built with required dbt project and dbt DAG
7. dbt DAG with Cloud Run Job operators in the Airflow DAGs directory to run in Airflow
@@ -10,9 +10,9 @@ The ``ExecutionMode.WATCHER_KUBERNETES`` combines the **speed of the** :ref:`wat

This execution mode is ideal for users who:

* Want to leverage the performance benefits of the watcher execution mode
* Need to run dbt in isolated Kubernetes pods
* Prefer not to install dbt in their `Apache Airflow® <https://airflow.apache.org/>`_ deployment
- Want to leverage the performance benefits of the watcher execution mode
- Need to run dbt in isolated Kubernetes pods
- Prefer not to install dbt in their `Apache Airflow® <https://airflow.apache.org/>`_ deployment

-------------------------------------------------------------------------------

@@ -56,9 +56,9 @@ The following example shows how to configure a ``DbtDag`` with ``ExecutionMode.W

**Key differences from** ``ExecutionMode.KUBERNETES``:

* The ``execution_mode`` is set to ``ExecutionMode.WATCHER_KUBERNETES`` instead of ``ExecutionMode.KUBERNETES``
* The producer task runs the entire ``dbt build`` command in a single Kubernetes pod
* Consumer tasks (sensors) watch for the completion of their corresponding dbt models
- The ``execution_mode`` is set to ``ExecutionMode.WATCHER_KUBERNETES`` instead of ``ExecutionMode.KUBERNETES``
- The producer task runs the entire ``dbt build`` command in a single Kubernetes pod
- Consumer tasks (sensors) watch for the completion of their corresponding dbt models

For the complete setup including Kubernetes secrets, Docker image configuration, and profile setup, refer to the :ref:`kubernetes` documentation.

@@ -81,9 +81,9 @@ This represents approximately a **63% reduction** in total DAG runtime.

The performance improvement comes from:

* Running dbt as a single command (reducing Kubernetes pod startup overhead)
* Leveraging dbt's native threading capabilities
* Eliminating repeated dbt initialization for each model
- Running dbt as a single command (reducing Kubernetes pod startup overhead)
- Leveraging dbt's native threading capabilities
- Eliminating repeated dbt initialization for each model

-------------------------------------------------------------------------------

@@ -163,11 +163,11 @@ Other Inherited Limitations

The following limitations from ``ExecutionMode.WATCHER`` also apply to ``ExecutionMode.WATCHER_KUBERNETES``:

* **Individual dbt Operators**: Only ``DbtSeedWatcherKubernetesOperator``, ``DbtSnapshotWatcherKubernetesOperator``, and ``DbtRunWatcherKubernetesOperator`` are implemented. The ``DbtTestWatcherKubernetesOperator`` is currently a placeholder.
- **Individual dbt Operators**: Only ``DbtSeedWatcherKubernetesOperator``, ``DbtSnapshotWatcherKubernetesOperator``, and ``DbtRunWatcherKubernetesOperator`` are implemented. The ``DbtTestWatcherKubernetesOperator`` is currently a placeholder.

* **Test behavior**: Unlike ``ExecutionMode.WATCHER`` (which fully supports ``TestBehavior.AFTER_EACH`` since Cosmos 1.14.0), ``ExecutionMode.WATCHER_KUBERNETES`` does not yet support ``TestBehavior.AFTER_EACH``. Tests are run as part of the ``dbt build`` command by the producer task, and test tasks are rendered as ``EmptyOperator`` placeholders. This is tracked in `#1974 <https://github.com/astronomer/astronomer-cosmos/issues/1974>`_.
- **Test behavior**: Unlike ``ExecutionMode.WATCHER`` (which fully supports ``TestBehavior.AFTER_EACH`` since Cosmos 1.14.0), ``ExecutionMode.WATCHER_KUBERNETES`` does not yet support ``TestBehavior.AFTER_EACH``. Tests are run as part of the ``dbt build`` command by the producer task, and test tasks are rendered as ``EmptyOperator`` placeholders. This is tracked in `#1974 <https://github.com/astronomer/astronomer-cosmos/issues/1974>`_.

* **Source freshness nodes**: The ``dbt build`` command does not run source freshness checks.
- **Source freshness nodes**: The ``dbt build`` command does not run source freshness checks.

For more details on these limitations, refer to the :ref:`watcher-execution-mode` documentation.

@@ -203,9 +203,9 @@ Summary

``ExecutionMode.WATCHER_KUBERNETES`` provides:

* ✅ **~63% faster** dbt DAG runs compared to ``ExecutionMode.KUBERNETES``
* ✅ **Isolation** between dbt and Airflow dependencies
* ✅ **Model-level visibility** in Airflow
* ✅ **Easy migration** from ``ExecutionMode.KUBERNETES``
- ✅ **~63% faster** dbt DAG runs compared to ``ExecutionMode.KUBERNETES``
- ✅ **Isolation** between dbt and Airflow dependencies
- ✅ **Model-level visibility** in Airflow
- ✅ **Easy migration** from ``ExecutionMode.KUBERNETES``

This execution mode is ideal for teams who want the performance benefits of the watcher mode while maintaining the isolation provided by Kubernetes execution.
4 changes: 2 additions & 2 deletions docs/guides/run_dbt/customization/partial-parsing.rst
@@ -13,8 +13,8 @@ Profile configuration

To respect the dbt requirement of having the same profile to benefit from partial parsing, Cosmos users should either:

* If using Cosmos profile mapping (``ProfileConfig(profile_mapping=...``), disable using mocked profile mappings by setting ``render_config=RenderConfig(enable_mock_profile=False)``
* Declare their own ``profiles.yml`` file, via ``ProfileConfig(profiles_yml_filepath=...)``
- If using Cosmos profile mapping (``ProfileConfig(profile_mapping=...``), disable using mocked profile mappings by setting ``render_config=RenderConfig(enable_mock_profile=False)``
- Declare their own ``profiles.yml`` file, via ``ProfileConfig(profiles_yml_filepath=...)``

If users don't follow these guidelines, Cosmos will use different profiles to parse the dbt project and to run tasks, and the user won't leverage dbt partial parsing.
Their logs will contain multiple ``INFO`` messages similar to the following, meaning that Cosmos is not using partial parsing:
4 changes: 2 additions & 2 deletions docs/guides/run_dbt/operators/operators.rst
@@ -13,8 +13,8 @@ Clone

Requirement

* Cosmos >= 1.8.0
* dbt-core >= 1.6.0
- Cosmos >= 1.8.0
- dbt-core >= 1.6.0

The ``DbtCloneLocalOperator`` implements the `dbt clone <https://docs.getdbt.com/reference/commands/clone>`_ command.
