34 changes: 32 additions & 2 deletions docs/guides/dbt_docs/generating-docs.rst
@@ -9,14 +9,15 @@ After generating the dbt docs, you can host them natively within Airflow via the

Alternatively, many users choose to serve these docs on a separate static website. This is a great way to share your data models with a broad array of stakeholders.

Cosmos offers two pre-built ways of generating and uploading dbt docs and a fallback option to run custom code after the docs are generated:
Cosmos offers pre-built ways of generating and uploading dbt docs, plus a fallback option to run custom code after the docs are generated:

- :class:`~cosmos.operators.DbtDocsS3Operator`: generates and uploads docs to an S3 bucket.
- :class:`~cosmos.operators.kubernetes.DbtDocsS3KubernetesOperator` (introduced in Cosmos 1.15.0): generates docs in a Kubernetes Pod and uploads them to an S3 bucket from inside that Pod.
- :class:`~cosmos.operators.DbtDocsAzureStorageOperator`: generates and uploads docs to Azure Blob Storage.
- :class:`~cosmos.operators.DbtDocsGCSOperator`: generates and uploads docs to a GCS bucket.
- :class:`~cosmos.operators.DbtDocsOperator`: generates docs and runs a custom callback.

The first three operators require you to have a connection to the target storage. The last operator allows you to run custom code after the docs are generated in order to upload them to a storage of your choice.
The first four operators require you to have a connection to the target storage. The last operator allows you to run custom code after the docs are generated in order to upload them to a storage of your choice.


Examples
@@ -43,6 +44,35 @@ You can use the :class:`~cosmos.operators.DbtDocsS3Operator` to generate and upl
bucket_name="test_bucket",
)

Upload to S3 from Kubernetes
''''''''''''''''''''''''''''

.. versionadded:: 1.15.0

If you run dbt in :ref:`kubernetes`, use :class:`~cosmos.operators.kubernetes.DbtDocsS3KubernetesOperator`.
Unlike the local S3 operator, this operator generates the docs and uploads them to S3 from inside the Kubernetes Pod.

This is important because the dbt ``target`` directory is created inside the Pod, not on the Airflow worker that launched it.

Requirements specific to Kubernetes:

- The container image must include your dbt project files.
- The container image or mounted files must include a ``profiles.yml`` file, because Kubernetes execution mode does not support :doc:`../connect_database/use-profile-mapping`.
- The container image must have the AWS CLI available because Cosmos uploads the generated docs with ``aws s3 sync``.
- The Pod still needs the database credentials and any other secrets required to run ``dbt docs generate``.
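
As a rough illustration of the upload step named in the requirements above, the sketch below builds an ``aws s3 sync`` command line. The helper ``build_s3_sync_command`` and its parameters are hypothetical and are not part of Cosmos; the exact arguments Cosmos passes may differ.

```python
from typing import List


def build_s3_sync_command(local_dir: str, bucket_name: str, folder_dir: str = "") -> List[str]:
    """Return the argv for syncing a local directory to an S3 bucket.

    Hypothetical helper for illustration only; not a Cosmos API.
    """
    # Normalise the optional prefix so the URI has no doubled or trailing slashes.
    prefix = folder_dir.strip("/")
    destination = f"s3://{bucket_name}/{prefix}" if prefix else f"s3://{bucket_name}"
    # `aws s3 sync <local path> <S3 URI>` copies new and changed files.
    return ["aws", "s3", "sync", local_dir, destination]


if __name__ == "__main__":
    # The dbt "target" directory holds the generated docs inside the Pod.
    print(" ".join(build_s3_sync_command("target", "test_bucket", "dbt-docs")))
```

Because the command runs inside the Pod, the ``aws`` executable has to be on the image's ``PATH``, which is why the AWS CLI is listed as a requirement.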

The following example extends the Kubernetes example DAG and uploads the generated docs to S3:

.. literalinclude:: ../../../dev/dags/jaffle_shop_kubernetes.py
    :language: python
    :start-after: [START kubernetes_docs_to_s3_example]
    :end-before: [END kubernetes_docs_to_s3_example]

The ``connection_id`` is resolved from Airflow and translated into AWS environment variables that are injected into the Pod before ``aws s3 sync`` runs.
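
The translation described above can be pictured with the minimal sketch below. It assumes the credentials have already been read from the Airflow connection and maps them to the standard AWS CLI environment variable names. The function ``aws_env_vars`` is hypothetical and does not reflect how Cosmos implements this internally.

```python
from typing import Dict, Optional


def aws_env_vars(
    access_key_id: str,
    secret_access_key: str,
    session_token: Optional[str] = None,
    region_name: Optional[str] = None,
) -> Dict[str, str]:
    """Map AWS credentials to the environment variables read by the AWS CLI.

    Hypothetical helper for illustration only; not a Cosmos API.
    """
    env = {
        "AWS_ACCESS_KEY_ID": access_key_id,
        "AWS_SECRET_ACCESS_KEY": secret_access_key,
    }
    # Temporary credentials also carry a session token.
    if session_token:
        env["AWS_SESSION_TOKEN"] = session_token
    # The region is optional; the CLI falls back to its own configuration.
    if region_name:
        env["AWS_DEFAULT_REGION"] = region_name
    return env


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    print(sorted(aws_env_vars("example-key-id", "example-secret", region_name="us-east-1")))
```

Passing credentials as environment variables means the image itself does not need to contain any AWS configuration files.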

.. note::
    This Kubernetes integration currently supports S3 only. If you need another storage backend, use one of the local operators or extend Cosmos with another Kubernetes docs operator.

Upload to Azure Blob Storage
''''''''''''''''''''''''''''

9 changes: 7 additions & 2 deletions docs/guides/run_dbt/container/kubernetes.rst
@@ -38,6 +38,8 @@ At the moment, the user is expected to add to the Docker image both:
- The dbt Profile, which contains the information for dbt to access the database while parsing the project from Apache Airflow nodes
- Handle secrets

If you plan to generate dbt docs and upload them to S3 from Kubernetes, the image also needs the AWS CLI because Cosmos performs the upload from inside the Pod.
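
One way to catch a missing AWS CLI early is to run a small check when building or testing the image. The sketch below is an assumption about how such a check could look, not something Cosmos provides; ``missing_executables`` is a hypothetical name.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_executables(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the executables from `names` that cannot be found on PATH.

    Hypothetical helper for illustration only; not a Cosmos API.
    """
    # shutil.which returns None when an executable is not on PATH.
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_executables(["dbt", "aws"])
    if missing:
        raise SystemExit(f"Image is missing required executables: {', '.join(missing)}")
    print("dbt and the AWS CLI are available")
```

Running this script as a step in the image build would make the build fail if either tool is absent, instead of the failure surfacing later inside a Pod.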

Additional KubernetesPodOperator parameters can be added to the ``operator_args`` parameter of the ``DbtKubernetesOperator``.

For instance,
@@ -47,6 +49,9 @@ For instance,
    :start-after: [START kubernetes_tg_example]
    :end-before: [END kubernetes_tg_example]

To generate dbt docs and upload them to S3 from the same Pod, use :class:`~cosmos.operators.kubernetes.DbtDocsS3KubernetesOperator`, which requires Cosmos 1.15.0 or higher.
See :doc:`../../dbt_docs/generating-docs` for an end-to-end example and the extra requirements for this workflow.

Step-by-step instructions
+++++++++++++++++++++++++

@@ -175,7 +180,7 @@ The Kubernetes execution mode has the following limitations:
- Does not emit Airflow datasets, assets, and dataset aliases (there is an `open ticket #2329 <https://github.com/astronomer/astronomer-cosmos/issues/2329>`__ to address this)
- Does not handle installing dbt deps for users (there is an `open ticket #679 <https://github.com/astronomer/astronomer-cosmos/issues/679>`__ to address this)
- Does not support `ProfileMapping <https://astronomer.github.io/astronomer-cosmos/guides/connect_database/use-profile-mapping.html>`_ (there is an `open ticket #749 <https://github.com/astronomer/astronomer-cosmos/issues/749>`__ to address this)
- Does not support `Callbacks <https://astronomer.github.io/astronomer-cosmos/guides/callbacks/callbacks.html>`_ (there is an `open ticket #1575 <https://github.com/astronomer/astronomer-cosmos/issues/1575>`__ to address this)
- Does not support :doc:`../callbacks/callbacks` (there is an `open ticket #1575 <https://github.com/astronomer/astronomer-cosmos/issues/1575>`__ to address this)
- Does not expose Compiled SQL as a `templated field <https://astronomer.github.io/astronomer-cosmos/guides/cosmos_devex/compiled-sql.html>`_
- Does not benefit from `Cosmos caching mechanisms <https://astronomer.github.io/astronomer-cosmos/optimize_performance/caching.html>`_
- Does not support `generating dbt docs & uploading to an object store <https://astronomer.github.io/astronomer-cosmos/guides/dbt_docs/generating-docs.html>`_ (there is a `PR <https://github.com/astronomer/astronomer-cosmos/pull/2058>`_ to solve this for S3)
- Since 1.15.0, supports generating dbt docs and uploading them to S3 with :class:`~cosmos.operators.kubernetes.DbtDocsS3KubernetesOperator`; other object stores and callback-based uploads remain unsupported in Kubernetes execution mode