2 changes: 1 addition & 1 deletion docs/getting_started/how-cosmos-works.rst
@@ -18,7 +18,7 @@ Cosmos creates an interface between a dbt project and Airflow, allowing you to t

You have a number of configuration options, but fundamentally, Cosmos provides the following two functions:

- **Parse your dbt project**: Cosmos parses your dbt project, and translates it into an Airflow Dag. This process uses the `ProjectConfig <../reference/configs/project-config.html>`_ and `RenderConfig <../guides/translate_dbt/render-config.html>`_ to customize specific behavior, allowing you to optimize how your dbt project is represented as a Dag.
- **Parse your dbt project**: Cosmos parses your dbt project, and translates it into an Airflow Dag. This process uses the `ProjectConfig <../reference/configs/project-config.html>`_ and `RenderConfig <../guides/translate_dbt_to_airflow/render-config.html>`_ to customize specific behavior, allowing you to optimize how your dbt project is represented as a Dag.

- **Execute the dbt commands**: Cosmos then executes the Dag, using the execution options in your `ExecutionConfig <../reference/configs/execution-config.html>`_ and `ProjectConfig <../reference/configs/project-config.html>`_ to run dbt commands with the appropriate dbt adapter, finally resulting in your dbt SQL running in your data warehouse. Cosmos uses a connection defined in the `ProfileConfig <../profiles/index.html>`_ to execute your SQL in your data warehouse.

4 changes: 3 additions & 1 deletion docs/guides/run_dbt/customization/partial-parsing.rst
@@ -8,6 +8,8 @@ Starting in the 1.4 version, Cosmos tries to leverage dbt's partial parsing (``p
This feature is bound to `dbt partial parsing limitations <https://docs.getdbt.com/reference/parsing#known-limitations>`_.
As an example, ``dbt`` requires the same ``--vars``, ``--target``, ``--profile``, and ``profiles.yml`` environment variables (as called by the ``env_var()`` macro) across dbt commands; otherwise, it reparses the project from scratch.

.. _partial-parsing-profile-configuration:

Profile configuration
~~~~~~~~~~~~~~~~~~~~~

@@ -63,7 +65,7 @@ Or environment variable:
AIRFLOW__COSMOS__CACHE_DIR="path/to/docs/here" # to override default caching directory (by default, uses the system temporary directory)
AIRFLOW__COSMOS__ENABLE_CACHE_PARTIAL_PARSE="False" # to disable caching (enabled by default)

Learn more about `caching <./caching.html>`_ and `Cosmos Airflow configurations <./cosmos-conf.html>`_.
Learn more about :doc:`caching </optimize_performance/caching>` and :doc:`Cosmos Airflow configurations </reference/configs/cosmos-conf>`.
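
The two variables above follow Airflow's general convention for overriding a configuration option from the environment: ``AIRFLOW__{SECTION}__{KEY}``, upper-cased, with double underscores as separators. A minimal sketch of that mapping follows; the helper name ``airflow_env_var`` is hypothetical and is not part of Airflow or Cosmos.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name that overrides ``[section] key`` in airflow.cfg.

    Hypothetical helper for illustration only; it is not an Airflow or Cosmos API.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# The two Cosmos options shown above live in the ``[cosmos]`` section.
print(airflow_env_var("cosmos", "cache_dir"))
print(airflow_env_var("cosmos", "enable_cache_partial_parse"))
```

Read in the other direction, ``AIRFLOW__COSMOS__CACHE_DIR`` corresponds to the ``cache_dir`` key under ``[cosmos]`` in ``airflow.cfg``.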

Disabling
~~~~~~~~~
2 changes: 1 addition & 1 deletion docs/guides/run_dbt/operators/operator-args.rst
@@ -163,4 +163,4 @@ Since Airflow resolves template fields during Airflow DAG execution and not DAG

Additionally, the SQL for compiled dbt models is stored in the template fields, which is viewable in the Airflow UI for each task run.
This is provided for telemetry on task execution, and is not an operator arg.
For more information about this, see the `Compiled SQL <../../cosmos_devex/compiled-sql.html>`_ docs.
For more information about this, see the :doc:`Compiled SQL </guides/cosmos_devex/compiled-sql>` docs.
6 changes: 5 additions & 1 deletion docs/guides/translate_dbt_to_airflow/parsing-methods.rst
@@ -36,6 +36,8 @@ When you don't supply an argument to the ``load_mode`` parameter (or you supply
To use this method, you don't need to supply any additional config. This is the default.


.. _parsing-methods-dbt-manifest:

``dbt_manifest``
~~~~~~~~~~~~~~~~

@@ -97,6 +99,8 @@ from your project directory. If the manifest was built before a package was adde
appear in the DAG until you regenerate the manifest.


.. _parsing-methods-dbt-ls:

``dbt_ls``
~~~~~~~~~~

@@ -120,7 +124,7 @@ To use this:
)

Starting in Cosmos 1.5, Cosmos will cache the output of the ``dbt ls`` command, to improve the performance of this
parsing method. Learn more `here <./caching.html>`_.
parsing method. Learn more :doc:`here </optimize_performance/caching>`.

Since Cosmos 1.9, it will attempt to use dbt as a library, and run ``dbt ls`` using the ``dbtRunner`` that is available for `dbt programmatic invocations <https://docs.getdbt.com/reference/programmatic-invocations>`__. This mode requires dbt version 1.5.0 or higher.
This mode, named ``InvocationMode.DBT_RUNNER``, also depends on dbt being installed in the same Python virtual environment as Airflow.
12 changes: 6 additions & 6 deletions docs/guides/translate_dbt_to_airflow/render-config.rst
@@ -20,14 +20,14 @@ The ``RenderConfig`` class takes the following arguments:
- ``node_conversion_by_task_group``: (new in v1.12.0) A boolean to control if node_converters are used at the task group level (ex. converting models with test_behavior=AFTER_EACH means the entire task group is converted including the run task and the test task), or the individual task level (gives more granularity for converting just the run tasks or just the test tasks). Defaults to True.
- ``dbt_executable_path``: The path to the dbt executable for dag generation. Defaults to dbt if available on the path.
- ``dbt_ls_path``: Should be set when using ``load_method=LoadMode.DBT_LS_OUTPUT``. Path of the user-managed output of ``dbt ls``.
- ``enable_mock_profile``: When using ``LoadMode.DBT_LS`` with a ``ProfileMapping`` class, by default, Cosmos mocks the values of the profile. Defaults to True. In order to leverage partial parsing, this argument should be set to ``False``. Read `Partial parsing <./partial-parsing.html#profile-configuration.html>`_ for more information.
- ``enable_mock_profile``: When using ``LoadMode.DBT_LS`` with a ``ProfileMapping`` class, by default, Cosmos mocks the values of the profile. Defaults to True. In order to leverage partial parsing, this argument should be set to ``False``. Read :ref:`Partial parsing <partial-parsing-profile-configuration>` for more information.
- ``env_vars``: (available in v1.2.5, use ``ProjectConfig.env_vars`` for v1.3.0 onwards) A dictionary of environment variables for rendering. Only supported when using ``load_method=LoadMode.DBT_LS``.
- ``dbt_project_path``: Configures the DBT project location accessible on their airflow controller for DAG rendering - Required when using ``load_method=LoadMode.DBT_LS`` or ``load_method=LoadMode.CUSTOM``
- ``airflow_vars_to_purge_dbt_ls_cache``: (new in v1.5) Specify Airflow variables that will affect the ``LoadMode.DBT_LS`` cache. See `Caching <./caching.html>`_ for more information.
- ``airflow_vars_to_purge_dbt_yaml_selectors_cache``: (new in v1.13) Specify Airflow variables that will affect the YAML selectors cache when using selectors with ``LoadMode.DBT_MANIFEST``. See `Caching <./caching.html>`_ for more information.
- ``source_rendering_behavior``: Determines how source nodes are rendered when using cosmos default source node rendering (ALL, NONE, WITH_TESTS_OR_FRESHNESS). Defaults to "NONE" (since Cosmos 1.6). See `Source Nodes Rendering <./source-nodes-rendering.html>`_ for more information.
- ``source_pruning``: When set to ``True``, automatically removes (or "prunes") any dbt source nodes from your Airflow DAG that do not have any downstream dependencies within the selected portion of the dbt graph. Defaults to ``False``. See `Source Nodes Rendering <./source-nodes-rendering.html>`_ for more information.
- ``normalize_task_id``: A callable that takes a dbt node as input and returns the task ID. This function allows users to set a custom task_id independently of the model name, which can be specified as the task's display_name. This way, task_id can be modified using a user-defined function, while the model name remains as the task's display name. The display_name parameter is available in Airflow 2.9 and above. See `Task display name <./task-display-name.html>`_ for more information.
- ``airflow_vars_to_purge_dbt_ls_cache``: (new in v1.5) Specify Airflow variables that will affect the ``LoadMode.DBT_LS`` cache. See :doc:`Caching </optimize_performance/caching>` for more information.
- ``airflow_vars_to_purge_dbt_yaml_selectors_cache``: (new in v1.13) Specify Airflow variables that will affect the YAML selectors cache when using selectors with ``LoadMode.DBT_MANIFEST``. See :doc:`Caching </optimize_performance/caching>` for more information.
- ``source_rendering_behavior``: Determines how source nodes are rendered when using cosmos default source node rendering (ALL, NONE, WITH_TESTS_OR_FRESHNESS). Defaults to "NONE" (since Cosmos 1.6). See :doc:`Source Nodes Rendering <managing-sources>` for more information.
- ``source_pruning``: When set to ``True``, automatically removes (or "prunes") any dbt source nodes from your Airflow DAG that do not have any downstream dependencies within the selected portion of the dbt graph. Defaults to ``False``. See :doc:`Source Nodes Rendering <managing-sources>` for more information.
- ``normalize_task_id``: A callable that takes a dbt node as input and returns the task ID. This function allows users to set a custom task_id independently of the model name, which can be specified as the task's display_name. This way, task_id can be modified using a user-defined function, while the model name remains as the task's display name. The display_name parameter is available in Airflow 2.9 and above. See :doc:`Task display name </guides/cosmos_devex/task-display-name>` for more information.
- ``normalize_task_display_name``: This function allows users to set a custom user-defined function to alter the display name independently of the model name. This way, the task_id can be preserved while the model display name is modified.
- ``should_detach_multiple_parents_tests``: A boolean to control if tests that depend on multiple parents should be run as standalone tasks. See `Testing Behavior <testing-behavior.html>`_ for more information.
- ``enable_owner_inheritance``: (introduced in 1.10.2) A boolean to control if dbt owners should be imported as part of the airflow DAG owners. Defaults to True.
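
As an illustration of the ``normalize_task_id`` argument described above, the sketch below shows the shape such a callable could take. It assumes the node object exposes ``name`` and ``resource_type`` attributes; ``FakeNode`` is a hypothetical stand-in used only so the example runs without Cosmos installed, and the naming scheme is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for the dbt node that Cosmos passes to the callable."""

    name: str
    resource_type: str


def normalize_task_id(node) -> str:
    """Return a task_id derived from the node's resource type and name.

    Assumes ``node`` has ``name`` and ``resource_type`` attributes.
    """
    return f"{node.resource_type}__{node.name}".lower()


print(normalize_task_id(FakeNode(name="Stg_Customers", resource_type="model")))
```

A function like this would be supplied as ``RenderConfig(normalize_task_id=normalize_task_id)``, leaving the model name available as the task's display name.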
6 changes: 3 additions & 3 deletions docs/optimize_performance/caching.rst
@@ -8,7 +8,7 @@ This page explains the caching strategies in ``astronomer-cosmos`` Astronomer Co
All Cosmos caching mechanisms can be enabled or turned off in the ``airflow.cfg`` file or using environment variables.

.. note::
For more information, see `configuring a Cosmos project <./project-config.html>`_.
For more information, see :doc:`configuring a Cosmos project </reference/configs/project-config>`.

Depending on the Cosmos version, it creates a cache for three types of data:

@@ -24,7 +24,7 @@ Caching the dbt ls output

(Introduced in Cosmos 1.5)

While parsing a dbt project using `LoadMode.DBT_LS <./parsing-methods.html#dbt-ls>`_, Cosmos uses subprocess to run ``dbt ls``.
While parsing a dbt project using :ref:`LoadMode.DBT_LS <parsing-methods-dbt-ls>`, Cosmos uses subprocess to run ``dbt ls``.
This operation can be very costly; it can increase the DAG parsing times and affect not only the scheduler DAG processing but
also the tasks queueing time.

@@ -115,7 +115,7 @@ Caching the YAML selectors

(Introduced in Cosmos 1.13)

While parsing a dbt project using `LoadMode.DBT_MANIFEST <./parsing-methods.html#dbt-manifest>`_, if a ``selector`` argument is provided to the `RenderConfig <./render-config.html>`_ instance passed to the ``DbtDag`` or ``DbtTaskGroup``,
While parsing a dbt project using :ref:`LoadMode.DBT_MANIFEST <parsing-methods-dbt-manifest>`, if a ``selector`` argument is provided to the :doc:`RenderConfig </guides/translate_dbt_to_airflow/render-config>` instance passed to the ``DbtDag`` or ``DbtTaskGroup``,
Cosmos will parse the preprocessed YAML selectors found in the manifest. The YAML selectors will be parsed into selection criteria that Cosmos will use to filter the dbt nodes to include in the Airflow DAG. The parsed selectors will be cached to improve performance during DAG parsing.

This feature is on by default. To turn it off, export the following environment variable: ``AIRFLOW__COSMOS__ENABLE_CACHE_DBT_YAML_SELECTORS=0``.
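
The value ``0`` above is one of several spellings of "false". The sketch below mimics how a boolean flag of this kind is typically interpreted, assuming Cosmos reads the option through Airflow's boolean configuration parsing (case-insensitive ``t``/``true``/``1`` and ``f``/``false``/``0``). The ``env_flag`` helper is hypothetical and is not the actual Cosmos or Airflow implementation.

```python
import os

TRUE_VALUES = {"t", "true", "1"}
FALSE_VALUES = {"f", "false", "0"}


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean flag from the environment, using ``default`` when it is unset.

    Hypothetical helper; it only approximates Airflow-style boolean parsing.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognised boolean value")


# Disable the YAML selectors cache, as described above.
os.environ["AIRFLOW__COSMOS__ENABLE_CACHE_DBT_YAML_SELECTORS"] = "0"
print(env_flag("AIRFLOW__COSMOS__ENABLE_CACHE_DBT_YAML_SELECTORS", default=True))
```

Under that assumption, ``0``, ``False``, and ``false`` would all turn the cache off, while leaving the variable unset keeps the default of enabled.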