Improve glossary: fix structure, add Cosmos-specific terms #2618
Merged
11 commits
2f14021 Add to sidebar (lzdanski)
6dc5418 🎨 [pre-commit.ci] Auto format from pre-commit.com hooks (pre-commit-ci[bot])
659f9df Fix misspelling (lzdanski)
b599fff Apply suggestion from @pankajastro (pankajastro)
50d52ec Improve glossary: fix directive, add DbtDag and DbtTaskGroup entries (pankajastro)
ce52c8e Fix undefined labels: point DbtDag/DbtTaskGroup refs to core-concepts (pankajastro)
a641455 Merge branch 'main' into docs/glossary-dbtdag-dbttaskgroup (pankajastro)
1c779a2 🎨 [pre-commit.ci] Auto format from pre-commit.com hooks (pre-commit-ci[bot])
69380c4 Apply suggestion from @Copilot (pankajastro)
d2f1a9b Address review feedback on glossary entries (pankajastro)
0854a86 Merge branch 'main' into docs/glossary-dbtdag-dbttaskgroup (pankajastro)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,62 +1,115 @@ | ||
| .. glossary:: | ||
|
|
||
| Glossary | ||
| ========= | ||
|
|
||
| Cosmos | ||
| An open-source Python package that allows you to write data transformations using dbt | ||
| and then use Apache Airflow®'s orchestration to integrate dbt projects into end-to-end workflows. | ||
|
|
||
| Dag | ||
| An Airflow term derived from the mathematical structure called **Directed Acyclic Graph**. The Dag provides | ||
| a model that includes everything needed to execute an Airflow workflow. See `Dag <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html>`_. | ||
|
|
||
| dbt deps | ||
| A `dbt command <https://docs.getdbt.com/reference/commands/deps?version=1.12>`_ that pulls the most recent version of dependencies contained in your packages.yml file. | ||
|
|
||
| Execution mode | ||
| The execution mode describes where and how Cosmos runs dbt commands. You configure it using the ``ExecutionConfig``. See :ref:`execution-modes`. | ||
|
|
||
| Load mode | ||
| The method that Cosmos uses to parse your dbt project. See :ref:`parsing-methods`. | ||
|
|
||
| Manifest | ||
| A dbt artifact that This file contains a full representation of the dbt project's resources, including all node configurations and resource properties. | ||
| See `Manifest JSON file <https://docs.getdbt.com/reference/artifacts/manifest-json?version=1.12>`_ in the dbt docs. | ||
|
|
||
| Node | ||
| A dbt concept that encapsulates a step within a pipeline. | ||
| ======== | ||
|
|
||
| Partial parsing | ||
| Partial parsing is a dbt feature that can greatly speed up dbt parsing and execution when using the ``dbt_ls`` | ||
| load mode. See :ref:`partial-parsing`. | ||
|
|
||
| Profile | ||
| The authentication information used by dbt to connect to your data warehouse. See `profile <https://docs.getdbt.com/reference/project-configs/profile?version=1.12>`_. | ||
|
|
||
| ProfileConfig | ||
| The class that determines which data warehouse Cosmos connects to when executing the dbt SQL. See :ref:`connect_database`. | ||
|
|
||
| Profile mapping | ||
| A Cosmos-provided resource that translate Airflow connections into dbt profiles. See :ref:`use-profile-mapping`. | ||
|
|
||
| profiles.yml | ||
| The file where dbt stores connection information for each data warehouse connection. See :ref:`connect_database`, :ref:`use-your-profiles-yml`, | ||
| or `The profiles.yml file <https://docs.getdbt.com/docs/local/connect-data-platform/connection-profiles?version=1.12#about-the-profilesyml-file>`_ in the dbt docs. | ||
|
|
||
| ProjectConfig | ||
| The Cosmos class that allows you to specify information about where your dbt project is located and project variables that should be used for rendering and execution. | ||
|
|
||
| RenderConfig | ||
| The configuration in Cosmos that controls how Cosmos turns your dbt project into an Airflow dag or task group. :ref:`render-config`. | ||
|
|
||
| Task | ||
| An Airflow term describing a step within a pipeline. See `Task <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html>`_. | ||
|
|
||
| Test | ||
| Refers to the the `dbt test command <https://docs.getdbt.com/reference/commands/test?version=1.12>`_. You can configure how you want | ||
| Cosmos to handle running ``dbt test`` with :ref:`testing-behavior`. | ||
| .. glossary:: | ||
|
|
||
| Workflow | ||
| A dbt term describing a pipeline that contains a group of steps. dbt can run a subset | ||
| of tasks assuming upstream tasks were run. This is similar to the Airflow concept of a `Dag <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html>`_. | ||
| Cosmos | ||
| An open-source Python package that allows you to write data transformations using dbt | ||
| and then use Apache Airflow®'s orchestration to integrate dbt projects into end-to-end workflows. | ||
|
|
||
| DAG | ||
| An Airflow term derived from the mathematical structure called **Directed Acyclic Graph**. A DAG | ||
| provides a model that includes everything needed to execute an Airflow workflow. | ||
| See `DAG <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html>`_. | ||
|
|
||
| DbtDag | ||
| A Cosmos class that wraps a full Airflow DAG around a dbt project, rendering each dbt node as an | ||
| Airflow task. Use ``DbtDag`` when the entire Airflow DAG is dedicated to running a dbt project. | ||
| See :ref:`core-concepts`. | ||
|
|
||
| DbtTaskGroup | ||
| A Cosmos class that embeds a dbt project as an Airflow ``TaskGroup`` inside a larger DAG. Use | ||
| ``DbtTaskGroup`` when dbt is one stage of a broader pipeline (e.g. ingestion → dbt → reporting). | ||
| See :ref:`core-concepts`. | ||
|
|
||
| dbt deps | ||
| A `dbt command <https://docs.getdbt.com/reference/commands/deps>`_ that pulls the most recent | ||
| version of dependencies listed in your ``packages.yml`` file. | ||
|
|
||
| Callbacks | ||
| User-defined functions that Cosmos invokes after the dbt command completes, during | ||
| post-execution handling and before the temporary project directory is cleaned up. | ||
| Useful for custom logging, alerting, or side effects without modifying the operator. | ||
| Configure them on Cosmos operators via ``callback`` and ``callback_args``; when using | ||
| ``DbtDag`` or ``DbtTaskGroup``, these can be passed through ``operator_args``. | ||
| See :ref:`callbacks`. | ||
|
|
||
| ExecutionConfig | ||
| The Cosmos class used to configure the execution mode and related runtime options such as the | ||
| dbt executable path and invocation mode. See :ref:`execution-config`. | ||
|
|
||
| ExecutionMode | ||
| Describes where and how Cosmos runs dbt commands (e.g. ``LOCAL``, ``WATCHER``, ``DOCKER``). | ||
| Configured via ``ExecutionConfig``. See :ref:`execution-modes`. | ||
|
|
||
| Interceptors | ||
| An optional list of callables (new in Cosmos 1.14) that run before Cosmos builds the dbt | ||
| command for each task. Each callable receives ``(context, operator)`` and may modify | ||
| ``operator.vars`` and ``operator.env``; the modified values are then used when building and | ||
| running the dbt command. Useful for injecting runtime variables or environment values per task | ||
| run. Configured via ``operator_args={"interceptors": [...]}`` on ``DbtDag`` or ``DbtTaskGroup``. | ||
| See :ref:`operator-args`. | ||
|
|
||
| InvocationMode | ||
| Controls how Cosmos calls dbt: ``DBT_RUNNER`` imports dbt as a Python library in the same | ||
| process (no subprocess overhead, lower CPU and memory usage), while ``SUBPROCESS`` spawns a | ||
| separate process (better isolation when dbt and Airflow share a Python environment). | ||
| Running as a subprocess roughly doubles memory usage compared to ``DBT_RUNNER``. | ||
| See :ref:`invocation-mode`. | ||
|
|
||
| LoadMode | ||
| The method Cosmos uses to parse your dbt project (e.g. ``DBT_MANIFEST``, ``DBT_LS``). | ||
| Configured via ``RenderConfig``. See :ref:`parsing-methods`. | ||
|
|
||
| Manifest | ||
| A dbt artifact that contains a full representation of the dbt project's resources, including all | ||
| node configurations and resource properties. | ||
| See `Manifest JSON file <https://docs.getdbt.com/reference/artifacts/manifest-json>`_ in the dbt docs. | ||
|
|
||
| Partial parsing | ||
| A dbt feature, enabled in Cosmos since v1.4, that skips re-parsing files that have not | ||
| changed. Each Airflow node performs a full dbt project parse only once; subsequent task runs | ||
| reuse the cached ``partial_parse.msgpack`` artifact, reducing parse time and CPU usage. | ||
| See :ref:`partial-parsing`. | ||
|
|
||
| Profile | ||
| The authentication information used by dbt to connect to your data warehouse. | ||
| See `profile <https://docs.getdbt.com/reference/project-configs/profile>`_. | ||
|
|
||
| ProfileConfig | ||
| The Cosmos class that determines which data warehouse Cosmos connects to when executing dbt SQL. | ||
| See :ref:`connect_database`. | ||
|
|
||
| Profile mapping | ||
| A Cosmos-provided resource that translates Airflow connections into dbt profiles. | ||
| See :ref:`use-profile-mapping`. | ||
|
|
||
| profiles.yml | ||
| The file where dbt stores connection information for each data warehouse connection. | ||
| See :ref:`connect_database`, :ref:`use-your-profiles-yml`, or | ||
| `The profiles.yml file <https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles>`_ | ||
| in the dbt docs. | ||
|
|
||
| ProjectConfig | ||
| The Cosmos class used to specify where your dbt project is located and any project variables | ||
| that should be used for rendering and execution. | ||
|
|
||
| node_converters | ||
| A ``RenderConfig`` option that accepts a dictionary mapping a ``DbtResourceType`` to a | ||
| callable, allowing users to replace how specific node types are rendered as Airflow tasks. | ||
| See :ref:`render-config`. | ||
|
|
||
| RenderConfig | ||
| The Cosmos class that controls how a dbt project is turned into an Airflow DAG or TaskGroup, | ||
| including node selection, test behaviour, and source rendering. See :ref:`render-config`. | ||
|
|
||
| SourceRenderingBehavior | ||
| Controls whether dbt source nodes are rendered as Airflow tasks. ``NONE`` (default) skips | ||
| source nodes entirely; ``ALL`` renders every source; ``WITH_TESTS_OR_FRESHNESS`` renders only | ||
| sources that have tests or a freshness check defined. Configured via ``RenderConfig``. | ||
|
|
||
| TestBehavior | ||
| Controls when and how Cosmos runs dbt tests. ``AFTER_EACH`` (default) adds a test task after | ||
| each model; ``AFTER_ALL`` runs all tests in a single task at the end; ``BUILD`` runs tests as | ||
| part of ``dbt build`` inside the model task; ``NONE`` skips tests entirely. | ||
| Configured via ``RenderConfig``. See :ref:`testing-behavior`. | ||
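
The Interceptors entry above describes callables that receive ``(context, operator)`` and may modify ``operator.vars`` and ``operator.env``. A minimal sketch of that contract follows. It does not import Cosmos: the operator is a plain stand-in object, ``apply_interceptors`` is a hypothetical helper that only imitates the "run each interceptor before building the command" step, and the variable names ``run_date`` and ``DBT_TARGET_SCHEMA`` are made up for illustration. The ``ds`` and ``ds_nodash`` keys are assumed to be present in the Airflow context.

```python
from types import SimpleNamespace


def add_run_date(context, operator):
    """Interceptor sketch: expose the run date to dbt as a project variable."""
    # Build a new dict so a missing (None) vars attribute is handled too.
    operator.vars = {**(operator.vars or {}), "run_date": context["ds"]}


def add_env_flag(context, operator):
    """Interceptor sketch: set an environment value for this task run."""
    operator.env = {
        **(operator.env or {}),
        "DBT_TARGET_SCHEMA": f"analytics_{context['ds_nodash']}",
    }


def apply_interceptors(interceptors, context, operator):
    """Hypothetical stand-in for the step that runs interceptors in order."""
    for interceptor in interceptors:
        interceptor(context, operator)
    return operator


# Stand-ins for a Cosmos operator and an Airflow context (assumptions).
fake_operator = SimpleNamespace(vars={"country": "us"}, env=None)
fake_context = {"ds": "2024-01-01", "ds_nodash": "20240101"}

apply_interceptors([add_run_date, add_env_flag], fake_context, fake_operator)

# Shape described in the glossary entry for DbtDag / DbtTaskGroup.
operator_args = {"interceptors": [add_run_date, add_env_flag]}
```

In a real project the same two functions would be passed through ``operator_args`` as in the last line, and Cosmos, not the helper shown here, would call them for each task.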