Skip to content

Add support to select using graph-operators when using LoadMode.CUSTOM or LoadMode.DBT_MANIFEST#728

Merged
tatiana merged 15 commits into
mainfrom
custom-selector-upstream-model
Dec 5, 2023
Merged

Add support to select using graph-operators when using LoadMode.CUSTOM or LoadMode.DBT_MANIFEST#728
tatiana merged 15 commits into
mainfrom
custom-selector-upstream-model

Conversation

@tatiana
Copy link
Copy Markdown
Collaborator

@tatiana tatiana commented Nov 30, 2023

Add support for the following when using LoadMode.CUSTOM or LoadMode.DBT_MANIFEST:

  • Support selection of model by name
  • Support the selection of models by name & their children (with or without degrees)
  • Support the selection of models by name & their parents (with or without degrees)
  • Support intersections and unions involving graph selectors (with or without other supported selectors, eg. tags)

Examples of select/exclusion statements that now work regardless of the LoadMode being used:

model_a
+model_b
model_c+
+model_d+
2+model_e
model_f+3
model_f+,tag:nightly

Related dbt documentation:
https://docs.getdbt.com/reference/node-selection/graph-operators
https://docs.getdbt.com/reference/node-selection/set-operators

Limitations:

  • The at operator is not supported yet (@)
  • If users opt to use graph selector, it will increase the DAG parsing time and the task execution time when using LoadMode.CUSTOM or LoadMode.DBT_MANIFEST

This PR improves and extends the original implementation proposed by @tseruga in #429. Some of the changes that were introduced on top of the original PR:

  • Add support to descendants (before only precursors were supported)
  • Add support to different depths/degrees of precursors/descendants
  • Add support to the union between graph operators and graph/non-graph operators
  • Add support to the intersection between graph operators and graph/non-graph operators

Closes: #684

Co-authored-by: Tyler Seruga t.seruga@m1finance.com

@tatiana tatiana requested a review from a team as a code owner November 30, 2023 12:54
@tatiana tatiana requested a review from a team November 30, 2023 12:54
@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 30, 2023
@netlify
Copy link
Copy Markdown

netlify Bot commented Nov 30, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 8ad0648
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/656f07265aa38200082d7c9d

@dosubot dosubot Bot added area:selector Related to selector, like DAG selector, DBT selector, etc dbt:run Primarily related to dbt run command or functionality parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc labels Nov 30, 2023
@codecov
Copy link
Copy Markdown

codecov Bot commented Nov 30, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (e1f34ea) 92.99% compared to head (5fe2ccc) 93.05%.
Report is 1 commits behind head on main.

❗ Current head 5fe2ccc differs from pull request most recent head 8ad0648. Consider uploading reports for the commit 8ad0648 to get more accurate results

Files Patch % Lines
cosmos/dbt/selector.py 94.33% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #728      +/-   ##
==========================================
+ Coverage   92.99%   93.05%   +0.05%     
==========================================
  Files          55       55              
  Lines        2313     2404      +91     
==========================================
+ Hits         2151     2237      +86     
- Misses        162      167       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tatiana tatiana added this to the 1.3.0 milestone Nov 30, 2023
@tatiana tatiana added the parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing label Nov 30, 2023
Copy link
Copy Markdown
Contributor

@harels harels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor doc string comment, lgtm

Comment thread cosmos/dbt/selector.py Outdated
Co-authored-by: Harel Shein <harel.shein@astronomer.io>
@tatiana tatiana merged commit 6deba22 into main Dec 5, 2023
@tatiana tatiana deleted the custom-selector-upstream-model branch December 5, 2023 11:19
tatiana added a commit that referenced this pull request Dec 7, 2023
Features

* Add ProfileMapping for Vertica by @perttus in #540 and #688
* Add support for Snowflake encrypted private key environment variable by @DanMawdsleyBA in #649
* Add support to select using (some) graph operators when using LoadMode.CUSTOM and LoadMode.DBT_MANIFEST by @tatiana in #728
* Add cosmos/propagate_logs Airflow config support for disabling log pr… by @agreenburg in #648
* Add operator_args full_refresh as a templated field by @joppevos in #623
* Expose environment variables and dbt variables in ProjectConfig by @jbandoro in #735

Enhancements

* Make Pydantic an optional dependency by @pixie79 in #736
* Create a symbolic link to dbt_packages when dbt_deps is False when using LoadMode.DBT_LS by @DanMawdsleyBA in #730
* Support no profile_config for ExecutionMode.KUBERNETES and ExecutionMode.DOCKER by @MrBones757 and @tatiana in #681 and #731
* Add aws_session_token for Athena mapping by @benjamin-awd in #663

Others

* Replace flake8 for Ruff by @joppevos in #743
* Reduce code complexity to 8 by @joppevos in #738
* Update conflict matrix between Airflow and dbt versions by @tatiana in #731
* Speed up integration tests by @jbandoro in #732
@tatiana tatiana mentioned this pull request Dec 7, 2023
tatiana added a commit that referenced this pull request Dec 7, 2023
Features

* Add ProfileMapping for Vertica by @perttus in #540 and #688
* Add support for Snowflake encrypted private key environment variable by @DanMawdsleyBA in #649
* Add support to select using (some) graph operators when using LoadMode.CUSTOM and LoadMode.DBT_MANIFEST by @tatiana in #728
* Add cosmos/propagate_logs Airflow config support for disabling log pr… by @agreenburg in #648
* Add operator_args full_refresh as a templated field by @joppevos in #623
* Expose environment variables and dbt variables in ProjectConfig by @jbandoro in #735

Enhancements

* Make Pydantic an optional dependency by @pixie79 in #736
* Create a symbolic link to dbt_packages when dbt_deps is False when using LoadMode.DBT_LS by @DanMawdsleyBA in #730
* Support no profile_config for ExecutionMode.KUBERNETES and ExecutionMode.DOCKER by @MrBones757 and @tatiana in #681 and #731
* Add aws_session_token for Athena mapping by @benjamin-awd in #663

Others

* Replace flake8 for Ruff by @joppevos in #743
* Reduce code complexity to 8 by @joppevos in #738
* Update conflict matrix between Airflow and dbt versions by @tatiana in #731
* Speed up integration tests by @jbandoro in #732
jbandoro pushed a commit that referenced this pull request Dec 7, 2023
**Features**

* Add `ProfileMapping` for Snowflake encrypted private key path by
@ivanstillfront in #608
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in #649
* Add `DbtDocsGCSOperator` for uploading dbt docs to GCS by @jbandoro in
#616
* Add support to select using (some) graph operators when using
`LoadMode.CUSTOM` and `LoadMode.DBT_MANIFEST` by @tatiana in #728
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in #648
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in #623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in #735

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in #736
* Create a symbolic link to `dbt_packages` when `dbt_deps` is False when
using `LoadMode.DBT_LS` by @DanMawdsleyBA in #730
* Support no `profile_config` for `ExecutionMode.KUBERNETES` and
`ExecutionMode.DOCKER` by @MrBones757 and @tatiana in #681 and #731
* Add `aws_session_token` for Athena mapping by @benjamin-awd in #663

**Others**

* Replace flake8 for Ruff by @joppevos in #743
* Reduce code complexity to 8 by @joppevos in #738
* Update conflict matrix between Airflow and dbt versions by @tatiana in
#731
* Speed up integration tests by @jbandoro in #732
@tatiana tatiana mentioned this pull request Jan 4, 2024
tatiana added a commit that referenced this pull request Jan 4, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in #733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in #728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in #540, #688 and #741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in #608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in #649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in #616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in #648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in #623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in #735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in #768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in #736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in #730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in #663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in #758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in #759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in #772

**Others**

* Replace flake8 for Ruff by @joppevos in #743
* Reduce code complexity to 8 by @joppevos in #738
* Speed up integration tests by @jbandoro in #732
* Fix README quickstart link in by @RNHTTR in #776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in #766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in #757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
#731 and #779
* pre-commit updates in #775, #770, #762
ykuc pushed a commit to ykuc/astronomer-cosmos that referenced this pull request Jan 11, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in astronomer#733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in astronomer#728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
astronomer#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in astronomer#540, astronomer#688 and astronomer#741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in astronomer#608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in astronomer#649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in astronomer#616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in astronomer#648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in astronomer#623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in astronomer#735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in astronomer#768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in astronomer#736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in astronomer#730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in astronomer#663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in astronomer#758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in astronomer#759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in astronomer#772

**Others**

* Replace flake8 for Ruff by @joppevos in astronomer#743
* Reduce code complexity to 8 by @joppevos in astronomer#738
* Speed up integration tests by @jbandoro in astronomer#732
* Fix README quickstart link in by @RNHTTR in astronomer#776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
astronomer#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in astronomer#766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in astronomer#757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
astronomer#731 and astronomer#779
* pre-commit updates in astronomer#775, astronomer#770, astronomer#762
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
…OM` or `LoadMode.DBT_MANIFEST` (astronomer#728)

Add support for the following when using `LoadMode.CUSTOM` or
`LoadMode.DBT_MANIFEST`:

* Support selection of model by name
* Support the selection of models by name & their children (with or
without degrees)
* Support the selection of models by name & their parents (with or
without degrees)
* Support intersections and unions involving graph selectors (with or
without other supported selectors, eg. tags)

Examples of select/exclusion statements that now work regardless of the
`LoadMode` being used:

```
model_a
+model_b
model_c+
+model_d+
2+model_e
model_f+3
model_f+,tag:nightly
```

Related dbt documentation:
https://docs.getdbt.com/reference/node-selection/graph-operators
https://docs.getdbt.com/reference/node-selection/set-operators

Limitations:
* The at operator is not supported yet (`@`)
* If users opt to use graph selector, it will increase the DAG parsing
time and the task execution time when using `LoadMode.CUSTOM` or
`LoadMode.DBT_MANIFEST`


This PR improves and extends the original implementation proposed by
@tseruga in astronomer#429. Some of the changes that were introduced on top of the
original PR:
* Add support to descendants (before only precursors were supported)
* Add support to different depths/degrees of precursors/descendants
* Add support to the union between graph operators and graph/non-graph
operators
* Add support to the intersection between graph operators and
graph/non-graph operators

Closes: astronomer#684

Co-authored-by: Tyler Seruga <t.seruga@m1finance.com>
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
**Features**

* Add `ProfileMapping` for Snowflake encrypted private key path by
@ivanstillfront in astronomer#608
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in astronomer#649
* Add `DbtDocsGCSOperator` for uploading dbt docs to GCS by @jbandoro in
astronomer#616
* Add support to select using (some) graph operators when using
`LoadMode.CUSTOM` and `LoadMode.DBT_MANIFEST` by @tatiana in astronomer#728
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in astronomer#648
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in astronomer#623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in astronomer#735

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in astronomer#736
* Create a symbolic link to `dbt_packages` when `dbt_deps` is False when
using `LoadMode.DBT_LS` by @DanMawdsleyBA in astronomer#730
* Support no `profile_config` for `ExecutionMode.KUBERNETES` and
`ExecutionMode.DOCKER` by @MrBones757 and @tatiana in astronomer#681 and astronomer#731
* Add `aws_session_token` for Athena mapping by @benjamin-awd in astronomer#663

**Others**

* Replace flake8 for Ruff by @joppevos in astronomer#743
* Reduce code complexity to 8 by @joppevos in astronomer#738
* Update conflict matrix between Airflow and dbt versions by @tatiana in
astronomer#731
* Speed up integration tests by @jbandoro in astronomer#732
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
**Features**

* Add new parsing method ``LoadMode.DBT_LS_FILE`` by @woogakoki in astronomer#733
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/parsing-methods.html#dbt-ls-file)).
* Add support to select using (some) graph operators when using
``LoadMode.CUSTOM`` and ``LoadMode.DBT_MANIFEST`` by @tatiana in astronomer#728
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/selecting-excluding.html#using-select-and-exclude))
* Add support for dbt ``selector`` arg for DAG parsing by @jbandoro in
astronomer#755,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#render-config)).
* Add ``ProfileMapping`` for Vertica by @perttus in astronomer#540, astronomer#688 and astronomer#741,
as
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/VerticaUserPassword.html)).
* Add ``ProfileMapping`` for Snowflake encrypted private key path by
@ivanstillfront in astronomer#608, as ([documentation](
https://astronomer.github.io/astronomer-cosmos/profiles/SnowflakeEncryptedPrivateKeyFilePem.html)).
* Add support for Snowflake encrypted private key environment variable
by @DanMawdsleyBA in astronomer#649
* Add ``DbtDocsGCSOperator`` for uploading dbt docs to GCS by @jbandoro
in astronomer#616,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html#upload-to-gcs)).
* Add cosmos/propagate_logs Airflow config support for disabling log
propagation by @agreenburg in astronomer#648,
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/logging.html)).
* Add operator_args ``full_refresh`` as a templated field by @joppevos
in astronomer#623
* Expose environment variables and dbt variables in ``ProjectConfig`` by
@jbandoro in astronomer#735
([documentation](https://astronomer.github.io/astronomer-cosmos/configuration/project-config.html#project-config-example)).
* Support disabling event tracking when using Cosmos profile mapping by
@jbandoro in astronomer#768,
([documentation](https://astronomer.github.io/astronomer-cosmos/profiles/index.html#disabling-dbt-event-tracking)).

**Enhancements**

* Make Pydantic an optional dependency by @pixie79 in astronomer#736
* Create a symbolic link to ``dbt_packages`` when ``dbt_deps`` is False
when using ``LoadMode.DBT_LS`` by @DanMawdsleyBA in astronomer#730
* Add ``aws_session_token`` for Athena mapping by @benjamin-awd in astronomer#663
* Retrieve temporary credentials from ``conn_id`` for Athena by @octiva
in astronomer#758
* Extend ``DbtDocsLocalOperator`` with static flag by @joppevos  in astronomer#759

**Bug fixes**

* Remove Pydantic upper version restriction so Cosmos can be used with
Airflow 2.8 by @jlaneve in astronomer#772

**Others**

* Replace flake8 for Ruff by @joppevos in astronomer#743
* Reduce code complexity to 8 by @joppevos in astronomer#738
* Speed up integration tests by @jbandoro in astronomer#732
* Fix README quickstart link in by @RNHTTR in astronomer#776
* Add package location to work with hatchling 1.19.0 by @jbandoro in
astronomer#761
* Fix type check error in ``DbtKubernetesBaseOperator.build_env_args``
by @jbandoro in astronomer#766
* Improve ``DBT_MANIFEST`` documentation by @dwreeves in astronomer#757
* Update conflict matrix between Airflow and dbt versions by @tatiana in
astronomer#731 and astronomer#779
* pre-commit updates in astronomer#775, astronomer#770, astronomer#762
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:selector Related to selector, like DAG selector, DBT selector, etc dbt:run Primarily related to dbt run command or functionality parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support to select using (some) graph-operators when using LoadMode.CUSTOM or LoadMode.DBT_MANIFEST

3 participants