Fix main branch failing tests#2296
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses test failures and compatibility issues in PR #2261 by implementing minimal changes to ensure all checks pass. The changes focus on fixing integration test failures, adding Airflow 3 compatibility, and ensuring proper YAML selector support for LoadMode.DBT_MANIFEST.
Changes:
- Fixed integration tests failing due to Object Storage feature unavailability in Airflow 2.7
- Added Airflow 3 compatibility by skipping incompatible DAGs
- Enabled YAML selector support for
LoadMode.DBT_MANIFESTwith comprehensive caching
Reviewed changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_example_dags.py | Added cosmos_manifest_selectors_example.py to minimum version requirements and Airflow 3 compatibility |
| tests/test_example_dags_no_connections.py | Added cosmos_manifest_selectors_example.py to ignored files and Airflow 3 skip logic |
| tests/test_cache.py | Renamed delete_unused_dbt_ls_cache to delete_unused_dbt_cache and added YAML selector cache tests |
| tests/dbt/test_selector.py | Added comprehensive YAML selector parsing tests |
| tests/dbt/test_graph.py | Added tests for YAML selector loading and caching functionality |
| cosmos/dbt/selector.py | Implemented YamlSelectors class for parsing manifest selectors |
| cosmos/dbt/graph.py | Added YAML selector cache methods and integrated with manifest loading |
| cosmos/config.py | Added airflow_vars_to_purge_dbt_yaml_selectors_cache parameter |
| cosmos/cache.py | Refactored cache deletion functions and added YAML selector cache support |
| cosmos/settings.py | Added enable_cache_dbt_yaml_selectors configuration |
| docs/configuration/*.rst | Updated documentation for YAML selector support |
| dev/dags/cosmos_manifest_selectors_example.py | Added example DAG demonstrating YAML selector usage |
| dev/dags/dbt/altered_jaffle_shop/selectors.yml | Added selector definitions for testing |
| dev/dags/example_cosmos_cleanup_dag.py | Added YAML selector cache cleanup functions |
Comments suppressed due to low confidence (1)
cosmos/dbt/graph.py:1
- The assertion now expects only one hash value for macOS (within a tuple), but the comment on line 2082 mentions 'versions' plural. This inconsistency should be resolved - either update the comment to match the single value or explain why only one hash is now expected compared to the previous two values.
from __future__ import annotations
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
YourRoyalLinus
left a comment
There was a problem hiding this comment.
Thank you for fixing these! This LGTM.
The default keyword in the selectors.yml was an artifact from when I was testing if yaml parsing can/should support that behavior. I missed it when I was cleaning up those changes. Sorry for the headache and glad it seemed you were able to spot it quickly
Features * Support cross-referencing models across dbt projects using dbt-loom by @pankajkoti in #2271 * Support use of YAML selectors when using ``LoadMode.DBT_MANIFEST`` by @YourRoyalLinus in #2261 * Introduce ``ExecutionMode.WATCHER_KUBERNETES`` to use the watcher with ``KubernetesPodOperator`` by @tatiana in #2207 * Add support for StarRocks profile mapping by @kurkim0661 in #2256 * Allow pushing URIs as XComs for Cosmos tasks by @corsettigyg in #2275 * Support defining custom callbacks alongside the ``WATCHER_KUBERNETES`` callback by @johnhoran in #2307 Enhancements * Refactor: remove duplicate ``_construct_dest_file_path`` by @jx2lee in #2077 * Leverage Airflow ``::group::`` to group logs associated with DAG parsing by @tatiana in #2235 * Refactor ``DbtConsumerWatcherSensor`` for reusability by @tatiana in #2245 * Restore plain text output when using ``ExecutionMode.WATCHER`` by @tiovader in #2241 Bug Fixes * Fix running empty models or ephemeral nodes in ``ExecutionMode.WATCHER`` by @tatiana in #2279 * Improve watcher producer task priority in scheduling and the UI by @tatiana in #2237 * Fix typos and formatting issues in documentation by @pankajkoti in #2259 * Allow watcher producer retries without erroring by @tatiana in #2283 * Fix ``TestBehavior.AFTER_ALL`` is missing project_name information when loading project using manifest file by @tuantran0910 in #2242 * Fix duplicate log lines in watcher subprocess execution and format timestamps by @pankajkoti in #2301 Docs * Add Watcher Kubernetes documentation by @tatiana in #2303 * Document newly added telemetry metrics in the privacy notice by @pankajkoti in #2249 * Add compatibility policy document by @pankajastro in #2251 * Improve watcher documentation related to dbt threads by @tatiana in #2273 * Fix link in watcher execution mode documentation by @jedcunningham in #2277 * Update Apache Airflow minimum compatibility policy by @tatiana in #2285 * Clarify Cosmos runtime support until "End of Basic Support" by @jedcunningham in #2286 * Update watcher docs by @tatiana in #2298 * Update watcher kubernetes documentation by @tatiana in #2306 Others * Add Airflow 3 DAG versioning tests for Cosmos by @michal-mrazek in #2177 * Add dbt Core 1.11 to the test matrix by @tatiana in #2230 * Add integration tests using InvocationMode.SUBPROCESS and validate output by @tatiana in #2287 * Fix main branch failing tests by @tatiana in #2296 * Update pre-commit hooks to the latest versions by @jedcunningham in #2289 * Pre-commit autoupdates by @pre-commit in #2222, #2264, #2274 and #2290 * Dependabot updates by @dependabot in #2218, #2219, #2220, #2280 and #2284 * Add Scarf metrics to understand Cosmos feature usage patterns - Add telemetry tracking for dbt docs plugin usage by @pankajkoti in #2240 - Add DAG run telemetry metrics for load mode, invocation, and render_config parameters by @pankajkoti in #2223 - Collect profile metrics for DAG runs by @pankajastro in #2228 - Compress telemetry metadata to reduce serialized DAG size by @pankajkoti in #2252 - Skip storing telemetry metadata when emission is disabled by @pankajkoti in #2278 - Hide telemetry metadata parameters from the Airflow trigger UI by @pankajkoti in #2247 closes: astronomer/oss-integrations-private#317 --------- Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
Minimal changes on astronomer#2261 to see tests passing without changing the original branch I also created a PR on the @YourRoyalLinus PR: YourRoyalLinus#1 This is a summary of the main issues: (1) Some integration tests were failing with: ``` ERROR tests/test_example_dags.py - AssertionError: assert not {'/home/runner/work/astronomer-cosmos/astronomer-cosmos/dev/dags/cosmos_manifest_selectors_example.py': 'Traceback (mo...he required Object Storage feature is unavailable in Airflow version 2.7.0. Please upgrade to Airflow 2.8 or later.\n'} + where {'/home/runner/work/astronomer-cosmos/astronomer-cosmos/dev/dags/cosmos_manifest_selectors_example.py': 'Traceback (mo...he required Object Storage feature is unavailable in Airflow version 2.7.0. Please upgrade to Airflow 2.8 or later.\n'} = <airflow.models.dagbag.DagBag object at 0x7fa3c6e724d0>.import_errors ERROR tests/test_example_dags_no_connections.py - AssertionError: assert not {'/home/runner/work/astronomer-cosmos/astronomer-cosmos/dev/dags/cosmos_manifest_selectors_example.py': 'Traceback (mo...he required Object Storage feature is unavailable in Airflow version 2.7.0. Please upgrade to Airflow 2.8 or later.\n'} + where {'/home/runner/work/astronomer-cosmos/astronomer-cosmos/dev/dags/cosmos_manifest_selectors_example.py': 'Traceback (mo...he required Object Storage feature is unavailable in Airflow version 2.7.0. Please upgrade to Airflow 2.8 or later.\n'} = <airflow.models.dagbag.DagBag object at 0x7fa3be1d89d0>.import_errors ``` The reason is "Object Storage feature is unavailable in Airflow version 2.7.0. Please upgrade to Airflow 2.8 or later". This is how you can fix it: astronomer@424a9a1 (2) dbt Fusion tests stopped working because the YAML selector file defined a default selector, which did not select any nodes in the `jaffle_shop` dbt project. There were dbt Fusion tests that referenced the dbt project `jaffle_shop` that were changed and would actually run the DAG and check the DAG topology - including which nodes were rendered. Since we set the default selector to not match any nodes, this behaviour changed: ``` =========================== short test summary info ============================ FAILED tests/test_dbtf.py::test_dbt_dag_with_dbt_fusion - assert 0 == 23 + where 0 = len({}) + where {} = <cosmos.dbt.graph.DbtGraph object at 0x7f0b68bb80d0>.filtered_nodes + where <cosmos.dbt.graph.DbtGraph object at 0x7f0b68bb80d0> = <DAG: snowflake_dbt_fusion_dag>.dbt_graph ``` I moved the changes originally done to `dev/dags/dbt/jaffle_shop` to `dev/dags/dbt/altered_jaffle_shop`. I also removed the default selector definition. (3) The DAG `cosmos_manifest_selectors_examples.py` failed to run in a clean database because it would attempt to run the dbt model `customers`, without having run its upstream tasks. This can be observed in https://github.com/astronomer/astronomer-cosmos/actions/runs/21476188465/job/61860709231?pr=2296 (4) The DAG `example_cosmos_cleanup_dag.py` is not compatible with Airflow 3. I logged a follow-up ticket for us to check this: astronomer#2300. This issue was not introduced by your PR. For now, it is skipped in AF3.
- `stg_orders.sql` and `stg_payments.sql` in `altered_jaffle_shop` had a
`force_seed_dep` CTE that referenced `ref('raw_customers')` instead of
the corresponding `raw_orders` / `raw_payments` seeds — a copy-paste
leftover from #1107. The CTE exists to keep tests waiting for the
correct seed when `source_node_rendering = none`, so referencing the
wrong seed defeated its purpose.
- #1374 fixed this exact issue in `jaffle_shop` but missed
`altered_jaffle_shop`.
- The committed `target/manifest.json` snapshot already reflects the
correct edges (regenerated in #2296), so only the two SQL files needed
updating. The `cosmos_manifest_selectors_example.py` DAG that explicitly
lists `raw_orders` / `raw_payments` in its `select=` workaround is now
arguably redundant, but harmless — left as-is.
Closes #2599
- `stg_orders.sql` and `stg_payments.sql` in `altered_jaffle_shop` had a
`force_seed_dep` CTE that referenced `ref('raw_customers')` instead of
the corresponding `raw_orders` / `raw_payments` seeds — a copy-paste
leftover from #1107. The CTE exists to keep tests waiting for the
correct seed when `source_node_rendering = none`, so referencing the
wrong seed defeated its purpose.
- #1374 fixed this exact issue in `jaffle_shop` but missed
`altered_jaffle_shop`.
- The committed `target/manifest.json` snapshot already reflects the
correct edges (regenerated in #2296), so only the two SQL files needed
updating. The `cosmos_manifest_selectors_example.py` DAG that explicitly
lists `raw_orders` / `raw_payments` in its `select=` workaround is now
arguably redundant, but harmless — left as-is.
Closes #2599
Minimal changes on #2261 to see tests passing without changing the original branch
I also created a PR on the @YourRoyalLinus PR: YourRoyalLinus#1
This is a summary of the main issues:
(1) Some integration tests were failing with:
The reason is "Object Storage feature is unavailable in Airflow version 2.7.0. Please upgrade to Airflow 2.8 or later". This is how you can fix it: 424a9a1
(2) dbt Fusion tests stopped working because the YAML selector file defined a default selector, which did not select any nodes in the
jaffle_shopdbt project.There were dbt Fusion tests that referenced the dbt project
jaffle_shopthat were changed and would actually run the DAG and check the DAG topology - including which nodes were rendered.Since we set the default selector to not match any nodes, this behaviour changed:
I moved the changes originally done to
dev/dags/dbt/jaffle_shoptodev/dags/dbt/altered_jaffle_shop. I also removed the default selector definition.(3) The DAG
cosmos_manifest_selectors_examples.pyfailed to run in a clean database because it would attempt to run the dbt modelcustomers, without having run its upstream tasks. This can be observed in https://github.com/astronomer/astronomer-cosmos/actions/runs/21476188465/job/61860709231?pr=2296(4) The DAG
example_cosmos_cleanup_dag.pyis not compatible with Airflow 3. I logged a follow-up ticket for us to check this: #2300. This issue was not introduced by your PR. For now, it is skipped in AF3.