Fix inclusion of package models, and selecting/excluding#2357
Pull request overview
This PR fixes a bug where package models (e.g., from dbt_artifacts or other installed packages) could not be properly selected or excluded when using manifest load mode (LoadMode.DBT_MANIFEST). The fix adds support for the package: selector and implements dbt-like bare identifier matching.
Changes:
- Added `package:package_name` selector support for selecting/excluding nodes by package
- Implemented bare identifier matching (e.g., `exclude=['dbt_artifacts']`) that matches by package_name or node name, consistent with dbt behavior
- Fixed manifest parsing to correctly resolve file paths for package models vs. root project models
- Added defensive check in `get_dbt_packages_subpath` to handle non-dict YAML content
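The two matching rules in the list above can be illustrated with a minimal sketch. Only the `package:` prefix comes from the PR; `Node`, `matches`, and `exclude_nodes` are hypothetical names chosen for illustration and are not the Cosmos implementation, which also handles folder names and graph operators.

```python
from dataclasses import dataclass

PACKAGE_SELECTOR = "package:"  # prefix named in the PR; the rest of this sketch is hypothetical


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a dbt node (not the Cosmos node class)."""

    unique_id: str
    name: str
    package_name: str


def matches(selector: str, node: Node) -> bool:
    """Return True if the node matches a package selector or a bare identifier."""
    if selector.startswith(PACKAGE_SELECTOR):
        # package:<name> matches every node that belongs to that package.
        return node.package_name == selector[len(PACKAGE_SELECTOR):]
    # Bare identifier: match by package name or by node name.
    return selector in (node.package_name, node.name)


def exclude_nodes(nodes: dict, exclude: list) -> dict:
    """Drop every node matched by at least one exclude selector."""
    return {
        uid: node
        for uid, node in nodes.items()
        if not any(matches(selector, node) for selector in exclude)
    }


# Illustrative data: one root project model and one package model.
NODES = {
    "model.my_project.orders": Node("model.my_project.orders", "orders", "my_project"),
    "model.dbt_artifacts.model_executions": Node(
        "model.dbt_artifacts.model_executions", "model_executions", "dbt_artifacts"
    ),
}

remaining = exclude_nodes(NODES, ["package:dbt_artifacts"])
```

In this sketch, `exclude=["package:dbt_artifacts"]` and the bare `exclude=["dbt_artifacts"]` both leave only the root project model.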
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `cosmos/dbt/selector.py` | Added PACKAGE_SELECTOR constant, packages and bare_identifiers fields to SelectorConfig, implemented parsing and matching logic for package selectors and bare identifiers, refactored _handle_no_precursors_or_descendants to use dispatch pattern |
| `cosmos/dbt/graph.py` | Extracted manifest resource parsing into _parse_manifest_resources_to_nodes function, added logic to correctly resolve file paths for package models by distinguishing root project nodes from package nodes |
| `cosmos/dbt/project.py` | Added defensive isinstance check to handle cases where YAML content is not a dictionary |
| `tests/dbt/test_selector.py` | Added comprehensive tests for package selector (package:package_name), bare package name exclusion, and explicit package selection |
| `docs/configuration/selecting-excluding.rst` | Added documentation and examples for the new package: selector syntax |
| `docs/configuration/parsing-methods.rst` | Added note about package models requiring manifest regeneration after package installation |
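The defensive `isinstance` check described for `cosmos/dbt/project.py` can be sketched as follows. This is an illustration under assumptions, not the Cosmos code: the function name `packages_subpath_from_yaml_content` is hypothetical, and it assumes the value of interest is dbt's `packages-install-path` key, whose default is `dbt_packages`. The sketch takes already-parsed YAML content so it runs without PyYAML installed.

```python
from typing import Any

DEFAULT_PACKAGES_SUBPATH = "dbt_packages"  # dbt's default for packages-install-path


def packages_subpath_from_yaml_content(content: Any) -> str:
    """Return the packages sub-path from parsed dbt_project.yml content.

    `content` is whatever the YAML parser returned. For an empty file,
    yaml.safe_load() returns None, and a file holding only a list or a
    scalar yields a non-dict value, so the type is checked before .get().
    """
    if not isinstance(content, dict):
        return DEFAULT_PACKAGES_SUBPATH
    value = content.get("packages-install-path")
    # Fall back to the default when the key is absent or not a usable string.
    if isinstance(value, str) and value:
        return value
    return DEFAULT_PACKAGES_SUBPATH
```

With this shape, an empty `dbt_project.yml` falls back to the default instead of raising `AttributeError` on `None.get(...)`.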
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #2357 +/- ##
==========================================
+ Coverage 97.93% 97.96% +0.02%
==========================================
Files 102 102
Lines 6926 6971 +45
==========================================
+ Hits 6783 6829 +46
+ Misses 143 142 -1
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
tatiana
left a comment
This looks great, @pankajkoti, thanks for fixing these two issues.
As I mentioned on our call, a side effect of this change is that the Cosmos DAG topology may change, since we now render nodes that we previously did not. This is sometimes a side effect of fixing issues, but some users could perceive it as a breaking change.
For this reason, I suggest we release this in 1.14.0 instead of 1.13.1 and make this very explicit in the release notes in #2314. Please raise this in that PR.
    if package_selection:
        root_nodes.update(
            {node_id for node_id, node in nodes.items() if node.package_name == package_selection}
        )
When using graph operators with an empty package selector (e.g., select=['package:+']), the current implementation silently returns no nodes instead of raising an error. For consistency with the validation in _parse_package_selector (which raises CosmosValueError for package:), consider validating that package_selection is non-empty here and raising a CosmosValueError with a clear message if it's empty. This would provide better user feedback and prevent confusion when users accidentally use an empty package selector with graph operators.
Suggested change:
    - if package_selection:
    -     root_nodes.update(
    -         {node_id for node_id, node in nodes.items() if node.package_name == package_selection}
    -     )
    + if not package_selection:
    +     raise CosmosValueError(
    +         "Empty package selector 'package:' is not allowed. "
    +         "Please provide a package name when using package selectors with graph operators."
    +     )
    + root_nodes.update(
    +     {node_id for node_id, node in nodes.items() if node.package_name == package_selection}
    + )
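To make the failure mode in the review comment above concrete, here is a small self-contained sketch of validating a package selector that carries graph operators. The helper `extract_package_name` is hypothetical, the local `CosmosValueError` class is a stand-in so the snippet runs without Cosmos installed, and only plain leading/trailing `+` operators are handled; the actual parsing in `cosmos/dbt/selector.py` may differ.

```python
PACKAGE_SELECTOR = "package:"


class CosmosValueError(ValueError):
    """Local stand-in for the Cosmos exception of the same name."""


def extract_package_name(selector: str) -> str:
    """Return the package name from a selector such as '+package:elementary+'.

    Raises CosmosValueError instead of silently matching nothing when the
    package name is empty (for example 'package:+').
    """
    # Remove plain '+' graph operators on either side of the selector.
    core = selector.strip("+")
    if not core.startswith(PACKAGE_SELECTOR):
        raise CosmosValueError(f"Not a package selector: {selector!r}")
    package_name = core[len(PACKAGE_SELECTOR):]
    if not package_name:
        raise CosmosValueError(
            "Empty package selector 'package:' is not allowed. "
            "Please provide a package name when using package selectors with graph operators."
        )
    return package_name
```

Failing early here trades a silent empty selection for an explicit error, which is the behaviour the suggestion asks for.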
1.14.0 (2026-04-07)
---------------------

Breaking Changes

* Drop support for Airflow versions earlier than **2.9** by @jedcunningham in #2288
* Fix inclusion of package models and selection/exclusion behavior by @pankajkoti in #2357
* ``ExecutionMode.WATCHER``: The per-node ``*_status`` XCom value is now a dict (``{"status": "<status>", "outlet_uris": [...]}``) instead of a plain string. Any custom code that reads these internal XCom keys directly will need to be updated by @pankajkoti in #2507

Features

* Add cluster policy support for ``ExecutionMode.WATCHER`` sensor retries by @astro-anand in #2293
* Add debug mode to track memory utilization by @tatiana in #2327
* Add FQN selection support for ``LoadMode.DBT_MANIFEST`` by @pankajastro in #2375
* Introduce interceptors for Cosmos tasks by @tatiana in #2419
* Add config to allow disabling dag versioning by @pankajkoti in #2470
* Implement TaskGroups by models folder by @maximilianoarcieri and @tatiana in #1566, #2469, and #2420
* feat: implement DbtTestWatcherOperator by @michal-mrazek in #2447
* Add source freshness aware execution for ``ExecutionMode.WATCHER`` by @pankajastro and @tatiana in #2467

  * Note: Like ``ExecutionMode.WATCHER``, this feature is experimental and its interface and implementation can change in the future.

* Add Airflow 3.2 support by @pankajastro and @pankajkoti in #2472

Enhancements

* Add watcher mode support for dbt test node states by @michal-mrazek in #2318
* Rename watcher-mode sensor retry queue and reuse it for producer tasks by @pankajastro in #2331
* Fix leaked semaphore warnings in Airflow 3 by resetting dbt adapters by @pankajkoti in #2335
* Improve dbt Fusion support and related tests by @tatiana in #2356
* Default Snowflake profile mappings to four threads by @tatiana in #2374
* Attempt to remove Pydantic as a dependency by @tatiana in #2377
* Log dbt-core and adapter versions in watcher consumer tasks by @pankajastro in #2412
* Log model errors in watcher consumer on dbt node failure by @pankajastro in #2431
* Reduce XCom read/write for tracking node state and errors in ConsumerWatcher task by @pankajastro in #2471
* Remove duplicate debug log in watcher subprocess path by @tatiana in #2494
* Simplify and unify WATCHER implementation regardless of InvocationMode by @tatiana in #2498
* Switch to lazy imports in cosmos/__init__.py by @pankajkoti in #2531

Bug Fixes

* Handle invalid YAML errors with ``LoadMode.DBT_MANIFEST`` and ``RenderConfig.selector`` by @YourRoyalLinus in #2316
* Populate ``compiled_sql`` for ``InvocationMode.SUBPROCESS`` in ``ExecutionMode.WATCHER`` by @pankajkoti in #2319
* Fix select/exclude type mismatch by @tatiana in #2364
* Set ``emit_datasets=False`` for ``DbtTest*`` operators by @pankajastro in #2365
* Set correct queue priority for watcher producer tasks by @pankajastro in #2372
* Preserve ``extra_context`` for watcher consumer task instances by @pankajkoti in #2381
* Respect ``deferrable=False`` from ``operator_args`` on watcher consumer sensors by @pankajkoti in #2384
* Fix watcher queue precedence and add documentation by @pankajastro in #2391
* Do not set ``compiled_sql`` on ``ExecutionMode.WATCHER`` producers by @pankajkoti in #2440
* Remove const attribute for ``__cosmos_telemetry_metadata__`` dag param by @pankajkoti in #2466
* Remove timeout override from Cosmos watcher sensors by @tatiana and @claude in #2478
* Remove forced ``retries=0`` from watcher producer operators by @tatiana in #2479
* RFC: Add patch for newer versions of amazon provider when running dbt on EKS by @aoelvp94 in #2481
* Fix ``cosmos_debug_max_memory_mb`` XCom not pushed in Watcher sensor tasks by @tatiana in #2503
* Fix ``TestBehavior.NONE`` and ``TestBehavior.AFTER_ALL`` exclude ignored with selectors in ``ExecutionMode.WATCHER`` by @pankajkoti in #2511
* Move dataset emission for ``ExecutionMode.WATCHER`` from producer to consumer sensors by @pankajkoti in #2507

Docs

* Document cluster policy configuration for ``ExecutionMode.WATCHER`` sensor tasks by @pankajastro in #2315
* Remove outdated docs for the dbt docs plugin with Airflow 3 by @pankajastro in #2353
* Make Watcher DBT Execution Queue heading clickable by @pankajastro in #2354
* Update ``ExecutionMode.WATCHER`` documentation regarding test node implementation by @jroachgolf84 in #2355
* Fix ``pre_dbt_fusion`` configuration rendering by @pankajastro in #2369
* Add documentation for including/excluding nodes based on FQN by @pankajastro in #2371
* Update watcher execution mode documentation by @tatiana in #2380
* Add documentation for ``DbtSeedLocalOperator`` by @jroachgolf84 in #2383
* Fix miscellaneous Sphinx warnings by @pankajastro in #2395
* Improve contributing documentation by @lzdanski in #2397
* Add **Get Started in 5 Minutes** guide by @lzdanski in #2398
* Add Sphinx redirects package for documentation redirects by @lzdanski in #2407
* Restructure **Getting Started** and **Guides** sections by @lzdanski in #2418
* Add open-source quickstart by @lzdanski in #2439
* Fix documentation redirects by @lzdanski in #2442
* Restructure and refactor reference documentation by @lzdanski in #2443
* Add execution modes decision documentation by @lzdanski in #2444
* Add **Core Concepts** page to Getting Started by @lzdanski in #2448
* Add guide: *How Cosmos Works* by @lzdanski in #2449
* Update **Getting Started** overview and index pages by @lzdanski in #2452
* Add guide: *How Cosmos Runs dbt* by @lzdanski in #2453
* Fix miscellaneous documentation links by @lzdanski in #2454
* Add Mermaid diagrams and execution mode diagrams by @lzdanski and @tatiana in #2459
* Add documentation for memory optimization options by @pankajastro in #2340
* Fix typo in watcher execution mode docs by @evanvolgas in #2485
* Fix minor documentation issues by @evanvolgas in #2489
* Add troubleshooting note for dbt debug logs in ExecutionMode.WATCHER by @tatiana in #2491
* docs: unify RST header styles across documentation by @jigangz in #2473
* docs: fix env var for rich logging by @vricciardulli in #2514
* docs: update dbt project path example for Airflow 3 Astro compatibility by @yeoreums in #2512
* Document missing Cosmos Airflow config settings in cosmos-conf.rst by @tatiana in #2515
* Split security-privacy policy doc and add dependency cooldown by @pankajkoti in #2519
* Add performance optimization and troubleshooting docs by @pankajkoti in #2521
* Update copyright year to 2026 by @tayloramurphy in #2527
* docs: Updating "Project Policies" to "Policies" in menu bar by @jroachgolf84 in #2526

Others

* Fix tests after removing support for Airflow versions earlier than 2.9 by @tatiana in #2321
* Enable listener tests for Airflow 3.1 by @pankajastro in #2348
* Accept ``int`` or ``float`` for ``cosmos_debug_max_memory_mb`` in integration tests by @pankajkoti in #2352
* Update ``CODEOWNERS`` to prioritize ``oss-integrations`` by @tatiana in #2359
* Fix automatic reviewer assignment in GitHub by @tatiana and @phanikumv in #2360
* Improve PyPI tagging by @tatiana in #2363
* Add integration tests for dbt Fusion and ``ExecutionMode.WATCHER`` by @tatiana in #2373
* Fix Zizmor check by @tatiana in #2376
* Remove ``methodtools`` dependency by @tatiana in #2378
* Improve comments on #2389 by @evanvolgas in #2394
* Refactor ``load_from_dbt_manifest`` to reduce code complexity by @pankajkoti in #2399
* Refactor ``_handle_no_precursors_or_descendants`` to reduce complexity by @pankajkoti in #2400
* Improve issue templates by @tatiana in #2401
* Avoid running tests when only docs change by @tatiana in #2402
* Add ``no-reload`` target for serving docs locally by @pankajkoti in #2405
* Fix test hash checks on macOS by @tatiana in #2406
* Attempt deterministic dbt project copy in test fixtures by @pankajkoti in #2409
* Pin ``virtualenv <21`` due to hatch incompatibility in CI by @pankajkoti in #2410
* Revert virtualenv pin for hatch installation in CI by @pankajkoti in #2426
* Add version comments for commit SHA pinned GitHub Actions by @pankajkoti in #2436
* Fix ``hatch run docs:build`` issues by @tatiana in #2437
* Minor code improvements by @dnskr in #2446
* Pre-commit autoupdate by @pre-commit-ci in #2367, #2396, #2422, #2451, #2468, #2495, and #2516
* Add file to support Claude understanding the Cosmos repository by @tatiana in #2458
* Dependency updates by @dependabot in #2368, #2425, #2435, #2465, #2475, #2504, #2518, and #2528
* Isolate Scarf telemetry integration test into its own CI job by @pankajkoti and @claude in #2477
* ci: upgrade Airflow version to 3.1 in MyPy type-check job by @yeoreums in #2506
* Add commit message guidelines to CLAUDE.md by @pankajkoti in #2509
* Extend skipping tests in CI for more non-code file changes by @pankajkoti in #2510
* Add Dependabot pre-commit support with 7-day cooldown by @pankajkoti in #2517
* Enforce zero warnings policy for documentation by @dnskr in #2513

Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
Fixes DAG rendering when using manifest load mode with installed dbt packages (e.g. Elementary, dbt_artifacts): package models are now included in the DAG with correct paths, which did not work previously.
Additionally, you can now select or exclude nodes by package, which did not work previously.
This also fixes selecting/excluding models by folder name, which did not work previously.
Changes
Manifest mode – package nodes
- `file_path` for package nodes: In the manifest, package nodes use `original_file_path` relative to the package root (e.g. `models/edr/...`). Cosmos now resolves them as `project_path/dbt_packages/<package_name>/<original_file_path>` so package models appear in the DAG and path-based logic works.
- Uses `metadata.project_name` to treat root project nodes as `project_path/original_file_path`. When `metadata.project_name` is missing (older/test manifests), all nodes are treated as root for backward compatibility.
Selectors
- `package:package_name`: You can include or exclude all nodes from a package (e.g. `exclude=["package:dbt_artifacts"]` or `select=["package:elementary"]`), including when using manifest load mode.
- A bare identifier (e.g. `exclude=["dbt_artifacts"]`) is resolved like in dbt: by package name, node name, or the name of a folder containing models.
Robustness and errors
- Empty `dbt_project.yml`: `get_dbt_packages_subpath()` no longer assumes `yaml.safe_load()` returns a dict; it handles `None` (e.g. an empty file) so manifest loading does not raise when the project directory has an empty `dbt_project.yml`.
- Manifest loading: Handles `json.load()` returning `None` and, when reading `project_name`, uses `metadata` only when it is a dict.
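The path resolution described under "Manifest mode – package nodes", together with the defensive reading of `metadata.project_name`, can be sketched roughly as below. The function names `read_root_project_name` and `resolve_node_path` are hypothetical and the packages directory is hard-coded to `dbt_packages` for simplicity; the manifest keys (`metadata.project_name`, `package_name`, `original_file_path`) are the ones the description refers to. This is an illustration of the rule, not the code in `cosmos/dbt/graph.py`.

```python
from pathlib import Path
from typing import Any, Optional


def read_root_project_name(manifest: Any) -> Optional[str]:
    """Read metadata.project_name, tolerating a None manifest or non-dict metadata."""
    if not isinstance(manifest, dict):
        return None
    metadata = manifest.get("metadata")
    if not isinstance(metadata, dict):
        return None
    return metadata.get("project_name")


def resolve_node_path(project_path: Path, node: dict, root_project_name: Optional[str]) -> Path:
    """Resolve a manifest node's file path on disk.

    Root project nodes live directly under project_path; package nodes live
    under project_path/dbt_packages/<package_name>/. When the root project
    name is unknown, every node is treated as a root node.
    """
    original_file_path = node["original_file_path"]
    if root_project_name is None or node["package_name"] == root_project_name:
        return project_path / original_file_path
    return project_path / "dbt_packages" / node["package_name"] / original_file_path


# Illustrative manifest fragment with one root model and one package model.
manifest = {
    "metadata": {"project_name": "my_project"},
    "nodes": {
        "model.my_project.orders": {
            "package_name": "my_project",
            "original_file_path": "models/orders.sql",
        },
        "model.elementary.alerts": {
            "package_name": "elementary",
            "original_file_path": "models/edr/alerts.sql",
        },
    },
}

root_name = read_root_project_name(manifest)
resolved = {
    node_id: resolve_node_path(Path("/usr/app/dbt"), node, root_name)
    for node_id, node in manifest["nodes"].items()
}
```

Treating every node as a root node when the project name is unavailable keeps older or hand-written test manifests working, at the cost of not locating package files for them.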
closes: #1528
closes: https://github.com/astronomer/oss-integrations-private/issues/318