
Support cross-referencing models across dbt projects using dbt-loom #2271

Merged
tatiana merged 16 commits into main from dbt-loom-poc-projects on Jan 29, 2026

Conversation

@pankajkoti
Contributor

@pankajkoti pankajkoti commented Jan 15, 2026

This PR adds support for dbt-loom, enabling Cosmos to work with multi-project dbt architectures where downstream projects reference models from upstream projects.

When using dbt-loom, downstream projects reference upstream models via {{ ref('upstream_project', 'model_name') }}. dbt-loom injects these external model references into the downstream project's namespace by reading the upstream project's manifest.json.

Cosmos now automatically detects and skips external nodes (those without original_file_path) during DAG generation, while still creating tasks for the project's own models. This works for both:
LoadMode.DBT_LS - parsing via dbt ls
LoadMode.DBT_MANIFEST - parsing via manifest file
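
The skipping rule described above can be illustrated with a minimal sketch. This is not the code in cosmos/dbt/graph.py; it only assumes that manifest nodes are plain dictionaries and that an external node is one whose original_file_path is missing or empty. The function name and the project names `downstream` and `upstream` are hypothetical.

```python
from __future__ import annotations

from typing import Any


def split_local_and_external(
    manifest_nodes: dict[str, dict[str, Any]],
) -> tuple[dict[str, dict[str, Any]], list[str]]:
    """Separate nodes backed by a local file from nodes injected by another project."""
    local: dict[str, dict[str, Any]] = {}
    external: list[str] = []
    for unique_id, node in manifest_nodes.items():
        # Assumption: a missing or empty original_file_path marks an external node.
        if node.get("original_file_path"):
            local[unique_id] = node
        else:
            external.append(unique_id)
    return local, external


# Made-up manifest fragment: one local model and one model injected by dbt-loom.
nodes = {
    "model.downstream.fct_revenue": {
        "original_file_path": "models/fct_revenue.sql",
        "depends_on": {"nodes": ["model.upstream.int_orders_enriched"]},
    },
    "model.upstream.int_orders_enriched": {
        "original_file_path": "",
        "depends_on": {"nodes": []},
    },
}

local_nodes, skipped = split_local_and_external(nodes)
print(sorted(local_nodes))  # only the downstream project's own model
print(skipped)              # the external node that gets no Airflow task
```

Note that the local node keeps its depends_on entry pointing at the skipped node, so any later graph traversal has to tolerate IDs that are no longer in the node dictionary.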

The PR adds example projects (in dev/dags/dbt/):
dbt_loom_upstream_platform/ - staging & intermediate models with seeds
dbt_loom_downstream_finance/ - finance fact tables referencing upstream models
dbt_loom_dags.py - combined DAG with chained task groups

The PR also adds a comprehensive guide for multi-project setups in docs/configuration/multi-project.rst

closes: #2107


Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>

@netlify

netlify Bot commented Jan 15, 2026

Deploy Preview for astronomer-cosmos canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 328937a |
| 🔍 Latest deploy log | https://app.netlify.com/projects/astronomer-cosmos/deploys/697b3553086b3b000825e8ac |

@codecov

codecov Bot commented Jan 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.99%. Comparing base (46d57ee) to head (328937a).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2271   +/-   ##
=======================================
  Coverage   97.99%   97.99%           
=======================================
  Files         100      100           
  Lines        6431     6440    +9     
=======================================
+ Hits         6302     6311    +9     
  Misses        129      129           

☔ View full report in Codecov by Sentry.

@pankajkoti pankajkoti force-pushed the dbt-loom-poc-projects branch from b31e48e to a65c838 on January 21, 2026 15:20
@pankajkoti pankajkoti marked this pull request as ready for review January 21, 2026 15:25
Copilot AI review requested due to automatic review settings January 21, 2026 15:25
Contributor

Copilot AI left a comment


Pull request overview

This PR adds support for multi-project dbt setups using dbt-loom, enabling Cosmos to handle cross-project references where downstream dbt projects reference models from upstream projects.

Changes:

  • Cosmos now skips external nodes (those without file paths) injected by dbt-loom during DAG generation
  • Added comprehensive documentation for multi-project setups with configuration examples
  • Included test coverage for the external node skipping behavior

Reviewed changes

Copilot reviewed 32 out of 34 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| cosmos/dbt/graph.py | Added logic to skip nodes without file paths in both manifest and dbt ls parsing methods |
| tests/dbt/test_graph.py | Added test to verify external nodes from dbt-loom are properly skipped |
| docs/configuration/multi-project.rst | New comprehensive documentation explaining multi-project setups with dbt-loom |
| docs/configuration/index.rst | Added multi-project documentation to the configuration index |
| pyproject.toml | Added dbt-loom as an optional dependency |
| scripts/test/pre-install-airflow.sh | Added dbt-loom installation to test setup |
| dev/dags/dbt_loom_dags.py | Example DAG demonstrating multi-project setup |
| dev/dags/dbt/dbt_loom_upstream_platform/* | Example upstream dbt project files |
| dev/dags/dbt/dbt_loom_downstream_finance/* | Example downstream dbt project files |


Comment thread dev/dags/dbt/dbt_loom_downstream_finance/dbt_loom.config.yml Outdated
Comment thread cosmos/dbt/graph.py Outdated
Comment thread dev/dags/dbt/cross_project/downstream/models/fct_revenue.sql Outdated
@pankajkoti pankajkoti changed the title from "Support multiple dbt projects using dbt-loom" to "Support cross-referencing models across dbt projects using dbt-loom" on Jan 22, 2026
Comment thread docs/configuration/multi-project.rst
@tatiana
Collaborator

tatiana commented Jan 23, 2026

Hi @pankajkoti ! This work is very exciting - it's really cool to be able to have a dbt Mesh feature in Cosmos without the need to lock into a proprietary platform.

I'd love it if you could address the following points, as we discussed yesterday:

  1. Inside the dev/dags/dbt folder, put both new dbt projects into the folder cross-project. There, I suggest one of the projects is named upstream (or parent) and the other is named downstream (or child), so it is very clear why we have those projects
  2. Please, could you also confirm that Airflow Assets/Datasets emitted via Cosmos by the upstream DAG could trigger the downstream DAG?
  3. Please, could you confirm if the syntax is exactly the same as dbt Mesh, and if there are any feature parity gaps?
  4. Were you able to test the cross-project reference, considering users were using LoadMode.DBT_MANIFEST both on the upstream and downstream projects?
  5. Would it be worth highlighting in our example or in the docs the fact that both projects can be run with different profiles?
  6. Is there a way of running nodes of different projects in the same Cosmos DbtTaskGroup?

Add two minimal dbt projects to demonstrate dbt Loom behavior:
- platform_project: upstream with 2 public models, 1 macro, 1 source
- finance_project: downstream with cross-project refs and dbt Loom config

These projects verify that dbt Loom only injects models (not macros or sources) cross-project.

Add two example DAGs to test Cosmos compatibility with dbt Loom:
- dbt_loom_platform_dag.py: Upstream project with public models
- dbt_loom_finance_dag.py: Downstream project using dbt Loom cross-project refs

dbt Projects:
- Rename platform_project -> dbt_loom_upstream_platform
- Rename finance_project -> dbt_loom_downstream_finance
- Add comprehensive seed data (customers, orders, order_items, products)
- Add staging models with public access for cross-project refs
- Add intermediate models (int_orders_enriched, int_customer_orders)
- Add finance models (fct_revenue, fct_customer_revenue, dim_payment_methods)
- Update profiles to use PostgreSQL
- Consolidate into single DAG with chained task groups

Documentation:
- Add docs/configuration/multi-project.rst with comprehensive guide
- Cover cross-project model references using dbt-loom
- Document patterns for cross-project sources and macros
- Include Cosmos DAG configuration examples
- Add troubleshooting and best practices sections

Remove curly braces from ref() example in docstring to prevent
Airflow from trying to render it as a Jinja template.

@pankajkoti
Contributor Author

Thanks for the detailed review, @tatiana. I have addressed the feedback.

  1. Inside the dev/dags/dbt folder, put both new dbt projects into the folder cross-project. There, I suggest one of the projects is named upstream (or parent) and the other is named downstream (or child), so it is very clear why we have those projects

Done so, thanks!

  2. Please, could you also confirm that Airflow Assets/Datasets emitted via Cosmos by the upstream DAG could trigger the downstream DAG?

Yes, verified this, works smoothly. Added snapshots in the docs as we discussed earlier today.

  3. Please, could you confirm if the syntax is exactly the same as dbt Mesh, and if there are any feature parity gaps?

Yes, confirmed that the syntax is exactly the same as dbt Mesh.

  4. Were you able to test the cross-project reference, considering users were using LoadMode.DBT_MANIFEST both on the upstream and downstream projects?

Yes, I tested this with both projects using LoadMode.DBT_MANIFEST and it works well. I have also added an example DAG for this case so that it runs in our CI.

  5. Would it be worth highlighting in our example or in the docs the fact that both projects can be run with different profiles?

Yes, I have highlighted this in our docs now.

  6. Is there a way of running nodes of different projects in the same Cosmos DbtTaskGroup?

No, this is not currently supported. Each DbtTaskGroup (or DbtDag) is configured with a single ProjectConfig that points to one dbt project.
When Cosmos parses a dbt-loom project, it encounters external nodes injected from upstream projects. However, Cosmos intentionally skips these external nodes during DAG generation -- they don't have local file paths and are meant to be run by their own project. Cosmos only creates Airflow tasks for the models that belong to the current project. Supporting multiple projects in a single DbtTaskGroup would require significant changes, I believe.

Requesting re-review, please!

@pankajkoti pankajkoti requested a review from tatiana January 27, 2026 17:56
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 35 out of 41 changed files in this pull request and generated 3 comments.



Comment thread cosmos/dbt/graph.py Outdated
Comment thread dev/dags/dbt/cross_project/downstream/dbt_loom.config.yml
Comment thread docs/configuration/multi-project.rst Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread cosmos/dbt/graph.py
Comment thread pyproject.toml Outdated
Comment thread cosmos/dbt/graph.py Outdated
Collaborator

@tatiana tatiana left a comment


Hi @pankajkoti, this looks great, thank you very much!

I have one last concern, discussed in the thread https://github.com/astronomer/astronomer-cosmos/pull/2271/changes#r2736991273.

Once you address this and the checks are passing, please feel free to merge the PR.

Comment thread pyproject.toml Outdated
@tatiana tatiana merged commit 0ce9954 into main Jan 29, 2026
90 checks passed
@tatiana tatiana deleted the dbt-loom-poc-projects branch January 29, 2026 11:09
@pankajastro pankajastro mentioned this pull request Jan 29, 2026
tatiana added a commit that referenced this pull request Jan 30, 2026
Features

* Support cross-referencing models across dbt projects using dbt-loom by
@pankajkoti in #2271
* Support use of YAML selectors when using ``LoadMode.DBT_MANIFEST`` by
@YourRoyalLinus in #2261
* Introduce ``ExecutionMode.WATCHER_KUBERNETES`` to use the watcher with
``KubernetesPodOperator`` by @tatiana in #2207
* Add support for StarRocks profile mapping by @kurkim0661 in #2256
* Allow pushing URIs as XComs for Cosmos tasks by @corsettigyg in #2275
* Support defining custom callbacks alongside the ``WATCHER_KUBERNETES``
callback by @johnhoran in #2307

Enhancements

* Refactor: remove duplicate ``_construct_dest_file_path`` by @jx2lee in
#2077
* Leverage Airflow ``::group::`` to group logs associated with DAG
parsing by @tatiana in #2235
* Refactor ``DbtConsumerWatcherSensor`` for reusability by @tatiana in
#2245
* Restore plain text output when using ``ExecutionMode.WATCHER`` by
@tiovader in #2241

Bug Fixes

* Fix running empty models or ephemeral nodes in
``ExecutionMode.WATCHER`` by @tatiana in #2279
* Improve watcher producer task priority in scheduling and the UI by
@tatiana in #2237
* Fix typos and formatting issues in documentation by @pankajkoti in
#2259
* Allow watcher producer retries without erroring by @tatiana in #2283
* Fix ``TestBehavior.AFTER_ALL`` is missing project_name information
when loading project using manifest file by @tuantran0910 in #2242
* Fix duplicate log lines in watcher subprocess execution and format
timestamps by @pankajkoti in #2301

Docs

* Add Watcher Kubernetes documentation by @tatiana in #2303
* Document newly added telemetry metrics in the privacy notice by
@pankajkoti in #2249
* Add compatibility policy document by @pankajastro in #2251
* Improve watcher documentation related to dbt threads by @tatiana in
#2273
* Fix link in watcher execution mode documentation by @jedcunningham in
#2277
* Update Apache Airflow minimum compatibility policy by @tatiana in
#2285
* Clarify Cosmos runtime support until "End of Basic Support" by
@jedcunningham in #2286
* Update watcher docs by @tatiana in #2298
* Update watcher kubernetes documentation by @tatiana in #2306

Others

* Add Airflow 3 DAG versioning tests for Cosmos by @michal-mrazek in
#2177
* Add dbt Core 1.11 to the test matrix by @tatiana in #2230
* Add integration tests using InvocationMode.SUBPROCESS and validate
output by @tatiana in #2287
* Fix main branch failing tests by @tatiana in #2296
* Update pre-commit hooks to the latest versions by @jedcunningham in
#2289
* Pre-commit autoupdates by @pre-commit in #2222, #2264, #2274 and #2290
* Dependabot updates by @dependabot in #2218, #2219, #2220, #2280 and
#2284
* Add Scarf metrics to understand Cosmos feature usage patterns
  - Add telemetry tracking for dbt docs plugin usage by @pankajkoti in
    #2240
  - Add DAG run telemetry metrics for load mode, invocation, and
    render_config parameters by @pankajkoti in #2223
  - Collect profile metrics for DAG runs by @pankajastro in #2228
  - Compress telemetry metadata to reduce serialized DAG size by
    @pankajkoti in #2252
  - Skip storing telemetry metadata when emission is disabled by
    @pankajkoti in #2278
  - Hide telemetry metadata parameters from the Airflow trigger UI by
    @pankajkoti in #2247

closes:
astronomer/oss-integrations-private#317

---------

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
@tatiana tatiana added this to the Cosmos 1.13.0 milestone Feb 19, 2026
award1230 pushed a commit to award1230/astronomer-cosmos that referenced this pull request Feb 20, 2026
…ternal nodes

When using the `+` (precursor) graph selector with dbt-loom cross-project
references, `select_node_precursors` crashes with a `KeyError` because
external nodes (injected by dbt-loom) are filtered out during manifest
loading but local nodes still reference them in `depends_on`.

The dbt-loom support added in astronomer#2271 correctly skips external nodes (those
without `original_file_path`) during manifest loading. However, when the
`+` graph operator traverses upstream dependencies, it encounters
`depends_on` entries pointing to these filtered-out external nodes and
raises a `KeyError`.

This fix adds bounds checks in two locations:
- `GraphSelector.select_node_precursors`: skip node IDs not present in
  the nodes dict during upstream traversal
- `NodeSelector.select_nodes_ids_by_intersection`: skip external node IDs
  that were collected during graph traversal but are not in the nodes dict

This allows the graph traversal to gracefully stop at project boundaries,
which is the correct behavior for cross-project setups where external
dependencies are managed by their own DAGs/task groups.

Closes #<TBD>

Co-authored-by: Cursor <cursoragent@cursor.com>
tatiana pushed a commit that referenced this pull request Feb 22, 2026
#2389)

Fixes a `KeyError` when using the `+` (precursor) graph selector on a
project that uses dbt-loom for cross-project references.

cc @pankajkoti @tatiana — This is a follow-up to your dbt-loom support
in #2271. The external node skipping works great for basic rendering,
but we hit a `KeyError` when combining it with the `+` graph selector.
The `+` operator triggers `select_node_precursors` which traverses
`depends_on` entries — and those can point to external nodes that were
already filtered out during manifest loading. This code path wasn't
exercised by the tests in #2271 since the example DAGs don't use graph
selectors.

## Problem

The dbt-loom support added in #2271 correctly skips external nodes
(those without `original_file_path`) during manifest loading in
`load_from_dbt_manifest`. However, local nodes still have `depends_on`
entries pointing to these filtered-out external nodes.

When the `+` graph operator traverses upstream dependencies via
`select_node_precursors`, it does `nodes[node_id]` on these external
node IDs and raises a `KeyError`:

    File "cosmos/dbt/selector.py", line 172, in select_node_precursors
        new_generation.update(set(nodes[node_id].depends_on))
                                  ~~~~~^^^^^^^^^
    KeyError: 'model.upstream_project.external_model'

**Reproduction:** Use `select: ["+downstream_model"]` in `RenderConfig`
with `LoadMode.DBT_MANIFEST` on a project that uses dbt-loom with
cross-project `{{ ref('upstream_project', 'model_name') }}` references.

## Fix

Adds bounds checks in two locations in `cosmos/dbt/selector.py`:

1. **`GraphSelector.select_node_precursors`** (line 172): Skip node IDs
not present in the `nodes` dict during upstream traversal
2. **`NodeSelector.select_nodes_ids_by_intersection`** (line 552): Skip
external node IDs that were collected during graph traversal but don't
exist in the `nodes` dict

This allows the `+` traversal to gracefully stop at project boundaries —
the correct behavior for cross-project setups where external
dependencies are managed by their own DAGs/task groups. This is
consistent with how `select_node_descendants` already handles missing
parents via `defaultdict(set)`.
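
As a rough illustration of the guard described above, here is a minimal sketch of an upstream traversal that stops at project boundaries. It is not the code in `cosmos/dbt/selector.py`: the `Node` dataclass, the `select_precursors` function, and the example graph are all hypothetical, and only the idea of skipping node IDs that are missing from the `nodes` dict is taken from this description.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parsed dbt node; only depends_on matters here."""

    unique_id: str
    depends_on: list[str] = field(default_factory=list)


def select_precursors(nodes: dict[str, Node], root_id: str) -> set[str]:
    """Collect root_id plus every upstream node that is present in `nodes`."""
    selected: set[str] = set()
    frontier = {root_id}
    while frontier:
        next_frontier: set[str] = set()
        for node_id in frontier:
            if node_id not in nodes:
                # External node filtered out at load time: stop at the boundary
                # instead of indexing nodes[node_id] and raising KeyError.
                continue
            selected.add(node_id)
            next_frontier.update(nodes[node_id].depends_on)
        # Never revisit nodes that were already selected.
        frontier = next_frontier - selected
    return selected


# Made-up graph: the local model depends on another local model and on an
# external model that is absent from the nodes dict.
graph = {
    "model.downstream.fct_revenue": Node(
        "model.downstream.fct_revenue",
        [
            "model.downstream.dim_payment_methods",
            "model.upstream_project.external_model",
        ],
    ),
    "model.downstream.dim_payment_methods": Node("model.downstream.dim_payment_methods"),
}

result = select_precursors(graph, "model.downstream.fct_revenue")
print(sorted(result))
```

Without the membership check, the same traversal would fail on `model.upstream_project.external_model`, which mirrors the traceback shown in the Problem section.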

## Test plan

- [x] Added `test_select_nodes_by_precursors_with_external_dependency` —
creates a graph where a local node's `depends_on` includes an external
node ID not in the `nodes` dict, verifies `+` selector returns local
nodes without `KeyError`
- [x] All 166 existing selector tests pass
- [x] All existing dbt-loom tests in `test_graph.py` pass

Co-authored-by: Alex Ward <award@Mac.lan>
Co-authored-by: Cursor <cursoragent@cursor.com>
tatiana pushed a commit that referenced this pull request Feb 23, 2026
(cherry picked from commit 4d86173)
Development

Successfully merging this pull request may close these issues.

[Feature] PoC with dbt Loom to accomplish dbt Mesh cross-project resolution

3 participants