Support cross-referencing models across dbt projects using dbt-loom #2271
Merged
16 commits
9d49d4d  Add dbt Loom PoC projects for testing cross-project references (pankajkoti)
4ddddc9  Add Cosmos DAGs for dbt Loom PoC testing (pankajkoti)
f489b4f  Fix for cross-ref model (pankajkoti)
a1134b9  Revamp dbt-loom POC projects and add multi-project documentation (pankajkoti)
1e01de1  Fix example dag (pankajkoti)
26e6608  Fix Jinja template error in DAG docstring (pankajkoti)
c86edab  Fix ddbt loom upstream project path for GitHub CI run (pankajkoti)
b6435d2  Fix dbt-loom node skipping to preserve existing test behavior (pankajkoti)
3d6da28  Add dbt-loom to pre-install script (pankajkoti)
518d9e4  Add test for missing coverage (pankajkoti)
4e0c731  Rename projects (pankajkoti)
7ec9445  Address docs feedback (pankajkoti)
4969e16  Address Copilot's review comment (pankajkoti)
d843c5d  Update docs/configuration/multi-project.rst (pankajkoti)
53edaf1  Add safety check for skipping upstream external models injected by db… (pankajkoti)
328937a  Apply suggestion from @pankajkoti (pankajkoti)
@@ -0,0 +1,133 @@

    """
    Example DAG for cross-project reference demonstration, using the 'dbt ls' load mode for both the upstream and downstream dbt projects.

    This example demonstrates how Cosmos works with dbt-loom for cross-project references.

    Architecture:
        upstream                 →    downstream
        ├── stg_customers             ├── fct_revenue
        ├── stg_orders                ├── fct_customer_revenue
        ├── stg_order_items           ├── dim_payment_methods
        ├── stg_products              └── rpt_revenue_summary
        ├── int_orders_enriched
        └── int_customer_orders

    The downstream project uses dbt-loom to reference upstream models via:
        ref('upstream', 'stg_customers')

    Key Points:
    1. Upstream project must generate manifest.json first (via dbt parse/compile/ls)
    2. Downstream project must be able to query upstream tables (same DB, cross-DB, etc.)
    3. Cosmos correctly handles dbt-loom's external node references (skips them)

    Database Setup (this example):
    - Upstream models: 'platform' schema
    - Downstream models: 'finance' schema
    """

    import os
    from datetime import datetime
    from pathlib import Path

    from airflow import DAG

    from cosmos import DbtTaskGroup, ProfileConfig, ProjectConfig, RenderConfig
    from cosmos.profiles import PostgresUserPasswordProfileMapping

    DEFAULT_DBT_ROOT_PATH = Path(__file__).parent / "dbt"
    DBT_ROOT_PATH = Path(os.getenv("DBT_ROOT_PATH", DEFAULT_DBT_ROOT_PATH))

    # Airflow connection ID for PostgreSQL
    POSTGRES_CONN_ID = "example_conn"

    # Project paths
    DBT_UPSTREAM_PROJECT_PATH = DBT_ROOT_PATH / "cross_project" / "upstream"
    DBT_DOWNSTREAM_PROJECT_PATH = DBT_ROOT_PATH / "cross_project" / "downstream"


    # [START cross_project_dbt_ls_dag]
    # =============================================================================
    # Combined DAG with Task Groups - Upstream runs first, then Downstream
    # =============================================================================

    with DAG(
        dag_id="cross_project_dbt_ls_dag",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
        default_args={"retries": 0},
        tags=["dbt-loom", "dbt ls"],
        doc_md=__doc__,
    ) as dag:

        # -------------------------------------------------------------------------
        # Upstream Task Group - Core Data Platform (upstream)
        # -------------------------------------------------------------------------
        # Contains foundational models (staging, intermediate) exposed as public
        # models for the downstream project to reference via dbt-loom.

        upstream_profile_config = ProfileConfig(
            profile_name="upstream",
            target_name="dev",
            profile_mapping=PostgresUserPasswordProfileMapping(
                conn_id=POSTGRES_CONN_ID,
                profile_args={"schema": "platform", "threads": 4},
            ),
        )

        upstream_task_group = DbtTaskGroup(
            group_id="upstream",
            project_config=ProjectConfig(
                dbt_project_path=DBT_UPSTREAM_PROJECT_PATH,
            ),
            profile_config=upstream_profile_config,
            render_config=RenderConfig(
                dbt_deps=True,
            ),
            operator_args={
                "install_deps": True,
            },
        )

        # -------------------------------------------------------------------------
        # Downstream Task Group - Finance Domain Models (downstream)
        # -------------------------------------------------------------------------
        # Uses dbt-loom to reference public models from the upstream project.
        # Cosmos skips external nodes (those without file paths) during parsing
        # and only creates tasks for this project's own models.

        downstream_profile_config = ProfileConfig(
            profile_name="downstream",
            target_name="dev",
            profile_mapping=PostgresUserPasswordProfileMapping(
                conn_id=POSTGRES_CONN_ID,
                profile_args={"schema": "finance", "threads": 4},
            ),
        )

        # Environment variables for dbt-loom to find the upstream manifest
        # dbt_loom_env_vars = {
        #     "PLATFORM_MANIFEST_PATH": str(DBT_UPSTREAM_PROJECT_PATH / "target" / "manifest.json"),
        # }

        downstream_task_group = DbtTaskGroup(
            group_id="downstream",
            project_config=ProjectConfig(
                dbt_project_path=DBT_DOWNSTREAM_PROJECT_PATH,
            ),
            profile_config=downstream_profile_config,
            render_config=RenderConfig(
                dbt_deps=True,
                # Uncomment if dbt-loom reads the upstream manifest path from an environment variable
                # env_vars=dbt_loom_env_vars,
            ),
            operator_args={
                "install_deps": True,
                # Uncomment if dbt-loom reads the upstream manifest path from an environment variable
                # "env": dbt_loom_env_vars,
            },
        )

        # Chain: Upstream runs first, then Downstream
        upstream_task_group >> downstream_task_group
    # [END cross_project_dbt_ls_dag]
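
The DAG comments above say that Cosmos skips the external nodes dbt-loom injects (those without file paths) and only creates tasks for the project's own models. The following is a minimal sketch of that idea, not Cosmos's actual implementation. It assumes a parsed manifest.json dictionary whose `nodes` entries carry an `original_file_path` key, and it treats an empty or missing value as the marker of an external node. The function name `local_nodes` and the sample data are hypothetical.

```python
from typing import Any


def local_nodes(manifest: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return only the nodes that have a file path, i.e. nodes owned by this project."""
    return {
        unique_id: node
        for unique_id, node in manifest.get("nodes", {}).items()
        # Assumption: external nodes injected by dbt-loom carry no file path.
        if node.get("original_file_path")
    }


# Hypothetical manifest fragment: one local model and one injected upstream model.
sample_manifest = {
    "nodes": {
        "model.downstream.fct_revenue": {"original_file_path": "models/fct_revenue.sql"},
        "model.upstream.stg_customers": {"original_file_path": ""},
    }
}

print(sorted(local_nodes(sample_manifest)))
```

Filtering on the absence of a path, rather than on the project name, keeps the check independent of how the upstream project happens to be named.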
@@ -0,0 +1,157 @@

    """
    Example DAG for cross-project reference demonstration, using the manifest load mode for both the upstream and downstream dbt projects.

    This example demonstrates how Cosmos works with dbt-loom for cross-project references
    using LoadMode.DBT_MANIFEST for faster DAG parsing (no dbt ls execution required).

    Architecture:
        upstream                 →    downstream
        ├── stg_customers             ├── fct_revenue
        ├── stg_orders                ├── fct_customer_revenue
        ├── stg_order_items           ├── dim_payment_methods
        ├── stg_products              └── rpt_revenue_summary
        ├── int_orders_enriched
        └── int_customer_orders

    Prerequisites:
    1. Generate manifest.json for both projects BEFORE deploying:
       cd upstream && dbt compile
       cd downstream && dbt compile

       Or use CI/CD to generate and store manifests in S3/GCS.

    2. For remote manifests (S3/GCS/Azure), ensure the connection is configured.

    Key Benefits of DBT_MANIFEST mode:
    - No dbt installation required on scheduler
    - Fastest parsing method
    """

    import os
    from datetime import datetime
    from pathlib import Path

    from airflow import DAG

    from cosmos import DbtTaskGroup, ExecutionConfig, ProfileConfig, ProjectConfig, RenderConfig
    from cosmos.constants import LoadMode
    from cosmos.profiles import PostgresUserPasswordProfileMapping

    DEFAULT_DBT_ROOT_PATH = Path(__file__).parent / "dbt"
    DBT_ROOT_PATH = Path(os.getenv("DBT_ROOT_PATH", DEFAULT_DBT_ROOT_PATH))

    # Airflow connection ID for PostgreSQL
    POSTGRES_CONN_ID = "example_conn"

    # Project paths
    DBT_UPSTREAM_PROJECT_PATH = DBT_ROOT_PATH / "cross_project" / "upstream"
    DBT_DOWNSTREAM_PROJECT_PATH = DBT_ROOT_PATH / "cross_project" / "downstream"

    # Manifest paths (local)
    UPSTREAM_MANIFEST_PATH = DBT_UPSTREAM_PROJECT_PATH / "target" / "manifest.json"
    DOWNSTREAM_MANIFEST_PATH = DBT_DOWNSTREAM_PROJECT_PATH / "target" / "manifest.json"

    # =============================================================================
    # Alternative: Remote Manifest Paths (S3/GCS/Azure) - Uncomment to use
    # =============================================================================
    # UPSTREAM_MANIFEST_PATH = "s3://your-bucket/dbt-manifests/upstream/manifest.json"
    # DOWNSTREAM_MANIFEST_PATH = "s3://your-bucket/dbt-manifests/downstream/manifest.json"
    # MANIFEST_CONN_ID = "aws_default"  # or "google_cloud_default" for GCS

    # [START cross_project_manifest_dag]
    # =============================================================================
    # Combined DAG with Task Groups - Using DBT_MANIFEST Load Mode
    # =============================================================================

    with DAG(
        dag_id="cross_project_manifest_dag",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
        default_args={"retries": 0},
        tags=["dbt-loom", "manifest"],
        doc_md=__doc__,
    ) as dag:

        # -------------------------------------------------------------------------
        # Upstream Task Group - Core Data Platform (upstream)
        # -------------------------------------------------------------------------

        upstream_profile_config = ProfileConfig(
            profile_name="upstream",
            target_name="dev",
            profile_mapping=PostgresUserPasswordProfileMapping(
                conn_id=POSTGRES_CONN_ID,
                profile_args={"schema": "platform", "threads": 4},
            ),
        )

        upstream_task_group = DbtTaskGroup(
            group_id="upstream",
            project_config=ProjectConfig(
                # Specify the manifest path for faster parsing
                manifest_path=str(UPSTREAM_MANIFEST_PATH),
                project_name="upstream",
                # For remote manifests (S3/GCS/Azure), add:
                # manifest_conn_id=MANIFEST_CONN_ID,
            ),
            profile_config=upstream_profile_config,
            execution_config=ExecutionConfig(
                dbt_project_path=DBT_UPSTREAM_PROJECT_PATH, dbt_executable_path="/usr/local/bin/dbt"
            ),
            render_config=RenderConfig(
                # Use manifest-based parsing (no dbt ls required)
                load_method=LoadMode.DBT_MANIFEST,
                # Note: dbt_deps is not needed for manifest mode parsing
                # but you may still want install_deps=True for task execution
            ),
            operator_args={
                "install_deps": True,
            },
        )

        # -------------------------------------------------------------------------
        # Downstream Task Group - Finance Domain Models
        # -------------------------------------------------------------------------

        downstream_profile_config = ProfileConfig(
            profile_name="downstream",
            target_name="dev",
            profile_mapping=PostgresUserPasswordProfileMapping(
                conn_id=POSTGRES_CONN_ID,
                profile_args={"schema": "finance"},
            ),
        )

        # Environment variables for dbt-loom to find the upstream manifest
        # dbt_loom_env_vars = {
        #     "PLATFORM_MANIFEST_PATH": str(DBT_UPSTREAM_PROJECT_PATH / "target" / "manifest.json"),
        # }

        downstream_task_group = DbtTaskGroup(
            group_id="downstream_finance",
            project_config=ProjectConfig(
                # Specify the manifest path for faster parsing
                manifest_path=str(DOWNSTREAM_MANIFEST_PATH),
                project_name="downstream",
                # For remote manifests (S3/GCS/Azure), add:
                # manifest_conn_id=MANIFEST_CONN_ID,
                # Uncomment if dbt-loom reads the upstream manifest path from an environment variable
                # env_vars=dbt_loom_env_vars,
            ),
            profile_config=downstream_profile_config,
            execution_config=ExecutionConfig(
                dbt_project_path=DBT_DOWNSTREAM_PROJECT_PATH, dbt_executable_path="/usr/local/bin/dbt"
            ),
            render_config=RenderConfig(
                # Use manifest-based parsing (no dbt ls required)
                load_method=LoadMode.DBT_MANIFEST,
            ),
            operator_args={
                "install_deps": True,
            },
        )

        # Chain: Upstream runs first, then Downstream
        upstream_task_group >> downstream_task_group
    # [END cross_project_manifest_dag]
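
Because the manifest load mode reads pre-generated files, the prerequisite in the docstring above (generate manifest.json for both projects before deploying) is easy to miss. A small pre-deployment check could catch that earlier. This is only a sketch under stated assumptions: `missing_manifests` is a hypothetical helper, not part of Cosmos; it handles local paths only (not S3/GCS/Azure URIs); and it checks nothing beyond the file existing and parsing as a JSON object with a `nodes` key.

```python
import json
import tempfile
from pathlib import Path


def missing_manifests(paths: list[Path]) -> list[Path]:
    """Return the manifest paths that are absent, unreadable, or lack a 'nodes' key."""
    problems = []
    for path in paths:
        try:
            manifest = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            # Covers a missing file, a permission error, and malformed JSON.
            problems.append(path)
            continue
        if not isinstance(manifest, dict) or "nodes" not in manifest:
            problems.append(path)
    return problems


# Demonstration with a temporary layout: upstream has a manifest, downstream does not.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "upstream" / "target" / "manifest.json"
    good.parent.mkdir(parents=True)
    good.write_text(json.dumps({"nodes": {}}))
    absent = Path(tmp) / "downstream" / "target" / "manifest.json"

    result = missing_manifests([good, absent])
    print([p.parent.parent.name for p in result])
```

In a CI job, a non-empty result would be a natural reason to fail the deployment step before the DAG reaches the scheduler.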
@@ -0,0 +1,14 @@

    manifests:
      - name: upstream
        type: file
        config:
          # Use environment variable for flexibility, with fallback to relative path
          # path: '{{ env_var("PLATFORM_MANIFEST_PATH", "../upstream/target/manifest.json") }}'
          # In production, set PLATFORM_MANIFEST_PATH to absolute path

          # For GitHub Actions Integration Tests CI run, set the path to the manifest.json file
          path: /home/runner/work/astronomer-cosmos/astronomer-cosmos/dev/dags/dbt/cross_project/upstream/target/manifest.json

          # path: ../upstream/target/manifest.json


    enable_telemetry: false
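
The commented-out `path` options in this config describe a lookup order: use `PLATFORM_MANIFEST_PATH` when it is set, otherwise fall back to the relative path `../upstream/target/manifest.json`. The sketch below mirrors that order in Python purely for illustration. `resolve_upstream_manifest` is a hypothetical helper, not part of dbt-loom or Cosmos, and it assumes the relative fallback is resolved against the downstream project directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_upstream_manifest(project_dir: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer PLATFORM_MANIFEST_PATH; otherwise fall back to the relative upstream path."""
    env = os.environ if env is None else env
    override = env.get("PLATFORM_MANIFEST_PATH")
    if override:
        return Path(override)
    # Assumption: the fallback is relative to the downstream project directory.
    return project_dir / ".." / "upstream" / "target" / "manifest.json"


# Hypothetical project location, used only to show both branches.
downstream = Path("/dbt/cross_project/downstream")
print(resolve_upstream_manifest(downstream, env={}))
print(resolve_upstream_manifest(downstream, env={"PLATFORM_MANIFEST_PATH": "/tmp/manifest.json"}))
```

Passing the environment in as a mapping keeps the lookup easy to exercise without mutating `os.environ`.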