Clarifications for Staging envs, 1:1 projects for Mesh (#5453)
resolves #5447

See linked issue for context

---------

Co-authored-by: Matt Shaver <[email protected]>
jtcohen6 and matthewshaver authored May 14, 2024
1 parent 057ad4a commit dda5c3a
Showing 3 changed files with 37 additions and 9 deletions.
8 changes: 8 additions & 0 deletions website/docs/best-practices/how-we-mesh/mesh-4-faqs.md
@@ -251,6 +251,14 @@ If you’re interested in beta access to “Staging” environments, let your db

</detailsToggle>

<detailsToggle alt_header="Does dbt Mesh work if projects are 'duplicated' (dev project <> prod project)?">

The short answer is "no." Cross-project references require that each project `name` be unique in your dbt Cloud account.

Historical limitations in dbt Cloud required customers to "duplicate" projects, so that one actual dbt project (codebase) would map to more than one dbt Cloud project. We are working to remove those limitations with Staging environments for data isolation (beta), environment-level permissions, and environment-level data warehouse connections (coming soon). Once those pieces are in place, it should no longer be necessary to define separate dbt Cloud projects to isolate data environments or permissions.
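As a minimal sketch of the intended shape (the project name is hypothetical): one codebase declares one unique `name` in its `dbt_project.yml`, and that single dbt Cloud project then carries its Development, Staging, and Production environments, rather than being duplicated into one dbt Cloud project per environment.

```yaml
# dbt_project.yml for a hypothetical single codebase.
# This name must be unique in the dbt Cloud account; other projects use it
# when declaring a cross-project dependency on this one.
name: jaffle_marketing
version: "1.0.0"
config-version: 2
# Isolate dev / staging / prod data with environments in this one
# dbt Cloud project, not with duplicated "- Dev" / "- Prod" projects.
```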

</detailsToggle>

## Compatibility with other features

<detailsToggle alt_header="How does the dbt Semantic Layer relate to and work with dbt Mesh?">
10 changes: 4 additions & 6 deletions website/docs/docs/collaborate/govern/project-dependencies.md
@@ -32,12 +32,10 @@ Refer to the [FAQs](#faqs) for more info.
## Prerequisites

In order to add project dependencies and resolve cross-project `ref`, you must:
- Use dbt v1.6 or higher for **both** the upstream ("producer") project and the downstream ("consumer") project.
- Define models in an upstream ("producer") project that are configured with [`access: public`](/reference/resource-configs/access). To apply the change, rerun a production job.
- Have a deployment environment in the upstream ("producer") project [that is set to be your production environment](/docs/deploy/deploy-environments#set-as-production-environment)
- Have a successful run of the upstream ("producer") project.
- Define and trigger a job before marking the environment as Staging. Read more about [Staging environments with downstream dependencies](/docs/collaborate/govern/project-dependencies#staging-with-downstream-dependencies).
- Have a multi-tenant or single-tenant [dbt Cloud Enterprise](https://www.getdbt.com/pricing) account (Azure ST is not supported but coming soon.)
- Use a supported version of dbt (v1.6, v1.7, or "Keep on latest version") for both the upstream ("producer") project and the downstream ("consumer") project.
- Define models in an upstream ("producer") project that are configured with [`access: public`](/reference/resource-configs/access). You need at least one successful job run after defining their `access`.
- Define a deployment environment in the upstream ("producer") project [that is set to be your Production environment](/docs/deploy/deploy-environments#set-as-production-environment), and ensure it has at least one successful job run in that environment.
- Ensure each project `name` is unique in your dbt Cloud account. For example, if you have a dbt project (codebase) for the `jaffle_marketing` team, you should not create separate dbt Cloud projects for `Jaffle Marketing - Dev` and `Jaffle Marketing - Prod`; handle that isolation at the environment level instead. We are working on adding support for environment-level permissions and data warehouse connections. Reach out to your dbt Labs account team for beta access in May/June 2024.
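The prerequisites above can be sketched in YAML. This assumes a hypothetical producer project named `jaffle_platform` with a hypothetical model `fct_orders`, plus the `jaffle_marketing` consumer named above; file paths are illustrative. The producer marks the model public, and the consumer declares the producer by its unique project `name` in `dependencies.yml`.

```yaml
# Producer project (hypothetical "jaffle_platform"): models/_models.yml
models:
  - name: fct_orders        # hypothetical model
    access: public          # lets other projects ref this model

---
# Consumer project ("jaffle_marketing"): dependencies.yml
projects:
  - name: jaffle_platform   # must match the producer's unique project name
```

With both files in place, and after a successful job run in the producer's Production environment, a consumer model could reference the producer with the two-argument form `ref('jaffle_platform', 'fct_orders')`.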

## Example

28 changes: 25 additions & 3 deletions website/docs/docs/deploy/deploy-environments.md
@@ -47,10 +47,16 @@ For Semantic Layer-eligible customers, the next section of environment settings
Currently in limited availability beta. Contact support or your account team if you're interested in beta access.
:::

Use a Staging environment to grant developers access to deployment workflows and tools while controlling access to production data. You can do this in a couple of ways, but the most straightforward is to configure Staging with a long-living branch (for example, `staging`) similar to but separate from the primary branch (for example, `main`).
Use a Staging environment to grant developers access to deployment workflows and tools while controlling access to production data. Staging environments enable you to achieve more granular control over permissions, data warehouse connections, and data isolation — within the purview of a single project in dbt Cloud.

### Git workflow

You can approach this in a couple of ways, but the most straightforward is configuring Staging with a long-living branch (for example, `staging`) similar to but separate from the primary branch (for example, `main`).

In this scenario, work would ideally be promoted from the Development environment -> Staging environment -> Production environment, with developer branches feeding into the `staging` branch, then ultimately merging into `main`. In many cases, the `main` and `staging` branches will be identical after a merge and remain so until the next batch of changes from the `development` branches is ready to be elevated. We recommend setting branch protection rules on `staging` similar to those on `main`.

Some customers prefer to connect Development and Staging to their `main` branch and then cut release branches on a regular cadence (daily or weekly), which feeds into Production.

### Why use a staging environment

There are two primary motivations for using a Staging environment:
@@ -61,9 +67,25 @@ There are two primary motivations for using a Staging environment:
Provide developers with the ability to create, edit, and trigger ad hoc jobs in the Staging environment, while keeping the Production environment locked down.
:::

Let's say you have `Project B` downstream of `Project A` with cross-project refs configured in the models. When developers work in the IDE for `Project B`, cross-project refs will resolve to the Staging environment of `Project A`, rather than production. You'll get the same results with those refs when jobs are run in the Staging environment. Only the Production environment will reference the Production data, keeping the data and access isolated without needing separate projects.
**Conditional configuration of sources** enables you to point to "prod" or "non-prod" source data, depending on the environment you're running in. For example, this source will point to `<DATABASE>.sensitive_source.table_with_pii`, where `<DATABASE>` is dynamically resolved based on an environment variable.

<File name="models/sources.yml">

```yaml
sources:
  - name: sensitive_source
    database: "{{ env_var('SENSITIVE_SOURCE_DATABASE') }}"
    tables:
      - name: table_with_pii
```
</File>
There is exactly one source (`sensitive_source`), and all downstream dbt models select from it as `{{ source('sensitive_source', 'table_with_pii') }}`. The code in your project and the shape of the DAG remain consistent across environments. By setting it up in this way, rather than duplicating sources, you get some important benefits.
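A variation on the source above, sketched under two assumptions: the variable is defined per environment in dbt Cloud, where environment variable names need a `DBT_` prefix, and falling back to a non-production database (the hypothetical `analytics_nonprod`) is acceptable when the variable is unset. Passing a default as the second argument to `env_var` keeps an unset variable from failing the parse and points that environment at non-production data.

```yaml
sources:
  - name: sensitive_source
    # Hypothetical variable name; dbt Cloud expects a DBT_ prefix.
    # Production sets it to the prod database; environments that leave it
    # unset fall back to the hypothetical non-prod database below.
    database: "{{ env_var('DBT_SENSITIVE_SOURCE_DATABASE', 'analytics_nonprod') }}"
    tables:
      - name: table_with_pii
```

Defaulting to non-production rather than production means a misconfigured environment errs away from the sensitive data, not toward it.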

**Cross-project references in dbt Mesh:** Let's say you have `Project B` downstream of `Project A` with cross-project refs configured in the models. When developers work in the IDE for `Project B`, cross-project refs will resolve to the Staging environment of `Project A`, rather than production. You'll get the same results with those refs when jobs are run in the Staging environment. Only the Production environment will reference the Production data, keeping the data and access isolated without needing separate projects.

If `Project B` also has a Staging deployment, then references to unbuilt upstream models within `Project B` will resolve to that environment, using [deferral](/docs/cloud/about-cloud-develop-defer), rather than resolving to the models in Production. This saves developers time and warehouse spend, while preserving clear separation of environments.
**Faster development enabled by deferral:** If `Project B` also has a Staging deployment, then references to unbuilt upstream models within `Project B` will resolve to that environment, using [deferral](/docs/cloud/about-cloud-develop-defer), rather than resolving to the models in Production. This saves developers time and warehouse spend, while preserving clear separation of environments.

Finally, the Staging environment has its own view in [dbt Explorer](/docs/collaborate/explore-projects), giving you a full view of your prod and pre-prod data.
