GA of dbt Explorer #5488

Merged
merged 21 commits into from
May 13, 2024
29 changes: 13 additions & 16 deletions website/docs/docs/collaborate/column-level-lineage.md
@@ -3,8 +3,6 @@ title: "Column-level lineage"
description: "Use dbt Explorer's column-level lineage to gain insights about your data at a granular level."
---

# Column-level lineage <Lifecycle status='public preview' />

dbt Explorer now offers column-level lineage (CLL) for the resources in your dbt project. Analytics engineers can quickly and easily gain insight into the provenance of their data products at a more granular level. For each column in a resource (model, source, or snapshot) in a dbt project, Explorer provides end-to-end lineage for the data in that column given how it's used.

CLL is available to dbt Cloud Enterprise accounts that can use Explorer. It’s also available through the [Discovery API](/docs/dbt-cloud-apis/discovery-api).
@@ -20,6 +18,19 @@ dbt Cloud updates the lineage in Explorer after each run that's executed in the

<LoomVideo id='3040bf2a2ade45eca7942a7aed6b730c' />

## Column evolution lens {#column-lens}

You can use the column evolution lineage lens to determine when a column is transformed vs. reused (passthrough or rename). The lens helps you distinguish when and how a column is actually changed as it flows through your dbt lineage, informing debugging workflows in particular.

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-cel.png" width="40%" title="Example of the Column evolution lens"/>

### Inherited column descriptions

A reused column, labeled as *passthrough* or *rename*, inherits its description from source and upstream model columns. In other words, source and upstream model columns propagate their descriptions downstream whenever they are not transformed, meaning you don’t need to manually define the description. Passthrough and rename columns are clearly labeled and color coded.

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-prop-inherit.png" width="40%" title="Example of propagate and inherit column descriptions"/>
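
The inheritance rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how dbt Explorer implements it: the `Column` class, its `evolution` field, and `resolved_description` are hypothetical names, and the sketch assumes each column has at most one upstream parent and that an explicitly defined description takes priority over an inherited one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    """Hypothetical stand-in for a column node in the lineage graph."""
    name: str
    description: Optional[str] = None
    parent: Optional["Column"] = None
    # One of "passthrough", "rename", or "transformed" (labels assumed from the docs).
    evolution: str = "transformed"


def resolved_description(column: Column) -> Optional[str]:
    """Walk upstream through reused columns until a description is found."""
    current: Optional[Column] = column
    while current is not None:
        if current.description:
            return current.description
        # A transformed column breaks the chain: nothing is inherited past it.
        if current.evolution not in ("passthrough", "rename"):
            return None
        current = current.parent
    return None


source_col = Column("id", description="Primary key of the customer")
staging_col = Column("id", parent=source_col, evolution="passthrough")
mart_col = Column("customer_id", parent=staging_col, evolution="rename")
hashed_col = Column("id_hash", parent=staging_col, evolution="transformed")

print(resolved_description(mart_col))
print(resolved_description(hashed_col))
```

In this sketch, `mart_col` picks up the source description through one passthrough and one rename, while `hashed_col` gets none because it is transformed.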


## Column-level lineage use cases {#use-cases}

Learn more about why and how you can use CLL in the following sections.
@@ -55,17 +66,3 @@ Possible error cases are:
- **Parsing error** &mdash; Error occurs when the SQL is ambiguous or too complex for parsing. An example of an ambiguous parsing scenario is a _complex_ lateral join.
- **Python error** &mdash; Error occurs when a Python model is used within the lineage. Due to the nature of Python models, it's not possible to parse and determine the lineage.
- **Unknown error** &mdash; Error occurs when the lineage can't be determined for an unknown reason. An example of this would be if a dbt best practice is not being followed, like using hardcoded table names instead of `ref` statements.
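
To illustrate the hardcoded-table-name case mentioned under **Unknown error**, here is a rough heuristic for spotting relations that are referenced directly rather than through `ref` or `source`. It is a sketch under stated assumptions, not part of dbt: `find_hardcoded_relations` and the regular expression are hypothetical, and the pattern only catches unquoted, dot-qualified names that follow `from` or `join`.

```python
import re

# Matches an unquoted, dot-qualified relation name right after FROM or JOIN.
# Jinja calls such as {{ ref('orders') }} start with "{" and never match.
HARDCODED_RELATION = re.compile(
    r"\b(?:from|join)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+)",
    re.IGNORECASE,
)


def find_hardcoded_relations(sql: str) -> list:
    """Return qualified relation names that bypass ref()/source()."""
    return HARDCODED_RELATION.findall(sql)


hardcoded_sql = (
    "select * from analytics.raw.orders o "
    "join analytics.raw.customers c on o.customer_id = c.id"
)
ref_sql = "select * from {{ ref('stg_orders') }}"

print(find_hardcoded_relations(hardcoded_sql))
print(find_hardcoded_relations(ref_sql))
```

A check like this could run before a job to flag models whose lineage might not resolve, although quoted identifiers and unqualified names would need extra handling.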

### Data platform support

CLL in dbt Cloud works with the following data platforms:
- Snowflake
- BigQuery
- Redshift
- Databricks (Unity Catalog)

The following adapters aren't currently supported by CLL in dbt Cloud. More of these platforms will be supported in the future.
- Hive metastore version of Databricks
- Apache Spark
- Starburst/Trino
- Microsoft Fabric
2 changes: 0 additions & 2 deletions website/docs/docs/collaborate/explore-multiple-projects.md
@@ -4,8 +4,6 @@ sidebar_label: "Explore multiple projects"
description: "Learn about project-level lineage in dbt Explorer and its uses."
---

# Explore multiple projects <Lifecycle status='public preview' />

You can also view all the different projects and public models in the account, where the public models are defined, and how they are used to gain a better understanding about your cross-project resources.

The resource-level lineage graph for a given project displays the cross-project relationships in the DAG. The different icons indicate whether you’re looking at an upstream producer project (parent) or a downstream consumer project (child).
43 changes: 28 additions & 15 deletions website/docs/docs/collaborate/explore-projects.md
@@ -6,23 +6,21 @@ pagination_next: "docs/collaborate/model-performance"
pagination_prev: null
---

# Explore your dbt projects <Lifecycle status='public preview' />

With dbt Explorer, you can view your project's [resources](/docs/build/projects) (such as models, tests, and metrics) and their <Term id="data-lineage">lineage</Term> to gain a better understanding of its latest production state. Navigate and manage your projects within dbt Cloud to help you and other data developers, analysts, and consumers discover and leverage your dbt resources.

## Prerequisites

- You have a dbt Cloud account on the [Team or Enterprise plan](https://www.getdbt.com/pricing/).
- You have set up a [production deployment environment](/docs/deploy/deploy-environments#set-as-production-environment) for each project you want to explore.
- There has been at least one successful job run in the production deployment environment.
- You have set up a [production](/docs/deploy/deploy-environments#set-as-production-environment) or [staging](/docs/deploy/deploy-environments#create-a-staging-environment) deployment environment for each project you want to explore.
- There has been at least one successful job run in the deployment environment.
- You are on the dbt Explorer page. To do this, select **Explore** from the top navigation bar in dbt Cloud.


## Generate metadata

dbt Explorer uses the metadata provided by the [Discovery API](/docs/dbt-cloud-apis/discovery-api) to display the details about [the state of your project](/docs/dbt-cloud-apis/project-state). The metadata that's available depends on the [deployment environment](/docs/deploy/deploy-environments) you've designated as _production_ in your dbt Cloud project. dbt Explorer automatically retrieves the metadata updates after each job run in the production deployment environment so it always has the latest results for your project.
dbt Explorer uses the metadata provided by the [Discovery API](/docs/dbt-cloud-apis/discovery-api) to display the details about [the state of your project](/docs/dbt-cloud-apis/project-state). The metadata that's available depends on the [deployment environment](/docs/deploy/deploy-environments) you've designated as _production_ or _staging_ in your dbt Cloud project. dbt Explorer automatically retrieves the metadata updates after each job run in the production or staging deployment environment so it always has the latest results for your project.

To view a resource and its metadata, you must define the resource in your project and run a job in the production environment. The resulting metadata depends on the [commands](/docs/deploy/job-commands) executed by the jobs.
To view a resource and its metadata, you must define the resource in your project and run a job in the production or staging environment. The resulting metadata depends on the [commands](/docs/deploy/job-commands) executed by the jobs.

| To view in Explorer | You must successfully run |
|---------------------|---------------------------|
@@ -83,28 +81,36 @@ Example of exploring the `order_items` model in the project's lineage graph:

## Lenses

The **Lenses** feature is available from your [project's lineage graph](#project-lineage) (lower right corner). Lenses are like map layers for your DAG. Lenses make it easier to understand your project’s contextual metadata at scale, especially to distinguish a particular model or a subset of models.
The **Lenses** feature is available from your [project's lineage graph](#project-lineage) (lower right corner). Lenses are like map layers for your DAG. Lenses make it easier to understand your project’s contextual metadata at scale, especially to distinguish a particular model or a subset of models.

When you apply a lens, tags become visible on the nodes in the lineage graph, indicating the layer value along with coloration based on that value. If you're significantly zoomed out, only the tags and their colors are visible in the graph.

Lenses are helpful to analyze a subset of the DAG if you're zoomed in, or to find models/issues from a larger vantage point.

<expandable alt_header="List of available lenses">

- **Default** (resource type)
- **Materialization Type** (for example, identifying incremental model dependencies)
- **Lastest Status** (for example, diagnosing a failed DAG region)
- **Model Layer** (for example, discovering marts models to analyze)
A resource in your project is characterized by resource type, materialization type, or model layer, as well as its latest run or latest test status. Lenses are available for the following metadata:

- **Relationship**: Organizes resources by resource type, such as models, tests, seeds, and [more](/reference/node-selection/syntax). Resource type uses the `resource_type` selector.
- **Materialization Type**: Identifies the strategy for building the dbt models in your data platform.
- **Latest Status**: The status from the latest execution of the resource in the current environment. For example, diagnosing a failed DAG region.
- **Model Layer**: The modeling layer that the model belongs to according to the [best practices guide](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview#guide-structure-overview). For example, discovering marts models to analyze.
- **Marts** &mdash; A model with the prefix `fct_` or `dim_` or a model that lives in the `/marts/` subdirectory.
- **Intermediate** &mdash; A model with the prefix `int_`. Or, a model that lives in the `/int/` or `/intermediate/` subdirectory.
- **Staging** &mdash; A model with the prefix `stg_`. Or, a model that lives in the `/staging/` subdirectory.
- **Test Status**: The status from the latest execution of the tests that ran against this resource.

</expandable>
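
The naming rules listed for the **Model Layer** lens can be expressed as a small classifier. The following is a minimal sketch, assuming a model is identified by its name and its file path inside the project. The function name `classify_model_layer` and the order in which the rules are checked are assumptions for illustration, since the text above does not say which rule wins when more than one applies.

```python
from pathlib import PurePosixPath
from typing import Optional


def classify_model_layer(name: str, path: str) -> Optional[str]:
    """Return "marts", "intermediate", "staging", or None for a model."""
    # Only directory components count, not the file name itself.
    directories = set(PurePosixPath(path).parts[:-1])

    if name.startswith(("fct_", "dim_")) or "marts" in directories:
        return "marts"
    if name.startswith("int_") or directories & {"int", "intermediate"}:
        return "intermediate"
    if name.startswith("stg_") or "staging" in directories:
        return "staging"
    return None


print(classify_model_layer("fct_orders", "models/marts/fct_orders.sql"))
print(classify_model_layer("int_orders_joined", "models/int/int_orders_joined.sql"))
print(classify_model_layer("customers", "models/staging/customers.sql"))
print(classify_model_layer("orders", "models/orders.sql"))
```

A model that matches none of the rules returns `None` here, which stands in for a model with no layer value.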

### Example of lenses

Example of applying the **Materialization Type** _lens_ with the lineage graph significantly zoomed out:
Example of applying the **Materialization Type** _lens_ with the lineage graph zoomed out. In this view, each model name is colored according to the legend at the bottom, which makes it quick to identify the materialization type of each model.

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-materialization-type.jpg" width="100%" title="Example of the Materialization type lens" />

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-materialization-type-lense.png" width="100%" title="Example of the Materialization type lens" />
Example of applying the **Test Status** _lens_, where each model name displays its test status according to the legend at the bottom.

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-test-status.jpg" width="100%" title="Example of the Test Status lens" />

## Keyword search {#search-resources}

@@ -200,7 +206,10 @@ In the upper right corner of the resource details page, you can:

<expandable alt_header="What details are available for a test?">

- **Status bar** (below the page title) &mdash; Information on the last time the test ran, whether the test passed, test name, test target, and column name.
- **Status bar** (below the page title) &mdash; Information on the last time the test ran, whether the test passed, test name, test target, and column name. Defaults to all if not specified.
- **Test Type** (next to the Status bar) &mdash; Information on the different test types available: Unit test or Data test. Defaults to all if not specified.

When you select a test, the following details are available:
- **General** tab includes:
- **Lineage** graph &mdash; The test’s lineage graph that you can interact with. The graph includes one upstream node and one downstream node from the test resource. Click the Expand icon in the graph's upper right corner to view the test in full lineage graph mode.
- **Description** section &mdash; A description of the test.
@@ -209,6 +218,10 @@ In the upper right corner of the resource details page, you can:
- **Relationships** section &mdash; The nodes the test **Depends On**.
- **Code** tab &mdash; The source code and compiled code for the test.

Example of the Tests view:

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-test-type.jpg" width="100%" title="Example of Test Type details" />

</expandable>

<expandable alt_header="What details are available for each source table within a source collection?">
@@ -230,7 +243,7 @@ Example of the details view for the model `supplies`:

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-model-details.png" width="100%" title="Example of resource details" />

## Staging environment <Lifecycle status='beta' />
## Staging environment

dbt Explorer supports views for [Staging deployment environments](/docs/deploy/deploy-environments#staging-environment), in addition to the Production environment. This gives you a unique view into your pre-production data workflows, with the same tools available in production, while providing an extra layer of scrutiny. Once the Staging environment is configured and has a successful job run, it will be visible on the dbt Explorer landing page.

2 changes: 0 additions & 2 deletions website/docs/docs/collaborate/model-performance.md
@@ -4,8 +4,6 @@ sidebar_label: "Model performance"
description: "Learn about the performance of your models so you can make improvements to save time and money."
---

# Model performance <Lifecycle status='public preview' />

dbt Explorer provides metadata on dbt Cloud runs for in-depth model performance and quality analysis. This feature assists in reducing infrastructure costs and saving time for data teams by highlighting where to fine-tune projects and deployments &mdash; such as model refactoring or job configuration adjustments.

<LoomVideo id='98f33b3b7a374df0b7c04747eae6ef44' />
2 changes: 0 additions & 2 deletions website/docs/docs/collaborate/project-recommendations.md
@@ -3,8 +3,6 @@ title: "Project recommendations"
sidebar_label: "Project recommendations"
description: "dbt Explorer provides recommendations that you can take to improve the quality of your dbt project."
---

# Project recommendations <Lifecycle status='public preview' />

dbt Explorer provides recommendations about your project from the `dbt_project_evaluator` [package](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) using metadata from the Discovery API.
