Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Constraints] Add parse-time validation that ensures only a single primary_key constraint is defined #9581

Closed
1 task done
MichelleArk opened this issue Feb 15, 2024 · 3 comments · Fixed by #9700
Closed
1 task done
Assignees
Labels
Impact: CIP model_contracts user docs [docs.getdbt.com] Needs better documentation
Milestone

Comments

@MichelleArk
Copy link
Contributor

MichelleArk commented Feb 15, 2024

Housekeeping

  • I am a maintainer of dbt-core

Short description

Currently, no parse-time error is raised when dbt sees a contract that defines a primary_key at both the model-level and the column-level. Additionally, no parse-time error is raised when multiple primary_keys are defined at the column-level.

In both cases, this will lead to a database error across many (probably all) data warehouses. Validated here: dbt-labs/jaffle-shop-classic#106

Additionally, our documented guidance in dbt is to define tests/constraints that work across multiple columns at the model-level. If a warehouse supports defining multiple primary keys at the column-level, it is really working by coincidence from the perspective of dbt.

Acceptance criteria

We raise an error at parse-time if:

  1. a primary_key constraint is defined at both the model-level and column-level
  2. a primary_key constraint is defined on multiple columns within a model

Suggested Tests

  1. Scenario where a primary key is defined across model and column level constraints
  2. scenario where a primary key is defined on multiple columns
  3. same as (1) and (2), but for versioned models

Impact to Other Teams

N/A

Will backports be required?

No.

Context

Additionally, I believe we only parse + serialize constraints if the contract is enforced currently. So if config.contract is False, we don't even store the constraints as provided. This was probably done to avoid raising errors related to warehouse-specific errors related to constraint prerequisites at parse time. However, we should probably still be parsing the constraints (storing constraints defined in project spec onto internal node + serializing them) and running any parse-time validation against them at least.

@will-sargent-dbtlabs
Copy link

will-sargent-dbtlabs commented Mar 29, 2024

@graciegoheen and @emmyoop
In attempting to migrate a customer to 1.8 for parsing mitigation, I'm getting this error that isn't happening in 1.7:

Parsing Error
  Primary key constraint error: (models/ref/exch_rate_cp.sql)
  Found 3 columns (['cmpny_id', 'asset_id', 'asset_ln_id']) with primary key constraints defined. Primary keys for multiple columns must be defined as a model level constraint.

Is this related to this PR?

@will-sargent-dbtlabs
Copy link

If so, this is actually causing dbt parse to fail completely on 1.8, so its now a breaking fix to migrate from 1.7 to 1.8 for them. Attaching debug logs..

795894_debug logs.log
759894 logs.log

@will-sargent-dbtlabs
Copy link

In 1.7 we just warn and keep going, but now it’s actually a fail.
I’m generally “for” the idea that we should force customers to fix syntax errors.
I’m also hesitant because our “promise” is that minor version upgrades won’t be breaking, and this definitely is. It doesn’t fail in 1.7 (although it also might not work, just a warn) but it fails the whole mess in latest..
That feels like the kind of thing like tests: -> data_tests that we are going to have to give some longer soft warn and runway on..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Impact: CIP model_contracts user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants