Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect breaking changes to column names and data types in state:modified check #7216

Merged
merged 27 commits into from
Mar 28, 2023

Conversation

gshank
Copy link
Contributor

@gshank gshank commented Mar 23, 2023

resolves #6869

Description

Add "same_contract", "state:modified.contract" to defer_state checks, create new exception and tests.

Checklist

@gshank gshank requested a review from a team March 23, 2023 17:34
@gshank gshank requested review from a team as code owners March 23, 2023 17:34
@gshank gshank requested review from stu-k and QMalcolm March 23, 2023 17:34
@cla-bot cla-bot bot added the cla:yes label Mar 23, 2023
@gshank gshank marked this pull request as draft March 23, 2023 17:34
@gshank gshank requested review from MichelleArk and removed request for QMalcolm and stu-k March 23, 2023 17:34
@gshank
Copy link
Contributor Author

gshank commented Mar 23, 2023

Please look at the exception message to see what you want.

Comment on lines 225 to 226
"There is a breaking change in the model contract; "
"you may need to create a new version. See: https://docs.getdbt.com/docs/collaborate/publish/model-versions"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message looks good to me 👍

Copy link
Contributor

@MichelleArk MichelleArk Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user is aware they're making a breaking change, and just needs some guidance regarding making a new version of the model - this error message looks good to me too.

That said, I am worried that this error message could be a little opaque for unintented breaking changes, especially as our set of changes are considered breaking changes becomes more nuanced. For example, a user updates a contracted model with constraints from table to view in an attempt to optimize some spend, which raises a breaking change error because the existing constraints can't be validated. Just seeing There is a breaking change in the model contract could be confusing, and perhaps the user would actually choose to just revert that change and make it at a later point (bundle it with some other breaking changes). In that scenario, I'd (as an imaginary user) love to see an error message that looks more like:

There is a breaking change in the model contract: 
 * Updated materialization from `table` to `view` for model with constraints
 * Updated config from contract: true to contract: false
 * Column a removed
 * Column c updated data_type from string to int
 * ...
You may need to create a new version. See: https://docs.getdbt.com/docs/collaborate/publish/model-versions

This may not be concern with our current definition of a breaking changes (any change to column name/data_type => breaking change), but given that we're planning to extend this to be more nuanced soon in #7065, I'd leave some design space for it if this scenario seems important to prioritize the UX of (cc @jtcohen6).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the change of a materialization is not a contract change, it's a config change. The ticket doesn't mention materialization -- do you want that to be part of the contract change too?

Creating that more user-friendly error message will probably be about twice as much work than everything else so far. If you want me to spend my time doing that, I can.

Copy link
Contributor

@MichelleArk MichelleArk Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Materializations are not part of this issue, but they will be part the work to extend detecting breaking changes in #7065 (which i still blocked by #7067). If we know we'll want more granular error messaging as part of that work, we could set up the foundations here or leave it to #7065.

I agree it's a good deal more work to get that though; we could break off improving the error messaging into a separate issue so it's refined and estimated appropriately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a brief "reason" to the exception message. We can do more elaborate messaging once the other parts are in place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(good conversation, no notes!)

@gshank gshank marked this pull request as ready for review March 23, 2023 20:06
core/dbt/exceptions.py Outdated Show resolved Hide resolved
sorted_columns = sorted(self.columns.values(), key=lambda col: col.name)
for column in sorted_columns:
contract_state += f"|{column.name}"
contract_state += str(column.data_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to leave out constraints for the scope of this issue 👍

We should revisit whether constraints should go in the checksum when we go to implement #7065, where:

  • breaking changes to constraints should raise a ModelContractError from same_contract
  • non-breaking changes to constraints (e.g. adding a constraint, removing an unenforced constraint) should return False from same_contract.

Determining the nature of a change to a constraint will need additional logic beyond comparing checksums, and adding constraints to the checksum will make understanding changes to column data_type and names a bit trickier. I could see the checksum being a useful to determine, crudely, whether there was a change at all, and needing to actually inspect the columns/constraints more closely to determine the type of change - breaking vs non-breaking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could either checks columns individually or have two checksums in the checksum field, one for name/data_type and one for constraints.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of putting these methods on CompiledNode versus carrying the columns on Contract and overriding __eq__() on Contract (in place of same_contract())?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not prepared to move those fields at this time, since it would be a breaking change for packages which use those fields. In addition, not all of the attributes of a column count for breaking changes, so a simple equal doesn't work.

@gshank gshank requested a review from a team as a code owner March 28, 2023 17:52
@gshank gshank requested a review from Fleid March 28, 2023 17:52
@gshank gshank merged commit 050161c into main Mar 28, 2023
@gshank gshank deleted the ct-2038-contract_state_modified branch March 28, 2023 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-2038] Detect breaking changes to column names and data types in state:modified check
6 participants