-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect breaking changes to column names and data types in state:modified check #7216
Conversation
Please look at the exception message to see what you want. |
core/dbt/exceptions.py
Outdated
"There is a breaking change in the model contract; " | ||
"you may need to create a new version. See: https://docs.getdbt.com/docs/collaborate/publish/model-versions" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message looks good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user is aware they're making a breaking change, and just needs some guidance regarding making a new version of the model - this error message looks good to me too.
That said, I am worried that this error message could be a little opaque for unintented breaking changes, especially as our set of changes are considered breaking changes becomes more nuanced. For example, a user updates a contracted model with constraints from table
to view
in an attempt to optimize some spend, which raises a breaking change error because the existing constraints can't be validated. Just seeing There is a breaking change in the model contract
could be confusing, and perhaps the user would actually choose to just revert that change and make it at a later point (bundle it with some other breaking changes). In that scenario, I'd (as an imaginary user) love to see an error message that looks more like:
There is a breaking change in the model contract:
* Updated materialization from `table` to `view` for model with constraints
* Updated config from contract: true to contract: false
* Column a removed
* Column c updated data_type from string to int
* ...
You may need to create a new version. See: https://docs.getdbt.com/docs/collaborate/publish/model-versions
This may not be concern with our current definition of a breaking changes (any change to column name/data_type => breaking change), but given that we're planning to extend this to be more nuanced soon in #7065, I'd leave some design space for it if this scenario seems important to prioritize the UX of (cc @jtcohen6).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the change of a materialization is not a contract change, it's a config change. The ticket doesn't mention materialization -- do you want that to be part of the contract change too?
Creating that more user-friendly error message will probably be about twice as much work than everything else so far. If you want me to spend my time doing that, I can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Materializations are not part of this issue, but they will be part the work to extend detecting breaking changes in #7065 (which i still blocked by #7067). If we know we'll want more granular error messaging as part of that work, we could set up the foundations here or leave it to #7065.
I agree it's a good deal more work to get that though; we could break off improving the error messaging into a separate issue so it's refined and estimated appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a brief "reason" to the exception message. We can do more elaborate messaging once the other parts are in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(good conversation, no notes!)
sorted_columns = sorted(self.columns.values(), key=lambda col: col.name) | ||
for column in sorted_columns: | ||
contract_state += f"|{column.name}" | ||
contract_state += str(column.data_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to leave out constraints for the scope of this issue 👍
We should revisit whether constraints should go in the checksum when we go to implement #7065, where:
- breaking changes to constraints should raise a
ModelContractError
fromsame_contract
- non-breaking changes to constraints (e.g. adding a constraint, removing an unenforced constraint) should return
False
fromsame_contract
.
Determining the nature of a change to a constraint will need additional logic beyond comparing checksums, and adding constraints to the checksum will make understanding changes to column data_type and names a bit trickier. I could see the checksum being a useful to determine, crudely, whether there was a change at all, and needing to actually inspect the columns/constraints more closely to determine the type of change - breaking vs non-breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could either checks columns individually or have two checksums in the checksum field, one for name/data_type and one for constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of putting these methods on CompiledNode
versus carrying the columns on Contract
and overriding __eq__()
on Contract
(in place of same_contract()
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not prepared to move those fields at this time, since it would be a breaking change for packages which use those fields. In addition, not all of the attributes of a column count for breaking changes, so a simple equal doesn't work.
resolves #6869
Description
Add "same_contract", "state:modified.contract" to defer_state checks, create new exception and tests.
Checklist
changie new
to create a changelog entry