Support enforcement of NOT NULL column declarations #4144
Conversation
plan.getFieldMappings() is guaranteed to match the order of columns in plan.getScope().getRelationType() (i.e., the output "shape" of the query).
The planner is not the right place to match ColumnMetadata to fields, as there is no correspondence between those fields and the order in which the fields appear in the plan.
For instance, given a table t (a BIGINT, b BIGINT), the following query will see the fields in a different order: INSERT INTO t(b, a) VALUES (1, 10)
This should be handled during analysis. The analyzer should record which field ordinals are supposed to be not null. Take a look at Analysis.JoinUsingAnalysis and its callers for an example of how you might record that.
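To make the ordering pitfall concrete: with table t (a, b) and INSERT INTO t(b, a), field 0 of the query feeds column b and field 1 feeds column a, so per-field constraints must be recorded by ordinal at analysis time rather than assumed positional. The following is a hypothetical, standalone sketch (not the actual Analysis code) of recording not-null field ordinals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NotNullOrdinals {
    // Given the column list as written in the INSERT statement and the set of
    // columns declared NOT NULL in the table, record which ordinals of the
    // query's output fields must be non-null.
    static List<Integer> notNullFieldOrdinals(List<String> insertColumns, Set<String> notNullColumns) {
        List<Integer> ordinals = new ArrayList<>();
        for (int field = 0; field < insertColumns.size(); field++) {
            if (notNullColumns.contains(insertColumns.get(field))) {
                ordinals.add(field);
            }
        }
        return ordinals;
    }

    public static void main(String[] args) {
        // Table t (a BIGINT NOT NULL, b BIGINT); statement: INSERT INTO t(b, a) VALUES (1, 10)
        // Field 1 of the query output (bound to column a) must be non-null.
        System.out.println(notNullFieldOrdinals(List.of("b", "a"), Set.of("a")));
    }
}
```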
Ok, I'll look there.
@martint The two callers of this method in this class fetch the column names from table metadata. Are they wrong as well?
For CREATE TABLE AS ... it kind of works out because the metadata is derived from the CREATE TABLE statement. For INSERT INTO, it works because it creates a projection to match the layout of the table metadata. Unfortunately, it relies on projectNode.getOutputSymbols(), which is not guaranteed to come out in the desired order -- it just happens to work today.
@martint, I believe I've addressed your concern in the latest force push of the commit. The code no longer expects channel order to match column order, and I added a unit test that exercises your INSERT INTO t(b, a) example.
electrum left a comment:
Generally looks good. I'll defer to @martint to review the planner changes.
presto-iceberg/src/test/java/io/prestosql/plugin/iceberg/TestIcebergSmoke.java (outdated; resolved)
We should do this inside the check for not-null channels, since this is logically an optimization of checking the block for nulls. For nullable channels, we don't need to look at this.
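The structure being suggested can be sketched independently of Presto's Block API: for each channel declared NOT NULL, first consult a cheap may-have-null flag (analogous to Block.mayHaveNull()) and only scan positions when it returns true; nullable channels are never inspected. A minimal sketch with a hypothetical Column type standing in for Block:

```java
import java.util.List;

public class NotNullEnforcement {
    // Hypothetical stand-in for Presto's Block: values plus an optional null mask.
    record Column(String name, Object[] values, boolean[] isNull) {
        boolean mayHaveNull() {
            return isNull != null; // cheap flag, analogous to Block.mayHaveNull()
        }
    }

    // Throws if any NOT NULL channel contains a null; nullable channels are skipped.
    static void checkNotNull(List<Column> columns, boolean[] notNullChannel) {
        for (int channel = 0; channel < columns.size(); channel++) {
            if (!notNullChannel[channel]) {
                continue; // only NOT NULL channels need inspection
            }
            Column column = columns.get(channel);
            if (!column.mayHaveNull()) {
                continue; // optimization: skip the per-position scan entirely
            }
            for (int position = 0; position < column.values().length; position++) {
                if (column.isNull()[position]) {
                    throw new IllegalStateException(
                            "NULL value not allowed for NOT NULL column: " + column.name());
                }
            }
        }
    }

    public static void main(String[] args) {
        Column a = new Column("a", new Object[] {1L, 2L}, null);
        Column b = new Column("b", new Object[] {1L, null}, new boolean[] {false, true});
        try {
            checkNotNull(List.of(a, b), new boolean[] {true, true});
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the review comment is the placement: the mayHaveNull-style short-circuit lives inside the per-channel NOT NULL branch, so nullable channels pay no cost at all.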
Done in the latest force push.
presto-main/src/main/java/io/prestosql/operator/TableWriterOperator.java (outdated; resolved)
presto-main/src/main/java/io/prestosql/operator/TableWriterOperator.java (outdated; resolved)
presto-main/src/main/java/io/prestosql/sql/planner/LocalExecutionPlanner.java (outdated; resolved)
Force-pushed from 038cbb4 to 7c133be.
presto-main/src/main/java/io/prestosql/sql/planner/plan/TableWriterNode.java (outdated; resolved)
martint left a comment:
Just a minor comment, but the planner bits look good now.
I'd just do this, replacing

    .map(column -> requireNonNull(columnToSymbolMap.get(column.getName()), "columnToSymbolMap is missing column " + column.getName()))
    .collect(Collectors.toSet());

with

    .map(columnToSymbolMap::get)
    .collect(toImmutableSet());
It's more concise, and the immutable set will catch any nulls that might result from the lookup (which is a bug somewhere in the implementation, anyway)
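The null-catching behavior relied on here is that immutable-set collectors reject null elements at collection time. A standalone sketch using the JDK's Collectors.toUnmodifiableSet (JDK 10+), which rejects nulls the same way Guava's toImmutableSet does, avoiding the Guava dependency for illustration:

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NullRejectingCollector {
    public static void main(String[] args) {
        Map<String, String> columnToSymbolMap = Map.of("a", "sym_a", "b", "sym_b");

        // All names resolve: the collector builds the set normally.
        Set<String> symbols = Stream.of("a", "b")
                .map(columnToSymbolMap::get)
                .collect(Collectors.toUnmodifiableSet());
        System.out.println(symbols.contains("sym_a"));

        // A missing name yields null from Map.get, and the collector throws
        // NullPointerException instead of silently keeping the null.
        try {
            Stream.of("a", "missing")
                    .map(columnToSymbolMap::get)
                    .collect(Collectors.toUnmodifiableSet());
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("caught NPE");
        }
    }
}
```

So dropping the explicit requireNonNull does not hide a failed lookup; it just moves the failure into the collector, as the comment says.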
Indeed, that's much nicer; changed as you suggested in a force push. Thanks very much for looking!
It looks like this caused some tests to start failing with NullPointerException
Oh, yeah, sorry, I missed the following in my suggestion above:

    .map(ColumnMetadata::getName)
Doh, I should have looked more closely rather than blindly applying the change. I need to be more careful pushing changes at an hour when I'm no longer sentient ;)
It's fixed now in the latest force push.
Force-pushed from 7c133be to c151591.
This commit enforces NOT NULL column declarations on write in the Presto engine, so it applies to all connectors. The existing Postgres and Mysql tests named testInsertIntoNotNullColumn were changed to check for the new error message, and a new test with the same name was added to TestIcebergSmoke. One possible concern with this commit is that the error message issued by the Presto engine when writing a null to a NOT NULL column is a different message than the Connector might issue if no value was supplied for the NOT NULL column. I think this is ok, because the error messages supplied by the Connectors are completely specific to the Connector.
Force-pushed from c151591 to 0674742.
Thanks!
This PR is backported from Trino trinodb/trino#4144, originally authored by djsstarburst; its description from the Trino PR is quoted above. Because of the gap between Presto and Trino, the original PR had to be modified. Cherry-pick of trinodb/trino#4144. Co-authored-by: djsstarburst <[email protected]>
Original support was added via #2ad67dcf, but missed adding it for Iceberg tables. Cherry-pick of trinodb/trino#4144. Co-authored-by: djsstarburst <[email protected]>