Spark: Validate table columns don't conflict with metadata columns #3456
    return result;
  }

  public static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema) {
I initially did this for Spark only. It may make sense to move it to DataTableScan in core.
Looks good to me. I think this is going to be rare enough that we won't need another solution.
kbendick left a comment
Left a comment but this looks good to me. Thanks @aokolnychyi!
table.updateSchema()
    .addColumn(MetadataColumns.SPEC_ID.name(), Types.IntegerType.get())
    .addColumn(MetadataColumns.FILE_PATH.name(), Types.StringType.get())
    .commit();
Eventually, would it make sense to fail here, in the updateSchema pathway?
Maybe. I would rather not disallow these names outright, but you're right that it would catch the problem earlier.
This PR is a follow-up to #3373 and throws an exception when table columns conflict with metadata columns during reads.
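
For readers who want a feel for the check, here is a rough, name-based sketch in plain Java. It is not the code from this PR and does not use Iceberg classes: `MetadataColumnCheck`, `findConflicts`, and the use of string sets in place of `Schema` are all hypothetical, and the reserved names listed are assumed to mirror those in `MetadataColumns`. The actual change may well distinguish columns by field ID rather than by name alone.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the read-time validation; not the Iceberg implementation.
final class MetadataColumnCheck {

  // Assumed reserved metadata column names (intended to mirror MetadataColumns).
  static final Set<String> METADATA_COLUMNS =
      Set.of("_file", "_pos", "_deleted", "_spec_id", "_partition");

  // Returns projected names that are reserved metadata names and also table column names.
  static Set<String> findConflicts(Set<String> tableColumns, List<String> readColumns) {
    Set<String> conflicts = new LinkedHashSet<>();
    for (String name : readColumns) {
      if (METADATA_COLUMNS.contains(name) && tableColumns.contains(name)) {
        conflicts.add(name);
      }
    }
    return conflicts;
  }

  // Throws when the projection references a metadata name that the table also defines.
  static void validateMetadataColumnReferences(
      Set<String> tableColumns, List<String> readColumns) {
    Set<String> conflicts = findConflicts(tableColumns, readColumns);
    if (!conflicts.isEmpty()) {
      throw new IllegalArgumentException(
          "Table columns conflict with metadata columns: " + conflicts);
    }
  }

  public static void main(String[] args) {
    Set<String> tableColumns = Set.of("id", "data", "_spec_id");

    // Projecting only regular columns passes silently.
    validateMetadataColumnReferences(tableColumns, List.of("id", "data"));

    // Projecting "_spec_id" is ambiguous because the table defines a column with that name.
    try {
      validateMetadataColumnReferences(tableColumns, List.of("id", "_spec_id"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Note that in this sketch a table may still contain a column named like a metadata column; the failure only surfaces when a read projects that name, which matches the "during reads" behavior described above and the reluctance in the thread to reject such names in `updateSchema`.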