[SPARK-37517][SQL] Keep consistent order of columns with user specify for v1 table #34780
I'm a little confused that the table is created successfully even though column `a` is nullable. It seems to me that partition columns should not be nullable. @cloud-fan
Branch updated from 200cd7f to be72b42.
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test starting
Test build #145855 has finished for PR 34780 at commit
To understand this issue better: does Spark today reorder the user-specified schema in CREATE TABLE and always put the partition columns at the end?
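A minimal sketch, in plain Scala with no Spark dependency, of the behavior the question describes: moving partition columns to the end of a user-specified column list. `Field` and `moveToEnd` are hypothetical stand-ins, not Spark's `StructField` or its actual CREATE TABLE code path. The demo also shows why the positional `takeRight(partitionColumnNames.length)` only finds the partition columns once that reordering has happened.

```scala
// Hypothetical stand-in for a schema column; not Spark's StructField.
final case class Field(name: String, dataType: String)

object ReorderSketch {
  // Puts the named partition columns last, in partitionColumnNames order,
  // while data columns keep the relative order the user wrote.
  def moveToEnd(schema: Seq[Field], partitionColumnNames: Seq[String]): Seq[Field] = {
    val dataFields = schema.filterNot(f => partitionColumnNames.contains(f.name))
    val partitionFields = partitionColumnNames.flatMap(n => schema.find(_.name == n))
    dataFields ++ partitionFields
  }

  def main(args: Array[String]): Unit = {
    // "a" is the partition column but is declared first.
    val userSchema = Seq(Field("a", "int"), Field("b", "string"), Field("c", "double"))
    val stored = moveToEnd(userSchema, Seq("a"))
    println(stored.map(_.name))                  // List(b, c, a)
    // Positional takeRight recovers the partition column only after the move.
    println(userSchema.takeRight(1).map(_.name)) // List(c)
    println(stored.takeRight(1).map(_.name))     // List(a)
  }
}
```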
Kubernetes integration test status failure
Test build #145858 has finished for PR 34780 at commit
```diff
   def partitionSchema: StructType = {
-    val partitionFields = schema.takeRight(partitionColumnNames.length)
+    val partitionFields = partitionColumnNames.map { partCol =>
+      schema.find(_.name == partCol).get
```
Is this safe? Could `None.get` throw an exception here?
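To illustrate the concern: calling `.get` on `None` throws `java.util.NoSuchElementException`, so `schema.find(_.name == partCol).get` fails whenever a name in `partitionColumnNames` is absent from `schema`. The sketch below uses a plain `Seq[String]` schema and hypothetical method names; it contrasts the `.get` form with an alternative that fails with a message naming the missing column.

```scala
import scala.util.Try

object NoneGetSketch {
  // Mirrors the reviewed line: throws NoSuchElementException on a missing name.
  def withGet(schema: Seq[String], partitionColumnNames: Seq[String]): Seq[String] =
    partitionColumnNames.map { partCol => schema.find(_ == partCol).get }

  // Hypothetical alternative: fail with a message that names the missing column.
  def withMessage(schema: Seq[String], partitionColumnNames: Seq[String]): Seq[String] =
    partitionColumnNames.map { partCol =>
      schema.find(_ == partCol).getOrElse(
        throw new IllegalArgumentException(
          s"Partition column $partCol not found in schema: ${schema.mkString(", ")}"))
    }

  def main(args: Array[String]): Unit = {
    val schema = Seq("a", "b", "c")
    println(Try(withGet(schema, Seq("a")))) // Success(List(a))
    println(Try(withGet(schema, Seq("x")))) // Failure(java.util.NoSuchElementException: None.get)
    println(Try(withMessage(schema, Seq("x"))))
  }
}
```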
Is this consistent with the result of the following?

```scala
partitionColumnNames.flatMap { partCol =>
  schema.find(_.name == partCol)
}
```
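One way to see the difference being asked about: `flatMap` over the `Option` returned by `find` silently drops names it cannot resolve, while `map` with `.get` throws. The two agree exactly when every partition column name is present in the schema. A small sketch with a plain `Seq[String]` schema; the object and method names are hypothetical.

```scala
import scala.util.Try

object FlatMapVsGet {
  // The form in the diff: throws if any name is missing.
  def viaGet(schema: Seq[String], parts: Seq[String]): Seq[String] =
    parts.map(p => schema.find(_ == p).get)

  // The form suggested in review: drops any name that is missing.
  def viaFlatMap(schema: Seq[String], parts: Seq[String]): Seq[String] =
    parts.flatMap(p => schema.find(_ == p))

  def main(args: Array[String]): Unit = {
    val schema = Seq("a", "b", "c")
    // All names present: both yield the same fields, in partition-column order.
    println(viaGet(schema, Seq("c", "a")) == viaFlatMap(schema, Seq("c", "a"))) // true
    // One name missing: flatMap drops it, .get throws.
    println(viaFlatMap(schema, Seq("c", "x")))            // List(c)
    println(Try(viaGet(schema, Seq("c", "x"))).isFailure) // true
  }
}
```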
```diff
    */
   def dataSchema: StructType = {
-    val dataFields = schema.dropRight(partitionColumnNames.length)
+    val dataFields = schema.filterNot { i =>
```
The name `i` is easily mistaken for an index. Should we rename this variable?
OK, I will change it. Thank you.
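A minimal sketch of the name-based split that the `partitionSchema` and `dataSchema` hunks introduce, written against a hypothetical `Field` case class rather than Spark's `StructType`/`StructField` so it runs without Spark. It assumes the elided `filterNot` body checks membership in `partitionColumnNames`, uses the descriptive name `field` in place of `i`, and swaps `.get` for `getOrElse` with an explicit error. Unlike the positional `takeRight`/`dropRight` version, it does not assume the partition columns are stored last.

```scala
// Hypothetical stand-in for Spark's StructField.
final case class Field(name: String, dataType: String)

object SchemaSplitSketch {
  // Partition fields looked up by name, in partitionColumnNames order.
  def partitionSchema(schema: Seq[Field], partitionColumnNames: Seq[String]): Seq[Field] =
    partitionColumnNames.map { partCol =>
      schema.find(_.name == partCol).getOrElse(
        throw new IllegalArgumentException(s"Unknown partition column: $partCol"))
    }

  // Data fields keep their original (user-specified) relative order.
  def dataSchema(schema: Seq[Field], partitionColumnNames: Seq[String]): Seq[Field] =
    schema.filterNot { field => partitionColumnNames.contains(field.name) }

  def main(args: Array[String]): Unit = {
    // Partition column "a" is declared first, not last.
    val schema = Seq(Field("a", "int"), Field("b", "string"), Field("c", "double"))
    println(partitionSchema(schema, Seq("a")).map(_.name)) // List(a)
    println(dataSchema(schema, Seq("a")).map(_.name))      // List(b, c)
  }
}
```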
@cloud-fan
I don't think we need to be limited by the underlying data source or the Hive metastore. We can always add an extra Project to keep the original user-specified column order.
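The extra-Project idea can be pictured as a final reordering by column name: whatever order the data source or Hive metastore stores the columns in, a projection selects them back in the user-specified order. The sketch below models it with plain collections; `projectionIndices` and `project` are hypothetical helpers, not Spark APIs, and in Spark the reordering would presumably be expressed as a `Project` node in the logical plan rather than as index arithmetic.

```scala
object ProjectSketch {
  // For each user-specified column, find its position in the stored layout.
  def projectionIndices(storedOrder: Seq[String], userOrder: Seq[String]): Seq[Int] =
    userOrder.map { name =>
      val idx = storedOrder.indexOf(name)
      require(idx >= 0, s"Column $name not found in stored schema")
      idx
    }

  // Reorder one row (laid out in stored order) into the user-specified order.
  def project[A](row: Seq[A], indices: Seq[Int]): Seq[A] =
    indices.map(i => row(i))

  def main(args: Array[String]): Unit = {
    val userOrder = Seq("a", "b", "c")   // as written in CREATE TABLE; "a" is the partition column
    val storedOrder = Seq("b", "c", "a") // partition column stored last
    val indices = projectionIndices(storedOrder, userOrder)
    println(indices)                                            // List(2, 0, 1)
    println(project(Seq("b-val", "c-val", "a-val"), indices))   // List(a-val, b-val, c-val)
  }
}
```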
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Change the `partitionSchema` and `dataSchema` implementations.
Why are the changes needed?
See the discussion in #34719.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a test case.