[SPARK-32163][SQL][3.0] Nested pruning should work even with cosmetic variations #29027
Thanks for the backport, @viirya.
if (nestedFieldToAlias.nonEmpty &&
  nestedFieldToAlias
    .map { case (nestedField, _) => totalFieldNum(nestedField.dataType) }
nestedFields.map(_.canonicalized)
Is this the main difference in 3.0: `dedupNestedFields` -> `nestedFields`?
Yeah, for the change in `NestedColumnAliasing`. The other difference is in the tests: one test from the master branch does not pass in branch-3.0.
Yes. The test part looked correct because it is a subset. This part looks a little different and needs more validation.
Thanks for checking. Yes, master and branch-3.0 differ slightly here: branch-3.0 has no `dedupNestedFields`.
Ah, I missed this part. I think the added test still fails without this change. Is that correct?
Yes, it does. The new test case still validates this part of the patch.
Yes, this test fails in the current branch-3.0.
Nice.
… variations

Closes #29027 from viirya/SPARK-32163-3.0.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Test build #125245 has finished for PR 29027 at commit
Thanks @dongjoon-hyun @maropu
What changes were proposed in this pull request?
This patch proposes to handle cosmetic variations when processing nested column extractors in `NestedColumnAliasing`. Currently, if the nested column extractors contain cosmetic variations, the query is not optimized. This backports #28988 to branch-3.0.
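To make "cosmetic variations" concrete, here is a minimal Scala sketch built on a toy expression model. `Expr`, `Attr`, `GetField` and `distinctExtractors` are hypothetical names, not Spark's Catalyst classes. The model treats an attribute's name and qualifier as cosmetic and its `exprId` as its identity, so `t.a.b` and `a.b` differ structurally but agree once canonicalized, in the spirit of the `nestedFields.map(_.canonicalized)` line quoted in the review.

```scala
// Hypothetical, simplified expression model; these are NOT Spark's Catalyst classes.
object CanonicalizationSketch {
  sealed trait Expr {
    // Returns a form with cosmetic details removed, so semantically
    // equal expressions become structurally equal.
    def canonicalized: Expr
  }

  // An attribute is identified by its exprId; in this model the name and
  // qualifier are treated as cosmetic and dropped by canonicalization.
  final case class Attr(name: String, exprId: Long, qualifier: Seq[String] = Nil) extends Expr {
    def canonicalized: Expr = Attr("none", exprId, Nil)
  }

  // Extracts a named field from a struct-typed child expression.
  final case class GetField(child: Expr, field: String) extends Expr {
    def canonicalized: Expr = GetField(child.canonicalized, field)
  }

  // Collapses extractors that differ only cosmetically into a single entry.
  def distinctExtractors(extractors: Seq[Expr]): Seq[Expr] =
    extractors.map(_.canonicalized).distinct

  // `t.a.b` and `a.b`: same attribute (exprId 1), same field, different qualifier.
  val qualified: Expr = GetField(Attr("a", 1L, Seq("t")), "b")
  val unqualified: Expr = GetField(Attr("a", 1L), "b")

  def main(args: Array[String]): Unit = {
    println(qualified == unqualified)                             // structural equality
    println(qualified.canonicalized == unqualified.canonicalized) // equality after canonicalization
    println(distinctExtractors(Seq(qualified, unqualified)).size) // number of distinct extractors
  }
}
```

Running `main` prints `false`, `true` and `1`: plain equality sees two different extractors, while the canonicalized forms collapse into one.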
Why are the changes needed?
If the expressions that extract nested fields have cosmetic variations, such as differing qualifiers, nested column pruning currently is not applied.
For example, if a query refers to two attributes that are semantically the same, their nested column extractors are still treated as different when nested column pruning is performed.
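The following Scala sketch illustrates one way such a mismatch can block pruning. It uses hypothetical names (`FieldAccess`, `shouldPrune`) rather than Spark's API, and assumes a count-based rule loosely modelled on the `totalFieldNum`-based check visible in the reviewed snippet: aliases are introduced only if the accessed nested fields cover fewer leaf fields than the whole root struct. The explicit `.distinct` is an illustrative stand-in. Per the review discussion, branch-3.0 has no `dedupNestedFields`, so the actual patch may handle this differently.

```scala
// Hypothetical sketch; the names and the decision rule are illustrative, not Spark's code.
object PruningDecisionSketch {
  // A nested-field access: optional table qualifier, root column, field path.
  final case class FieldAccess(qualifier: Option[String], root: String, path: List[String]) {
    // Drop the cosmetic qualifier so `t.a.b` and `a.b` compare equal.
    def canonical: FieldAccess = copy(qualifier = None)
  }

  // Assumed rule: prune only when the accessed fields cover fewer leaves
  // than the whole root struct; otherwise reading the full column is no worse.
  def shouldPrune(accesses: Seq[FieldAccess], leavesPerAccess: FieldAccess => Int, totalLeaves: Int): Boolean =
    accesses.nonEmpty && accesses.map(leavesPerAccess).sum < totalLeaves

  // Root column `a` is assumed to be a struct with two leaf fields, `b` and `c`.
  val totalLeavesOfA: Int = 2

  // The query reads `a.b` twice, once with the qualifier `t` and once without.
  val accesses: Seq[FieldAccess] = Seq(
    FieldAccess(Some("t"), "a", List("b")),
    FieldAccess(None, "a", List("b"))
  )

  // Every access in this example touches exactly one leaf field.
  val oneLeaf: FieldAccess => Int = _ => 1

  // Treating the two spellings as distinct counts `b` twice: 2 < 2 is false.
  val withoutCanonicalization: Boolean = shouldPrune(accesses, oneLeaf, totalLeavesOfA)

  // After canonicalizing and de-duplicating, only `b` remains: 1 < 2 is true.
  val withCanonicalization: Boolean =
    shouldPrune(accesses.map(_.canonical).distinct, oneLeaf, totalLeavesOfA)

  def main(args: Array[String]): Unit = {
    println(withoutCanonicalization)
    println(withCanonicalization)
  }
}
```

Running `main` prints `false` and then `true`: in this toy model, only the canonicalized view of the accesses lets the pruning rule fire.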
Does this PR introduce any user-facing change?
Yes. This fixes a bug in nested column pruning.
How was this patch tested?
Unit test.