[SPARK-32163][SQL][3.0] Nested pruning should work even with cosmetic variations #29027

viirya · 2020-07-07T18:45:25Z

What changes were proposed in this pull request?

This patch handles cosmetic variations when processing nested column extractors in NestedColumnAliasing. Currently, if the nested column extractors contain cosmetic variations, the query is not optimized.

This backports #28988 to branch-3.0.

Why are the changes needed?

If the expressions that extract nested fields have cosmetic variations, such as a difference in qualifiers, nested column pruning currently does not work well.

For example, when a query refers to two attributes that are semantically the same, their nested column extractors are still treated as different during nested column pruning.
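
To make the qualifier case concrete, here is a minimal sketch in plain Scala. It is a toy model and not Catalyst code: `Expr`, `Attr`, `GetField` and `CosmeticVariationDemo` are hypothetical names invented for illustration, and the only cosmetic details modelled are the attribute name and qualifier. It shows two extractors over the same attribute that differ under structural equality but coincide once compared by a canonical form.

```scala
// Toy model only: these are NOT Spark's Catalyst classes.
sealed trait Expr {
  // Canonical form: the same expression with cosmetic details stripped.
  def canonicalized: Expr
}

// An attribute is identified by exprId; name and qualifier are treated as cosmetic here.
final case class Attr(name: String, exprId: Long, qualifier: Seq[String] = Nil) extends Expr {
  def canonicalized: Expr = Attr("none", exprId, Nil)
}

// Extracts one field from a struct-valued child expression.
final case class GetField(child: Expr, field: String) extends Expr {
  def canonicalized: Expr = GetField(child.canonicalized, field)
}

object CosmeticVariationDemo {
  // Same attribute (exprId = 1), once without and once with a qualifier.
  val plain: Expr = GetField(Attr("contact", 1L), "first")
  val qualified: Expr = GetField(Attr("contact", 1L, Seq("t")), "first")

  // Structural equality treats the two extractors as different.
  val distinctRaw: Int = Seq(plain, qualified).distinct.size

  // Comparing canonical forms collapses them into a single extractor.
  val distinctCanonical: Int = Seq(plain, qualified).map(_.canonicalized).distinct.size

  def main(args: Array[String]): Unit =
    println(s"raw: $distinctRaw, canonical: $distinctCanonical")
}
```

In this sketch, any logic that groups or counts extractors by plain equality would see two entries for what is semantically one nested field, which is the kind of mismatch described above.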

Does this PR introduce any user-facing change?

Yes, it fixes a bug in nested column pruning.

How was this patch tested?

Unit test.

maropu · 2020-07-07T23:05:11Z

Thanks for the backport, @viirya .

dongjoon-hyun · 2020-07-07T23:32:07Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala

        if (nestedFieldToAlias.nonEmpty &&
-            nestedFieldToAlias
-              .map { case (nestedField, _) => totalFieldNum(nestedField.dataType) }
+            nestedFields.map(_.canonicalized)


Is this the main difference from 3.0, dedupNestedFields -> nestedFields?

Yeah, for the change in NestedColumnAliasing. The other difference is in the tests: one test in the master branch cannot pass in branch-3.0.

Yes. The test part looked correct because it's a subset. This part looks a little different and needs more validation.

Thanks for checking. Yes, there is a small difference between master and branch-3.0 here, so there is no dedupNestedFields in branch-3.0.

Ah, I missed this part. I think the added test still fails if we don't have this change. Is this correct?

Yes, it does. The new test case still validates this patch for that part.

Yes, this test fails in current branch-3.0.

dongjoon-hyun · 2020-07-08T02:18:12Z

Merged to branch-3.0. Thank you, @viirya and @maropu .
All UTs (including R) have already passed in the currently running Jenkins build.

… variations

Closes #29027 from viirya/SPARK-32163-3.0.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

SparkQA · 2020-07-08T02:28:16Z

Test build #125245 has finished for PR 29027 at commit 5e4a420.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2020-07-08T04:11:54Z

Thanks @dongjoon-hyun @maropu

Nested pruning should work even with cosmetic variations.

5e4a420

probot-autolabeler bot added the SQL label Jul 7, 2020

dongjoon-hyun changed the title ~~[SPARK-32163][SQL][BRANCH-3.0] Nested pruning should work even with cosmetic variations~~ [SPARK-32163][SQL][3.0] Nested pruning should work even with cosmetic variations Jul 7, 2020

maropu approved these changes Jul 7, 2020

dongjoon-hyun reviewed Jul 7, 2020

dongjoon-hyun approved these changes Jul 8, 2020

dongjoon-hyun closed this Jul 8, 2020

viirya deleted the SPARK-32163-3.0 branch December 27, 2023 18:28
