[SPARK-39748][SQL][SS][FOLLOWUP] Fix a bug on column stat in LogicalRDD on mismatching exprIDs #37187

HeartSaVioR · 2022-07-14T03:46:46Z

What changes were proposed in this pull request?

This PR fixes a bug introduced in #37161 (described in the section below) by making sure the output columns of LogicalRDD are always the same as the output columns of its originLogicalPlan, which is required to inherit the column stats.

Why are the changes needed?

Column stats in originLogicalPlan refer to the columns of originLogicalPlan, whose expression IDs can differ from those of the columns in the output of LogicalRDD.
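The mismatch can be illustrated with a minimal plain-Scala sketch. `Attr`, `ColStat`, `lookup`, and `rekey` are hypothetical stand-ins invented for this illustration; they are not Catalyst classes, and the actual fix rewrites the plan rather than a map. The sketch only assumes that stats are resolved by expression ID.

```scala
object ExprIdMismatchSketch {
  // Hypothetical stand-ins for an attribute and its column stat (not Spark's classes).
  final case class Attr(name: String, exprId: Long)
  final case class ColStat(distinctCount: Long)

  // Stats are resolved by expression ID, not by column name.
  def lookup(stats: Map[Long, ColStat], attr: Attr): Option[ColStat] =
    stats.get(attr.exprId)

  // Re-key each stat to the exprId of the positionally matching output attribute.
  def rekey(
      stats: Map[Long, ColStat],
      planOutput: Seq[Attr],
      rddOutput: Seq[Attr]): Map[Long, ColStat] = {
    require(planOutput.length == rddOutput.length, "outputs must have the same arity")
    planOutput.zip(rddOutput).flatMap { case (from, to) =>
      stats.get(from.exprId).map(stat => to.exprId -> stat)
    }.toMap
  }

  // Same column names, different expression IDs.
  val planOutput: Seq[Attr] = Seq(Attr("a", 1L), Attr("b", 2L))
  val rddOutput: Seq[Attr] = Seq(Attr("a", 10L), Attr("b", 11L))
  val planStats: Map[Long, ColStat] = Map(1L -> ColStat(100L), 2L -> ColStat(5L))
}
```

In this model, looking up `rddOutput.head` in `planStats` finds nothing even though the column name matches, while the same lookup against `rekey(planStats, planOutput, rddOutput)` succeeds.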

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New unit test.


HeartSaVioR · 2022-07-14T04:38:56Z

cc. @cloud-fan @viirya

viirya · 2022-07-14T05:28:30Z

sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala


+    val rewrittenOriginLogicalPlan = originLogicalPlan.map { plan =>
+      val projectList = output.map { attr =>
+        Alias(attr, attr.name)(exprId = rewrite.getOrElse(attr, attr).exprId)


Since rewrite is a map covering all of output, we can use rewrite(attr) here instead of getOrElse.

This is mostly for the sake of defensive programming: if a bug makes the two sets of columns go out of sync, we would let them stay out of sync rather than fail the query, given that the impact is not that serious, e.g. column stats won't be available. (Vendors/3rd parties may still want to leverage it for major functionality, though.)

On the other hand, I'm also in favor of fail-fast: set the precondition that "the two sets of columns should be in sync" and assert it on initialization of the class. After that we can safely assume the precondition holds, and it'd be safe to just use rewrite(attr) here.

~~I'm fine either way. WDYT? cc. @cloud-fan as well.~~

Okay for me. Just a nit comment.

No, not really. My bad, you're right. It only looks into the output of LogicalRDD. (And this code wouldn't work anyway if the two sets of columns were out of sync.)

Let me reflect the change.
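The two options discussed in this thread can be contrasted in a small plain-Scala sketch. `Attr`, `defensiveExprId`, and `failFastExprIds` are hypothetical names for illustration only and do not appear in the PR; the sketch uses `require` where the PR uses `assert`.

```scala
object RewriteLookupSketch {
  // Hypothetical stand-in for an attribute (not Spark's class).
  final case class Attr(name: String, exprId: Long)

  // Defensive: an attribute missing from the map silently keeps its own exprId.
  def defensiveExprId(rewrite: Map[Attr, Attr], attr: Attr): Long =
    rewrite.getOrElse(attr, attr).exprId

  // Fail-fast: check the precondition once, then a plain apply is safe.
  def failFastExprIds(rewrite: Map[Attr, Attr], output: Seq[Attr]): Seq[Long] = {
    require(output.forall(rewrite.contains), "rewrite must cover every output attribute")
    output.map(attr => rewrite(attr).exprId)
  }
}
```

The defensive variant hides a missing mapping by returning the old ID, whereas the fail-fast variant surfaces it immediately as an exception.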

HeartSaVioR · 2022-07-14T08:43:36Z

Would you mind doing another round of review? I've only made small-ish changes, but I want to make sure the new changes are also OK before merging the PR.

dc7935d and 53ef820

Thanks in advance!

HeartSaVioR · 2022-07-14T08:54:53Z

sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala

    }.asInstanceOf[SortOrder])

+    val rewrittenOriginLogicalPlan = originLogicalPlan.map { plan =>
+      assert(output == plan.output, "The output columns are expected to the same for output " +


Actually, I added this assertion on initialization as a precondition, and realized that canonicalization breaks the precondition. (output is canonicalized, but originLogicalPlan is not a target of canonicalization.)

I wouldn't expect Spark to call newInstance against a canonicalized node, but please correct me if I'm mistaken.

I think Spark doesn't call newInstance on a canonicalized node. cc @cloud-fan

Yeah, I can't think of any use case that uses a canonicalized node as a starting point for anything.
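Why an initialization-time assertion breaks under canonicalization can be sketched with a simplified model. `Node`, `originOutput`, `inSync`, and the `offset`-based `newInstance` are hypothetical and far simpler than the real LogicalRDD; the only assumptions taken from the discussion are that canonicalization rewrites output but leaves the origin plan untouched, and that newInstance is not called on a canonicalized node.

```scala
object CanonicalizationSketch {
  // Hypothetical stand-in for an attribute (not Spark's class).
  final case class Attr(name: String, exprId: Long)

  // Hypothetical node: `output` gets canonicalized, `originOutput` does not.
  final case class Node(output: Seq[Attr], originOutput: Seq[Attr]) {
    // Canonicalization normalizes names and IDs of `output` only.
    def canonicalized: Node =
      copy(output = output.indices.map(i => Attr("none", i.toLong)))

    def inSync: Boolean = output == originOutput

    // The check lives here, not in the constructor, so canonicalized
    // copies can still be constructed.
    def newInstance(offset: Long): Node = {
      require(inSync, "output and originOutput are expected to be the same")
      val fresh = output.map(a => a.copy(exprId = a.exprId + offset))
      Node(fresh, fresh)
    }
  }
}
```

In this model a canonicalized copy is never in sync, so a constructor-level check would reject it, while a check inside `newInstance` only fires if `newInstance` is ever called on a canonicalized node.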

HeartSaVioR · 2022-07-15T03:44:34Z

Thanks! Merging to master.

HeartSaVioR added 2 commits July 13, 2022 21:29

[SPARK-39748][SQL][SS][FOLLOWUP] Fix a bug on column stat in LogicalR…

dc9ca8d

…DD on mismatching exprIDs

reflect the suggestion from @cloud-fan

593181e

github-actions bot added SQL STRUCTURED STREAMING labels Jul 14, 2022

viirya approved these changes Jul 14, 2022


viirya reviewed Jul 14, 2022


cloud-fan approved these changes Jul 14, 2022


HeartSaVioR added 2 commits July 14, 2022 15:04

reflect feedback, assert precondition

dc7935d

fix - previous assertion does not work with canonicalization

53ef820

HeartSaVioR requested review from cloud-fan and viirya July 14, 2022 08:41

HeartSaVioR commented Jul 14, 2022


retrigger

b736c12

viirya approved these changes Jul 14, 2022


HeartSaVioR closed this in 3c80ed8 Jul 15, 2022
