Skip to content

Conversation

@HeartSaVioR
Copy link
Contributor

What changes were proposed in this pull request?

This PR fixes a bug on #37161 (described the bug in below section) via making sure the output columns in LogicalRDD are always the same with output columns in originLogicalPlan in LogicalRDD, which is needed to inherit the column stats.

Why are the changes needed?

Stats for columns in originLogicalPlan refer to the columns in originLogicalPlan, which could be different from the columns in output of LogicalRDD in terms of expression ID.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UT

@HeartSaVioR
Copy link
Contributor Author

cc. @cloud-fan @viirya


val rewrittenOriginLogicalPlan = originLogicalPlan.map { plan =>
val projectList = output.map { attr =>
Alias(attr, attr.name)(exprId = rewrite.getOrElse(attr, attr).exprId)
Copy link
Member

@viirya viirya Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As rewrite is a map for all output. We already can get rewrite(attr) instead of getOrElse.

Copy link
Contributor Author

@HeartSaVioR HeartSaVioR Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more about the sake of defensive programming - if there is a bug which makes the two set of columns be out of sync, we just allow them to be out of sync in future instead of failing the query, given that the impact of two set of columns be out of sync is not that quite serious, e.g. column stat won't be available. (vendors/3rd parties may still want to leverage it for major functionality though.)

In opposite way, I'm also in favor of fail-fast, setting the precondition that "two set of columns should be in sync", and assert the precondition on initialization of the class. After that we can safely assume that precondition is respected, and then it'd be safe to just use rewrite(attr) here.

I'm fine either way. WDYT? cc. @cloud-fan as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay for me. Just a nit comment.

Copy link
Contributor Author

@HeartSaVioR HeartSaVioR Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not really. My bad you're right. It only looks into the output of LogicalRDD. (And this code wouldn't work in any way if there are out of sync between two sets of columns.)

Let me reflect the change.

@HeartSaVioR HeartSaVioR requested review from cloud-fan and viirya July 14, 2022 08:41
@HeartSaVioR
Copy link
Contributor Author

Would you mind about another round of review? I've just done with small-ish changes, but wanted to make sure the new changes are also OK before merging the PR.

dc7935d and 53ef820

Thanks in advance!

}.asInstanceOf[SortOrder])

val rewrittenOriginLogicalPlan = originLogicalPlan.map { plan =>
assert(output == plan.output, "The output columns are expected to the same for output " +
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I added this assertion on initialization as precondition, and realized canonicalization breaks the precondition. (output is canonicalized, but originLogicalPlan is not a target of canonicalization)

I wouldn't expect Spark calls newInstance against canonicalized node, but please correct me if I'm mistaken.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Spark doesn't call newInstance with canonicalized node. cc @cloud-fan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can't think of any use case leveraging canonicalized node to even start with something.

@HeartSaVioR
Copy link
Contributor Author

Thanks! Merging to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants