Skip to content

Conversation

@peter-toth
Copy link
Contributor

@peter-toth peter-toth commented Jul 9, 2020

What changes were proposed in this pull request?

This PR removes the empty child relations of a Union.

E.g. the query SELECT c FROM t UNION ALL SELECT c FROM t WHERE false has the following plan before this PR:

== Physical Plan ==
Union
:- *(1) Project [value#219 AS c#222]
:  +- *(1) LocalTableScan [value#219]
+- LocalTableScan <empty>, [c#224]

and after this PR:

== Physical Plan ==
*(1) Project [value#219 AS c#222]
+- *(1) LocalTableScan [value#219]

Why are the changes needed?

To have a simpler plan.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new UTs.

@SparkQA
Copy link

SparkQA commented Jul 9, 2020

Test build #125472 has started for PR 29053 at commit 71b28fd.

comparePlans(optimized, correctAnswer)
}

test("remove empty relation children from Union") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is an improvement, shall we add SPARK-32241: ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

@dongjoon-hyun
Copy link
Member

Also, cc @cloud-fan

if (newChildren.isEmpty) {
empty(p)
} else {
val newFirstChild = if (newChildren.head eq children.head) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use sameOutput like this;

      if (newChildren.isEmpty) {
        empty(p)
      } else {
        val newPlan = if (newChildren.size > 1) Union(newChildren) else newChildren.head
        // Describes why we need this handling
        if (newPlan.sameOutput(p)) {
          newPlan
        } else {
          val outputAliases = newPlan.output.zip(p.output).map { case (newAttr, outputName) =>
            Alias(newAttr, outputName.name)(newAttr.exprId)
          }
          Project(outputAliases, newPlan)
        }
      }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be a good idea, but sameOutput uses semanticEquals that fails if an expression is non-deterministic. How about just comparing the expression ids like this?

     if (newChildren.isEmpty) {
        empty(p)
      } else {
        val newPlan = if (newChildren.size > 1) Union(newChildren) else newChildren.head
        val outputs = newPlan.output.zip(p.output)
        // the original Union may produce different output attributes than the new one so we alias them if needed
        if (outputs.forall { case (newAttr, oldAttr) => newAttr.exprId == oldAttr.exprId }) {
          newPlan
        } else {
          val outputAliases = outputs.map { case (newAttr, oldAttr) =>
            Alias(newAttr, oldAttr.name)(oldAttr.exprId)
          }
          Project(outputAliases, newPlan)
        }
      }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to my version in the latest commit. @maropu please let me know if yours with sameOutput would be better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, your version looks okay.

@maropu
Copy link
Member

maropu commented Jul 10, 2020

also cc: @viirya

@HyukjinKwon
Copy link
Member

also @maryannxue :-)

Comment on lines 61 to 65
val pl = children.head.output.zip(newChildren.head.output).map {
case (oa, na) => Alias(na, oa.name)(oa.exprId)
}
Project(pl, newChildren.head)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this may change output nullability of original Union?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it's corrected?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This project looks correct. Just the new Union's output nullability might change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a stupid question, but is that an issue? It can change in one direction, from nullable to non-nullable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be an issue. But the nullability updates may not be effective unless we run the rule UpdateAttributeNullability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I see. So shall we run UpdateAttributeNullability after PropagateEmptyRelation in the Optimizer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently UpdateAttributeNullability is put in a Once batch. We need to evaluate if it's OK (idempotent and cost-efficient) to run it in a fix-point batch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it is idempotent and doesn't do costly things just a simple bottom-up traversal on nodes and one expression traversal per node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added UpdateAttributeNullability after PropagateEmptyRelation in the latest commit.

comparePlans(optimized, correctAnswer)
}

test("SPARK-32241: remove empty relation children from Union") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a test to check the nullability update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple test case like here: b454cd7 is ok? Or you want me to test that UpdateAttributeNullability kicks in and fixes some nullability after PropagateEmptyRelation?

Copy link
Contributor

@jinhai-cloud jinhai-cloud Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case does not apply the UpdateAttributeNullability rules in Optimizer. Do you need to add a test case?

@SparkQA
Copy link

SparkQA commented Jul 10, 2020

Test build #125610 has finished for PR 29053 at commit f9c0a3d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 10, 2020

Test build #125623 has started for PR 29053 at commit b454cd7.

Comment on lines 164 to 165
PropagateEmptyRelation,
UpdateAttributeNullability) ::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave a comment above explaining why we run UpdateAttributeNullability here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added in the latest commit.

@SparkQA
Copy link

SparkQA commented Jul 11, 2020

Test build #125630 has finished for PR 29053 at commit e10fe02.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Jul 12, 2020

Test build #125720 has finished for PR 29053 at commit e10fe02.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

newPlan
} else {
val outputAliases = outputs.map { case (newAttr, oldAttr) =>
Alias(newAttr, oldAttr.name)(oldAttr.exprId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is introducing a new Alias, I'm wondering if we need to worry about explicitMetadata in this case.

Copy link
Contributor Author

@peter-toth peter-toth Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we can lose it in some cases. Fixed in cea0a48

@SparkQA
Copy link

SparkQA commented Jul 13, 2020

Test build #125767 has finished for PR 29053 at commit 05a90ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 24be816 Jul 14, 2020
@peter-toth
Copy link
Contributor Author

Thanks for the review.

@dongjoon-hyun
Copy link
Member

Thank you, all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants