[Spark-21807][SQL]Override ++ operation in ExpressionSet to reduce clone time#19022
[Spark-21807][SQL]Override ++ operation in ExpressionSet to reduce clone time#19022eatoncys wants to merge 10 commits intoapache:masterfrom
Conversation
| } | ||
| } | ||
|
|
||
| def addMultiExpressions(elems: Set[Expression]): ExpressionSet = { |
There was a problem hiding this comment.
GenTraversableOnce instead of Set?
| } | ||
| } | ||
|
|
||
| def addMultiExpressions(elems: Set[Expression]): ExpressionSet = { |
There was a problem hiding this comment.
Override ++ instead of addMultiExpressions? As it is a Set, we better follow Set's semantics.
There was a problem hiding this comment.
Ok, Modified it, thanks.
| case a @ Alias(e, _) => | ||
| // For every alias in `projectList`, replace the reference in constraints by its attribute. | ||
| allConstraints ++= allConstraints.map(_ transform { | ||
| val replacedElement = allConstraints.map(_ transform { |
There was a problem hiding this comment.
By overriding ++, we don't need to change those.
| } | ||
| } | ||
|
|
||
| def addMultiExpressions(elems: Set[Expression]): ExpressionSet = { |
There was a problem hiding this comment.
Let's add a related test in ExpressionSetSuite.
There was a problem hiding this comment.
Ok, Added, thanks.
|
|
||
| val expressions = aggPlan.validConstraints | ||
| // The size of Aliased expression is n * (n - 1) / 2 + n | ||
| assert( expressions.size === expressionNum * (expressionNum - 1) / 2 + expressionNum) |
There was a problem hiding this comment.
Relying on the size of validConstraints may be fragile. I think we only need to test if the new API of ExpressionSet works. There should be other constraints test to make sure we produce correct constraints.
|
|
||
| } | ||
|
|
||
| test("add multiple elements to set") { |
There was a problem hiding this comment.
This test case doesn't fail without the above code, does it?
There was a problem hiding this comment.
Yap, previous ++ works without problem. The override version goes to reduce the time cost. This test is just used to make sure we don't mess up behavior of ++.
| package org.apache.spark.sql.catalyst.expressions | ||
|
|
||
| import scala.collection.mutable | ||
| import scala.collection.{GenTraversableOnce, mutable} |
There was a problem hiding this comment.
You can do style check by running dev/scalastyle.
There was a problem hiding this comment.
Ok, modified it, thanks.
|
We can edit the PR title and description to show the changes clearly. |
| newSet | ||
| } | ||
|
|
||
| override def -(elem: Expression): ExpressionSet = { |
There was a problem hiding this comment.
I guess we don't use -- frequently or in hot code path. We may add it when we need it.
|
cc @gatorsmile @HyukjinKwon May you help to trigger Jenkins tests? Thanks. |
|
ok to test |
|
Test build #81035 has finished for PR 19022 at commit
|
|
add to whitelist |
|
Test build #81039 has finished for PR 19022 at commit
|
| test("add multiple elements to set") { | ||
| val initialSet = ExpressionSet(aUpper + 1 :: Nil) | ||
| val setToAddWithSameExpression = ExpressionSet(aUpper + 1 :: aUpper + 2 :: Nil) | ||
| val setToAdd = ExpressionSet(aUpper + 2 :: aUpper + 3 :: Nil) |
There was a problem hiding this comment.
Not very sure what you want to test in the second one. Are you want to test the behavior of adding a set of expressions that do not exist in the initial set?
-> ExpressionSet(aUpper + 3 :: aUpper + 4 :: Nil)?
There was a problem hiding this comment.
Yes, I have modified the name to setToAddWithOutSameExpression.
|
Test build #81057 has finished for PR 19022 at commit
|
| test("add multiple elements to set") { | ||
| val initialSet = ExpressionSet(aUpper + 1 :: Nil) | ||
| val setToAddWithSameExpression = ExpressionSet(aUpper + 1 :: aUpper + 2 :: Nil) | ||
| val setToAddWithOutSameExpression = ExpressionSet(aUpper + 3 :: aUpper + 4 :: Nil) |
|
LGTM |
|
Thanks! Merging to master. You can fix this in your future PRs. |
What changes were proposed in this pull request?
The getAliasedConstraints fuction in LogicalPlan.scala will clone the expression set when an element added,
and it will take a long time. This PR add a function to add multiple elements at once to reduce the clone time.
Before modified, the cost of getAliasedConstraints is:
100 expressions: 41 seconds
150 expressions: 466 seconds
After modified, the cost of getAliasedConstraints is:
100 expressions: 1.8 seconds
150 expressions: 6.5 seconds
The test is like this:
test("getAliasedConstraints") {
val expressionNum = 150
val aggExpression = (1 to expressionNum).map(i => Alias(Count(Literal(1)), s"cnt$i")())
val aggPlan = Aggregate(Nil, aggExpression, LocalRelation())
}
(Please fill in changes proposed in this fix)
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Run new added test.
Please review http://spark.apache.org/contributing.html before opening a pull request.