[SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12 #46114
cc @minyyy @cloud-fan could you please take a look? |
dongjoon-hyun
left a comment
If this is a regression from SPARK-38836, do we need to fix this in branch-3.4 too, @wForget?
cc @LuciferYang, @viirya, too
viirya
left a comment
Good catch.
    newSet
  }

  override def ++(elems: GenTraversableOnce[Expression]): ExpressionSet = {
Could you add a comment on the method about why we don't use the SetLike.default here? Thanks.
thanks, added.
I think it's needed. |
…cala 2.12

### What changes were proposed in this pull request?

Fix `ExpressionSet` performance regression in scala 2.12.

### Why are the changes needed?

In scala 2.12, the `SetLike.++` method is implemented by calling the `+` method once per element. The `ExpressionSet.+` method first clones the whole set and then adds the element, which is very expensive.

https://github.com/scala/scala/blob/ceaf7e68ac93e9bbe8642d06164714b2de709c27/src/library/scala/collection/SetLike.scala#L186

After #36121, the `++` and `--` methods in ExpressionSet of scala 2.12 were removed, causing a performance regression.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Benchmark code:

```
import org.apache.spark.benchmark.Benchmark
// The DSL import supplies the `+` operator on expressions (aUpper + i).
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, ExprId, ExpressionSet}
import org.apache.spark.sql.types.IntegerType

object TestBenchmark {
  def main(args: Array[String]): Unit = {
    val count = 300
    val benchmark = new Benchmark("Test ExpressionSetV2 ++ ", count)
    val aUpper = AttributeReference("A", IntegerType)(exprId = ExprId(1))
    var initialSet = ExpressionSet((0 until 300).map(i => aUpper + i))
    val setToAddWithSameDeterministicExpression = ExpressionSet((0 until 300).map(i => aUpper + i))

    // Repeatedly merge a 300-element set into the initial set.
    benchmark.addCase("Test ++", 10) { _: Int =>
      for (_ <- 0L until count) {
        initialSet ++= setToAddWithSameDeterministicExpression
      }
    }
    benchmark.run()
  }
}
```

before this change:

```
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64
Intel Core Processor (Skylake, IBRS)
Test ExpressionSetV2 ++ :                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test ++                                            1577           1691          61          0.0     5255516.0       1.0X
```

after this change:

```
OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64
Intel Core Processor (Skylake, IBRS)
Test ExpressionSetV2 ++ :                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test ++                                              14             14           0          0.0       45395.2       1.0X
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46114 from wForget/SPARK-47897.

Authored-by: Zhen Wang <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
(cherry picked from commit afd99d1)
Signed-off-by: Kent Yao <[email protected]>
|
Thank you @wForget, and @dongjoon-hyun @viirya @minyyy @cloud-fan Merged to '3.5.2', '3.4.4' |
|
Thanks for your fix @wForget But for the Could you please correct it in pr description? |
I guess you may be missing
I have added imports. |
Thanks. Could you please add this import to the benchmark code in the PR description? |
|
Thanks @wForget |
### What changes were proposed in this pull request?

Fix `ExpressionSet` performance regression in scala 2.12.

### Why are the changes needed?

In scala 2.12, the `SetLike.++` method is implemented by calling the `+` method once per element. The `ExpressionSet.+` method first clones the whole set and then adds the element, which is very expensive.

https://github.com/scala/scala/blob/ceaf7e68ac93e9bbe8642d06164714b2de709c27/src/library/scala/collection/SetLike.scala#L186

After #36121, the `++` and `--` methods in ExpressionSet of scala 2.12 were removed, causing a performance regression.

### Does this PR introduce any user-facing change?

### How was this patch tested?

Benchmark code:

before this change:

after this change:

### Was this patch authored or co-authored using generative AI tooling?

No
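
To make the cost described under "Why are the changes needed?" concrete, here is a minimal sketch of the pattern. It is not the Spark implementation: `CopyOnAddSet`, `addAllViaPlus`, `copies`, and `CopyCountDemo` are hypothetical names invented for illustration, and the set holds plain `Int`s rather than expressions. It only counts how many times the backing set is cloned when a bulk add is built on `+` versus when `++` clones once.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a set with an immutable API over mutable state,
// so every `+` has to copy the backing set before adding.
final class CopyOnAddSet private (private val underlying: mutable.HashSet[Int]) {
  def size: Int = underlying.size
  def contains(x: Int): Boolean = underlying.contains(x)

  // Clone first, then add a single element.
  def +(elem: Int): CopyOnAddSet = {
    CopyOnAddSet.copies += 1
    val copy = underlying.clone()
    copy += elem
    new CopyOnAddSet(copy)
  }

  // What a default bulk add built on `+` does: one clone per element.
  def addAllViaPlus(elems: Iterable[Int]): CopyOnAddSet =
    elems.foldLeft(this)(_ + _)

  // Dedicated bulk add: clone once, then add every element in place.
  def ++(elems: Iterable[Int]): CopyOnAddSet = {
    CopyOnAddSet.copies += 1
    val copy = underlying.clone()
    elems.foreach(copy += _)
    new CopyOnAddSet(copy)
  }
}

object CopyOnAddSet {
  // Counts clone operations; illustrative instrumentation only.
  var copies: Long = 0L

  def apply(elems: Int*): CopyOnAddSet =
    new CopyOnAddSet(mutable.HashSet(elems: _*))
}

object CopyCountDemo {
  // Runs `op` and reports how many clones it triggered.
  def countCopies(op: => CopyOnAddSet): (CopyOnAddSet, Long) = {
    CopyOnAddSet.copies = 0L
    val result = op
    (result, CopyOnAddSet.copies)
  }

  def main(args: Array[String]): Unit = {
    val base = CopyOnAddSet(0 until 300: _*)
    val extra = 300 until 600

    val (_, perElement) = countCopies(base.addAllViaPlus(extra))
    val (_, bulk) = countCopies(base ++ extra)

    println(s"clones via repeated +: $perElement") // 300
    println(s"clones via bulk ++:    $bulk")       // 1
  }
}
```

With 300 elements to add, the fold over `+` clones the set 300 times, and each clone grows with the set, while the dedicated `++` clones it once. That is the general shape of the gap the benchmark in this PR measures, though the real numbers depend on Spark's own data structures.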