[SPARK-30413][SQL] Avoid WrappedArray roundtrip in GenericArrayData constructor, plus related optimization in ParquetMapConverter #27088
Test build #116082 has finished for PR 27088 at commit
|
Can you put performance numbers in the PR description if possible? |
|
Thank you for pinging me, @maropu . |
|
Hi, @JoshRosen . Could you address @maropu 's comment? |
|
Let me take a look at this today / check my notes (busy week for me, hence
my slow response).
…On Fri, Jan 17, 2020 at 8:28 AM Dongjoon Hyun ***@***.***> wrote:
***@***.**** approved this pull request.
+1, LGTM. (except @maropu <https://github.com/maropu> 's comment.)
If it's difficult, we can remove the performance related stuff from the PR
description.
cc @dbtsai <https://github.com/dbtsai>
|
|
Thanks, @JoshRosen . |
…rayData-optimization
… ParquetRowConverter's map converter
|
@kiszk, @maropu, @dongjoon-hyun, thank you all for reviewing. I've pushed a commit adding benchmarks and another to fix a pre-existing performance problem in |
arrayOfAnyAsSeq                        25            27          2      398.0          2.5      0.3X
arrayOfInt                            613           630         15       16.3         61.3      0.0X
arrayOfIntAsObject                    866           872          8       11.5         86.6      0.0X
|
|
Here are the JDK8 benchmark results before this patch's changes:
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.14.6
[info] Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
[info] constructor:                Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
[info] ----------------------------------------------------------------------------------------------------
[info] arrayOfAny                              8            12          4     1281.5          0.8      1.0X
[info] arrayOfAnyAsObject                    531           605         76       18.8         53.1      0.0X
[info] arrayOfAnyAsSeq                        26            35         20      384.4          2.6      0.3X
[info] arrayOfInt                            703           779        117       14.2         70.3      0.0X
[info] arrayOfIntAsObject                   1336          1397         86        7.5        133.6      0.0X
|
Test build #116990 has finished for PR 27088 at commit
|
|
jenkins retest this please |
+1, LGTM. Thank you, @JoshRosen . The new result looks impressive.
(If you don't mind, you can run the benchmark on JDK11 in order to generate another result file for JDK11. It's okay to skip in this PR.)
arrayOfAnyAsSeq                         5             6          2     2195.5          0.5      1.2X
arrayOfInt                            452           469         13       22.1         45.2      0.0X
arrayOfIntAsObject                    678           690         11       14.7         67.8      0.0X
|
|
Here are the JDK11 benchmark results before this patch's changes:
[info] OpenJDK 64-Bit Server VM 11.0.5+10 on Mac OS X 10.14.6
[info] Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
[info] constructor:                Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
[info] ----------------------------------------------------------------------------------------------------
[info] arrayOfAny                              6             6          1     1776.5          0.6      1.0X
[info] arrayOfAnyAsObject                    374           386          9       26.7         37.4      0.0X
[info] arrayOfAnyAsSeq                         5             5          0     2211.1          0.5      1.2X
[info] arrayOfInt                            472           494         29       21.2         47.2      0.0X
[info] arrayOfIntAsObject                    798           799          1       12.5         79.8      0.0X
A pleasant discovery is that this patch has an even larger positive impact on JDK11: on JDK11, this PR's changes seem to more-or-less completely eliminate the performance gap between the this(Any) and this(Array[Any]) constructors 🎉
|
Test build #117019 has finished for PR 27088 at commit
|
|
Merged to master. (The last one is only adding the JDK11 generated result.) |
|
Test build #117032 has finished for PR 27088 at commit
|
What changes were proposed in this pull request?
This PR implements a tiny performance optimization for a GenericArrayData constructor, avoiding an unnecessary roundtrip through WrappedArray when the provided value is already an array of objects.
It also fixes a related performance problem in ParquetRowConverter.
Why are the changes needed?
GenericArrayData has a this(seqOrArray: Any) constructor, which was originally added in #13138 for use in RowEncoder (where we may not know concrete types until runtime) but is also called (perhaps unintentionally) in a few other code paths.
In this constructor's existing implementation, a call to new GenericArrayData(Array[Object]("")) is dispatched to the this(seqOrArray: Any) constructor, where we then call this(array.toSeq): this wraps the provided array into a WrappedArray, which is subsequently unwrapped in a this(seq.toArray) call. For an interactive example, see https://scastie.scala-lang.org/7jOHydbNTaGSU677FWA8nA
This PR changes the this(seqOrArray: Any) constructor so that it calls the primary this(array: Array[Any]) constructor, allowing us to save a .toSeq.toArray call; this comes at the cost of one additional case in the match statement (but I believe this has a negligible performance impact relative to the other savings).
As code cleanup, I also reverted the JVM 1.7 workaround from #14271.
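To make the constructor dispatch described above concrete, here is a minimal sketch. RoundtripSketch, normalizeBefore and normalizeAfter are hypothetical names that stand in for the old and new bodies of the this(seqOrArray: Any) constructor; this is an illustration under those assumptions, not Spark's actual source.

```scala
// Hypothetical stand-ins for the old and new this(seqOrArray: Any) logic.
object RoundtripSketch {
  // Old shape: every array is first wrapped into a Seq, then converted
  // back to an array (the .toSeq.toArray roundtrip).
  def normalizeBefore(seqOrArray: Any): Array[Any] = {
    val wrapped: Seq[Any] = seqOrArray match {
      case seq: Seq[_] => seq
      case array: Array[_] => array.toSeq
    }
    wrapped.toArray[Any]
  }

  // New shape: one extra case lets an array of objects pass straight through.
  def normalizeAfter(seqOrArray: Any): Array[Any] = seqOrArray match {
    case seq: Seq[_] => seq.toArray[Any]
    case array: Array[Any] => array                   // objects: no conversion
    case array: Array[_] => array.toSeq.toArray[Any]  // primitives: box them
  }

  def main(args: Array[String]): Unit = {
    val objects: Array[Any] = Array[Any]("a", "b")
    // Checks whether the new dispatch returns the same array instance.
    println(normalizeAfter(objects) eq objects)
    // Primitive arrays still go through the boxing path.
    println(normalizeAfter(Array(1, 2, 3)).toList)
  }
}
```

The extra case only applies to arrays of objects; primitive arrays such as Array[Int] still fall through to the boxing branch, which is consistent with the benchmarks showing arrayOfIntAsObject as the slowest case.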
I also fixed a related performance problem in ParquetRowConverter: previously, this code called ArrayBasedMapData.apply which, in turn, called the this(Any) constructor for GenericArrayData: this PR's micro-benchmarks show that this is significantly slower than calling the this(Array[Any]) constructor (and I also observed time spent here during other Parquet scan benchmarking work). To fix this performance problem, I replaced the call to the ArrayBasedMapData.apply method with direct calls to the ArrayBasedMapData and GenericArrayData constructors.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
I tested this by running code in a debugger and by running microbenchmarks (which I've added to a new GenericArrayDataBenchmark in this PR):
… this(Any) constructor. Even after improvements, however, calls to the this(Array[Any]) constructor are still ~60x faster than calls to this(Any) when passing a non-primitive array (thereby motivating this patch's other change in ParquetRowConverter).
… this(Any) constructor.
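The ParquetRowConverter change mentioned in this description comes down to which constructor overload a call site selects. The sketch below illustrates that with stand-in classes. SketchArrayData, MapConverterSketch, viaApplyStyle and viaDirectCall are hypothetical names, and the Array[_] parameter type of the apply-style helper and the ArrayBuffer input are assumptions made for illustration; none of this is Spark's actual code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for GenericArrayData (not Spark's class). The flag
// records which public constructor built the instance.
class SketchArrayData private (val array: Array[Any], val usedAnyConstructor: Boolean) {
  // Fast path: the caller already holds an Array[Any].
  def this(array: Array[Any]) = this(array, false)
  // Generic path: the concrete type is only discovered by a runtime match.
  def this(seqOrArray: Any) = this(SketchArrayData.normalize(seqOrArray), true)
}

object SketchArrayData {
  def normalize(seqOrArray: Any): Array[Any] = seqOrArray match {
    case seq: Seq[_] => seq.toArray[Any]
    case array: Array[Any] => array
    case array: Array[_] => array.toSeq.toArray[Any]
  }
}

object MapConverterSketch {
  // Assumed shape of an apply-style helper: with a parameter typed Array[_],
  // only the this(Any) overload is applicable at this call site.
  def viaApplyStyle(keys: Array[_]): SketchArrayData = new SketchArrayData(keys)

  // Direct call: the static type Array[Any] selects the fast constructor.
  def viaDirectCall(keys: ArrayBuffer[Any]): SketchArrayData = {
    val array: Array[Any] = keys.toArray
    new SketchArrayData(array)
  }

  def main(args: Array[String]): Unit = {
    val keys = ArrayBuffer[Any]("k1", "k2")
    val asArray: Array[Any] = keys.toArray
    println(viaApplyStyle(asArray).usedAnyConstructor)
    println(viaDirectCall(keys).usedAnyConstructor)
  }
}
```

Because Scala arrays are invariant, a value typed Array[_] does not conform to Array[Any] at compile time, so overload resolution falls back to the Any overload even when the runtime array holds objects. Calling the constructors directly with an Array[Any] avoids that fallback.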