[SPARK-40494][CORE][SQL][ML][MLLIB] Optimize the performance of keys.zipWithIndex.toMap code pattern
#37940
Test the following code with input size result as follows: Java 8 from bench results:
friendly ping @cloud-fan
/**
 * Same function as `keys.zipWithIndex.toMap`, but has perf gain.
 */
def toMap[K](keys: Iterable[K]): Map[K, Int] = {
toMapWithIndex?
ok
405c625 fixes this; waiting for CI
GA passed
thanks, merging to master!
thanks @cloud-fan
What changes were proposed in this pull request?
Similar to #37876, this PR introduces a new `toMapWithIndex` method to `o.a.spark.util.collection.Utils`, which uses a manual `while` loop style to optimize the performance of the `keys.zipWithIndex.toMap` code pattern in Spark.
Why are the changes needed?
Performance improvement
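For readers who want a concrete picture of the pattern being replaced: on a strict collection, `keys.zipWithIndex` first builds an intermediate collection of `(key, index)` pairs, and `toMap` then traverses it again. A manual `while` loop can fill the map in a single pass. The following is only a minimal sketch of that idea, not the code merged in this PR; the object name `ToMapWithIndexSketch` and the use of `Map.newBuilder` are assumptions made for illustration.

```scala
// Hypothetical illustration; the actual Spark helper lives in
// o.a.spark.util.collection.Utils and may be implemented differently.
object ToMapWithIndexSketch {

  /** Same result as `keys.zipWithIndex.toMap`, built in one pass. */
  def toMapWithIndex[K](keys: Iterable[K]): Map[K, Int] = {
    val builder = Map.newBuilder[K, Int]
    val iter = keys.iterator
    var idx = 0
    // Manual while loop: no intermediate collection of pairs is materialized.
    while (iter.hasNext) {
      builder += ((iter.next(), idx))
      idx += 1
    }
    builder.result()
  }
}
```

As with `zipWithIndex.toMap`, a key that appears more than once ends up mapped to the index of its last occurrence, so the sketch can be swapped in without changing results, for example `ToMapWithIndexSketch.toMapWithIndex(Seq("a", "b", "c"))`.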
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass GitHub Actions