@caican00 (Contributor) commented Aug 22, 2022

What changes were proposed in this pull request?

Replaced Traversable.toMap with collection.breakOut, which eliminates the creation of an intermediate tuple collection.
This optimization follows the approach taken in PR #18693.
An introduction to collection.breakOut can be found in a Stack Overflow article.
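
As a rough illustration of the idea (not the code changed in this PR), the sketch below contrasts a map-then-toMap conversion with building the Map directly through a builder. In Scala 2.12, passing collection.breakOut to map has a similar effect, because it supplies the target collection's builder; breakOut was removed in Scala 2.13, so the sketch uses an explicit builder to stay portable. The object and method names are hypothetical.

```scala
object DirectMapBuild {
  // Baseline: map first materializes an intermediate IndexedSeq[(String, Int)],
  // then toMap copies those tuples into a new Map.
  def viaToMap(keys: IndexedSeq[String], values: IndexedSeq[Int]): Map[String, Int] =
    keys.indices.map(i => keys(i) -> values(i)).toMap

  // Direct build: each pair goes straight into the Map builder,
  // so no intermediate collection of tuples is created.
  // Scala 2.12-only equivalent using breakOut (hypothetical usage, does not compile on 2.13+):
  //   val m: Map[String, Int] =
  //     keys.indices.map(i => keys(i) -> values(i))(collection.breakOut)
  def viaBuilder(keys: IndexedSeq[String], values: IndexedSeq[Int]): Map[String, Int] = {
    val builder = Map.newBuilder[String, Int]
    builder.sizeHint(keys.length)
    var i = 0
    while (i < keys.length) {
      builder += (keys(i) -> values(i))
      i += 1
    }
    builder.result()
  }
}
```

Both methods return the same Map for the same inputs; the difference is only in how many temporary objects are allocated along the way.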

Why are the changes needed?

When DeserializeToObject is executed, converting Tuple2 to a Scala Map via .toMap takes a lot of CPU time.

How was this patch tested?

Unit tests were run.
No performance tests have been performed yet.

@github-actions github-actions bot added the SQL label Aug 22, 2022
@HyukjinKwon (Member) left a comment

@caican00 mind creating a JIRA and fixing the PR title? See also https://spark.apache.org/contributing.html.

Also, we should probably fix it in the master branch instead of branch-3.3. Otherwise, looks pretty good.

@caican00 (Contributor Author)

> @caican00 mind creating a JIRA and fixing the PR title? See also https://spark.apache.org/contributing.html.
>
> Also, we should probably fix it in the master branch instead of branch-3.3. Otherwise, looks pretty good.

Thanks, I will close this PR and open a new one against the master branch.

@caican00 changed the title from "update" to "[SPARK-40175][SQL]Speed up conversion of Tuple2 to Scala Map" Aug 22, 2022
@caican00 caican00 closed this Aug 22, 2022