[SPARK-13304] [SQL] Fix worst case of broadcast join of two ints #11188

Closed
davies wants to merge 1 commit into apache:master from davies:fix_ints

davies (Contributor) commented Feb 12, 2016

If the two join columns hold the same value, their combined hash code is (a ^ b), which is 0. Every such row then lands in the same bucket, so the HashMap becomes very slow.

This PR rotates the second int to avoid this case. In theory, heavy collisions are still possible; the pattern would be (1, 131072), (2, 131073) ... (n, n + 131072).

This PR also adds a micro benchmark and updates the results for broadcast hash joins.
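
The collision described above can be reproduced with a minimal sketch. This is not the code from the PR: the class `JoinKeyHash`, its method names, and the rotation distance of 15 bits are all assumptions made for illustration, since the description does not state how far the second int is rotated. It only shows why XOR-ing two equal keys collapses every row to hash 0, and how rotating one operand first spreads the values out again.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; names and the rotation distance are assumptions.
final class JoinKeyHash {
    // Naive combination: XOR of the two key columns. Equal keys always give 0.
    static int xorHash(int a, int b) {
        return a ^ b;
    }

    // Rotate the second key before XOR-ing. The distance 15 is assumed here,
    // not taken from the PR.
    static int rotatedHash(int a, int b) {
        return a ^ Integer.rotateLeft(b, 15);
    }

    // Count the distinct hash values produced for the key pairs (1,1) .. (n,n).
    static int distinctHashes(int n, boolean rotate) {
        Set<Integer> seen = new HashSet<>();
        for (int k = 1; k <= n; k++) {
            seen.add(rotate ? rotatedHash(k, k) : xorHash(k, k));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctHashes(1000, false)); // prints 1
        System.out.println(distinctHashes(1000, true));  // prints 1000
    }
}
```

With plain XOR, all 1000 equal-key pairs share one hash value, which is the worst case for a hash map. With the rotation, each pair gets its own value in this range, although other key patterns can still collide, as the description notes.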

SparkQA commented Feb 12, 2016

Test build #51201 has finished for PR 11188 at commit 1c0ee96.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

davies changed the title from "[SPARK-13314] [SQL] Fix worst case of broadcast join of two ints" to "[SPARK-13304] [SQL] Fix worst case of broadcast join of two ints" on Feb 19, 2016

SparkQA commented Feb 19, 2016

Test build #51571 has finished for PR 11188 at commit f6416a6.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
