Skip to content

Conversation

@rxin
Copy link
Contributor

@rxin rxin commented Jul 30, 2015

As of today, StringPrefixComparator converts the long values back to byte arrays in order to compare them. This patch optimizes this to compare the longs directly, rather than turning the longs into byte arrays and comparing them byte by byte (unsigned).

This only works on little-endian architecture right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @tellison can you take a look at this (or find somebody else) to see if we can fix it by just not doing the reverse bytes in big-endian?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check the numBytes here, or get random bytes if it's less than 8 bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also test the endian-ness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created https://github.com/apache/spark/pull/7789/files to make it safer. Still doesn't handle big endian though.

@rxin
Copy link
Contributor Author

rxin commented Jul 30, 2015

cc @davies / @JoshRosen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @JoshRosen

I made the randomized tester to be stricter. Previously it would still pass if prefix always return 0.

@SparkQA
Copy link

SparkQA commented Jul 30, 2015

Test build #38943 has finished for PR 7765 at commit 4c8d094.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 30, 2015

Test build #38950 has finished for PR 7765 at commit e4908cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 07fd7d3 Jul 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants