-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite #29259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -37,6 +37,8 @@ | |
| import org.junit.Before; | ||
| import org.junit.Test; | ||
|
|
||
| import java.nio.ByteOrder; | ||
|
|
||
| /** | ||
| * Test the RecordBinaryComparator, which compares two UnsafeRows by their binary form. | ||
| */ | ||
|
|
@@ -261,40 +263,74 @@ public void testBinaryComparatorForNullColumns() throws Exception { | |
| public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception { | ||
| int numFields = 1; | ||
|
|
||
| // Place the following bytes (hex) into UnsafeRows for the comparison: | ||
| // | ||
| // index | 00 01 02 03 04 05 06 07 | ||
| // ------+------------------------ | ||
| // row1 | 00 00 00 00 00 00 00 0b | ||
| // row2 | 00 00 00 00 80 00 00 0a | ||
| // | ||
| // The byte layout needs to be identical on all platforms regardless of | ||
| // of endianness. To achieve this the bytes in each value are reversed | ||
| // on little-endian platforms. | ||
| long row1Data = 11L; | ||
| long row2Data = 11L + Integer.MAX_VALUE; | ||
| if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) { | ||
| row1Data = Long.reverseBytes(row1Data); | ||
| row2Data = Long.reverseBytes(row2Data); | ||
| } | ||
|
|
||
| UnsafeRow row1 = new UnsafeRow(numFields); | ||
| byte[] data1 = new byte[100]; | ||
| row1.pointTo(data1, computeSizeInBytes(numFields * 8)); | ||
| row1.setLong(0, 11); | ||
| row1.setLong(0, row1Data); | ||
|
|
||
| UnsafeRow row2 = new UnsafeRow(numFields); | ||
| byte[] data2 = new byte[100]; | ||
| row2.pointTo(data2, computeSizeInBytes(numFields * 8)); | ||
| row2.setLong(0, 11L + Integer.MAX_VALUE); | ||
| row2.setLong(0, row2Data); | ||
|
|
||
| insertRow(row1); | ||
| insertRow(row2); | ||
|
|
||
| Assert.assertTrue(compare(0, 1) > 0); | ||
| Assert.assertTrue(compare(0, 1) < 0); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My understanding is: the comparator compares bytes from left to right. It doesn't assume the byte ordering of the data. It's expected that different byte order leads to different comparison results. I think a simple way to fix the test is:
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That would also work here. I think it's a little less ideal, because then you are running a different test on little- vs big-endian (and those are the correct answers to the two different tests). I think only one of those tests was the intended one AFAICT, and I buy the argument that the proposed change restores it as well as addresses endianness.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yea it's true. But it's also true that this is what happens in the reality: the comparator compares bytes from left to right, no matter it's little- or big-endian.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's right. There is no endian-ness issue in the logic being tested. There is endian-ness in how this test constructs the bytes to compare, because it writes a long into memory. I doubt this test's purpose holds both ways? It "works" but either little- or big-endian machines aren't running the test as intended. It feels like there's one intended test to run here, not either of two. |
||
| } | ||
|
|
||
| @Test | ||
| public void testBinaryComparatorWhenSubtractionCanOverflowLongValue() throws Exception { | ||
| int numFields = 1; | ||
|
|
||
| // Place the following bytes (hex) into UnsafeRows for the comparison: | ||
| // | ||
| // index | 00 01 02 03 04 05 06 07 | ||
| // ------+------------------------ | ||
| // row1 | 80 00 00 00 00 00 00 00 | ||
| // row2 | 00 00 00 00 00 00 00 01 | ||
| // | ||
| // The byte layout needs to be identical on all platforms regardless of | ||
| // of endianness. To achieve this the bytes in each value are reversed | ||
| // on little-endian platforms. | ||
| long row1Data = Long.MIN_VALUE; | ||
| long row2Data = 1L; | ||
| if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by the PR description; I checked the original PR that added this test case and it seems like the overflow in the test title comes from the old code: https://github.com/apache/spark/pull/22101/files#diff-4ec35a60ad6a3f3f60f4d5ce91f59933L61-L63 To keep the original intention, why do you think we need to update the existing test in little-endian cases?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you mean, this is kind of a different issue -- yes. Should be a new JIRA. I'd summarize this as: the bytes that this test sets up and asserts about are different on big-endian. It creates the wrong test.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIUC, the intention of this test is to perform the comparison at For people, would it be good to add byte pattern to be tested. |
||
| row1Data = Long.reverseBytes(row1Data); | ||
| row2Data = Long.reverseBytes(row2Data); | ||
| } | ||
|
|
||
| UnsafeRow row1 = new UnsafeRow(numFields); | ||
| byte[] data1 = new byte[100]; | ||
| row1.pointTo(data1, computeSizeInBytes(numFields * 8)); | ||
| row1.setLong(0, Long.MIN_VALUE); | ||
| row1.setLong(0, row1Data); | ||
|
|
||
| UnsafeRow row2 = new UnsafeRow(numFields); | ||
| byte[] data2 = new byte[100]; | ||
| row2.pointTo(data2, computeSizeInBytes(numFields * 8)); | ||
| row2.setLong(0, 1); | ||
| row2.setLong(0, row2Data); | ||
|
|
||
| insertRow(row1); | ||
| insertRow(row2); | ||
|
|
||
| Assert.assertTrue(compare(0, 1) < 0); | ||
| Assert.assertTrue(compare(0, 1) > 0); | ||
| } | ||
|
|
||
| @Test | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My worry is that: this changes what we were testing before. When we wrote the test, we assume the test is run in little-endian machines. Now we reverse the bytes for little-endian machines, and as a result the testing result also changes (
result > 0->result < 0).I'm not very sure about the intention of these 2 tests. Maybe they were wrong at the very beginning and we should reverse the bytes for testing. I'll leave it for others to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is perhaps the only remaining issue. We should fix the test that this sets up of course, but, which was the intended test? I'm also not 100% sure, but @mundaym had a reasonable argument at #29259 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see, LGTM