@@ -37,6 +37,8 @@
import org.junit.Before;
import org.junit.Test;

import java.nio.ByteOrder;

/**
* Test the RecordBinaryComparator, which compares two UnsafeRows by their binary form.
*/
@@ -261,40 +263,74 @@ public void testBinaryComparatorForNullColumns() throws Exception {
public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
int numFields = 1;

// Place the following bytes (hex) into UnsafeRows for the comparison:
//
// index | 00 01 02 03 04 05 06 07
// ------+------------------------
// row1 | 00 00 00 00 00 00 00 0b
// row2 | 00 00 00 00 80 00 00 0a
//
// The byte layout needs to be identical on all platforms regardless
// of endianness. To achieve this, the bytes in each value are reversed
// on little-endian platforms.
long row1Data = 11L;
long row2Data = 11L + Integer.MAX_VALUE;
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
Contributor:

My worry is that this changes what we were testing before. When we wrote the test, we assumed it was run on little-endian machines. Now we reverse the bytes for little-endian machines, and as a result the test result also changes (result > 0 -> result < 0).

I'm not very sure about the intention of these 2 tests. Maybe they were wrong at the very beginning and we should reverse the bytes for testing. I'll leave it for others to decide.

Member:

That is perhaps the only remaining issue. We should fix the test that this sets up of course, but, which was the intended test? I'm also not 100% sure, but @mundaym had a reasonable argument at #29259 (comment)

Contributor:

ah I see, LGTM

row1Data = Long.reverseBytes(row1Data);
row2Data = Long.reverseBytes(row2Data);
}

UnsafeRow row1 = new UnsafeRow(numFields);
byte[] data1 = new byte[100];
row1.pointTo(data1, computeSizeInBytes(numFields * 8));
-row1.setLong(0, 11);
+row1.setLong(0, row1Data);

UnsafeRow row2 = new UnsafeRow(numFields);
byte[] data2 = new byte[100];
row2.pointTo(data2, computeSizeInBytes(numFields * 8));
-row2.setLong(0, 11L + Integer.MAX_VALUE);
+row2.setLong(0, row2Data);

insertRow(row1);
insertRow(row2);

-Assert.assertTrue(compare(0, 1) > 0);
+Assert.assertTrue(compare(0, 1) < 0);
Contributor:

My understanding is: the comparator compares bytes from left to right. It doesn't assume the byte ordering of the data. It's expected that different byte order leads to different comparison results.

I think a simple way to fix the test is:

if (LITTLE_ENDIAN) {
  Assert.assertTrue(compare(0, 1) < 0);
} else {
  Assert.assertTrue(compare(0, 1) > 0);
}
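The behavior discussed above — a byte-wise, left-to-right comparison whose result depends on how the long was laid out in memory — can be sketched in isolation. This is a hypothetical standalone helper, not Spark's actual RecordBinaryComparator, but it shows why the same two longs compare differently under the two layouts:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderCompare {
    // Lexicographic, unsigned, left-to-right comparison of two byte arrays,
    // mimicking what a binary row comparator does over raw memory.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    static byte[] longToBytes(long v, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(v).array();
    }

    public static void main(String[] args) {
        long row1 = 11L;                      // big-endian bytes: 00 00 00 00 00 00 00 0b
        long row2 = 11L + Integer.MAX_VALUE;  // big-endian bytes: 00 00 00 00 80 00 00 0a

        // Big-endian layout: first differing byte is 0x00 vs 0x80, so row1 < row2.
        System.out.println(compareBytes(
            longToBytes(row1, ByteOrder.BIG_ENDIAN),
            longToBytes(row2, ByteOrder.BIG_ENDIAN)));    // -1

        // Little-endian layout: the low-order bytes come first, so the very
        // first byte is 0x0b vs 0x0a and row1 > row2 byte-wise.
        System.out.println(compareBytes(
            longToBytes(row1, ByteOrder.LITTLE_ENDIAN),
            longToBytes(row2, ByteOrder.LITTLE_ENDIAN))); // 1
    }
}
```

The comparator itself is endianness-agnostic; only the in-memory layout of the operands differs between platforms.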

Member:

That would also work here. I think it's a little less ideal, because then you are running a different test on little- vs big-endian (and those are the correct answers to the two different tests). I think only one of those tests was the intended one AFAICT, and I buy the argument that the proposed change restores it as well as addresses endianness.

Contributor:

then you are running a different test on little- vs big-endian

Yea, it's true. But it's also true that this is what happens in reality: the comparator compares bytes from left to right, whether the machine is little- or big-endian.

Member:

That's right. There is no endian-ness issue in the logic being tested. There is endian-ness in how this test constructs the bytes to compare, because it writes a long into memory. I doubt this test's purpose holds both ways? It "works" but either little- or big-endian machines aren't running the test as intended. It feels like there's one intended test to run here, not either of two.

}
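The `Long.reverseBytes` normalization used in the test above can be illustrated standalone: reversing the bytes on little-endian machines before writing the value in native order yields the same in-memory layout on every platform. A minimal sketch (hypothetical class name, not part of the test suite):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ReverseBytesDemo {
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) {
        long value = 11L + Integer.MAX_VALUE;  // 0x000000008000000a

        // Reverse the bytes first on little-endian machines; the bytes then
        // land in memory in the same order as a big-endian native write:
        // 00 00 00 00 80 00 00 0a.
        long normalized = value;
        if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
            normalized = Long.reverseBytes(normalized);
        }
        byte[] layout = ByteBuffer.allocate(8)
            .order(ByteOrder.nativeOrder())
            .putLong(normalized)
            .array();

        System.out.println(hex(layout)); // 000000008000000a on any platform
    }
}
```

This is exactly why reversing flips the assertion on little-endian machines: the bytes being compared are no longer the native layout of 11 and 11 + Integer.MAX_VALUE, but the fixed layout shown in the table above.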

@Test
public void testBinaryComparatorWhenSubtractionCanOverflowLongValue() throws Exception {
int numFields = 1;

// Place the following bytes (hex) into UnsafeRows for the comparison:
//
// index | 00 01 02 03 04 05 06 07
// ------+------------------------
// row1 | 80 00 00 00 00 00 00 00
// row2 | 00 00 00 00 00 00 00 01
//
// The byte layout needs to be identical on all platforms regardless
// of endianness. To achieve this, the bytes in each value are reversed
// on little-endian platforms.
long row1Data = Long.MIN_VALUE;
long row2Data = 1L;
if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
Member:

For example, one of the tests does a comparison between
Long.MIN_VALUE and 1 in order to trigger an overflow condition that
existed in the past (i.e. Long.MIN_VALUE - 1). These constants
correspond to the values 0x80..00 and 0x00..01. However on a
little-endian machine the bytes in these values are now swapped
before they are compared. This means that we will now be comparing
0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow
therefore missing the original purpose of the test.

I'm a bit confused by the PR description; I checked the original PR that added this test case and it seems like the overflow in the test title comes from the old code: https://github.com/apache/spark/pull/22101/files#diff-4ec35a60ad6a3f3f60f4d5ce91f59933L61-L63 To keep the original intention, why do you think we need to update the existing test in little-endian cases?
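The overflow condition mentioned above (Long.MIN_VALUE - 1) can be reproduced with a minimal sketch of a subtraction-based comparator. This is hypothetical illustration code, not Spark's exact historical implementation:

```java
public class SubtractionOverflowDemo {
    // A comparator that derives its sign from subtraction, the kind of bug
    // this regression test targets (a sketch, not the original Spark code).
    static int buggyCompare(long a, long b) {
        long diff = a - b;  // overflows when the true difference exceeds the long range
        if (diff > 0) return 1;
        if (diff < 0) return -1;
        return 0;
    }

    public static void main(String[] args) {
        long a = Long.MIN_VALUE;  // bytes (big-endian): 80 00 00 00 00 00 00 00
        long b = 1L;              // bytes (big-endian): 00 00 00 00 00 00 00 01

        // Long.MIN_VALUE - 1 wraps around to Long.MAX_VALUE, so the buggy
        // comparator reports a > b even though a < b numerically.
        System.out.println(buggyCompare(a, b));  // 1 (wrong)
        System.out.println(Long.compare(a, b));  // -1 (correct)
    }
}
```

This is the point of the quoted comment: if the bytes are swapped before comparison, the operands are no longer 0x80..00 and 0x00..01, their difference no longer overflows, and the test stops exercising the condition it was written for.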

Member:

If you mean, this is kind of a different issue -- yes. Should be a new JIRA. I'd summarize this as: the bytes that this test sets up and asserts about are different on big-endian. It creates the wrong test.

Member:

IIUC, the intention of this test is to perform the comparison in compare() regardless of endianness.

For readers, it would be good to add the byte pattern being tested:

80 00 00 00 00 00 00 00
00 00 00 00 00 00 00 01

row1Data = Long.reverseBytes(row1Data);
row2Data = Long.reverseBytes(row2Data);
}

UnsafeRow row1 = new UnsafeRow(numFields);
byte[] data1 = new byte[100];
row1.pointTo(data1, computeSizeInBytes(numFields * 8));
-row1.setLong(0, Long.MIN_VALUE);
+row1.setLong(0, row1Data);

UnsafeRow row2 = new UnsafeRow(numFields);
byte[] data2 = new byte[100];
row2.pointTo(data2, computeSizeInBytes(numFields * 8));
-row2.setLong(0, 1);
+row2.setLong(0, row2Data);

insertRow(row1);
insertRow(row2);

-Assert.assertTrue(compare(0, 1) < 0);
+Assert.assertTrue(compare(0, 1) > 0);
}

@Test