10 changes: 0 additions & 10 deletions common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
@@ -174,11 +174,6 @@ public static float getFloat(Object object, long offset) {
}

public static void putFloat(Object object, long offset, float value) {
if (Float.isNaN(value)) {
value = Float.NaN;
} else if (value == -0.0f) {
value = 0.0f;
}
dongjoon-hyun (Member), Dec 5, 2018:
These changes are expected to cause the following test case failure in PlatformUtilSuite, but it seems to be missed. Could you fix the test case or remove it together, @cloud-fan?

_UNSAFE.putFloat(object, offset, value);
}

Expand All @@ -187,11 +182,6 @@ public static double getDouble(Object object, long offset) {
}

public static void putDouble(Object object, long offset, double value) {
if (Double.isNaN(value)) {
value = Double.NaN;
} else if (value == -0.0d) {
value = 0.0d;
}
_UNSAFE.putDouble(object, offset, value);
}

@@ -198,11 +198,45 @@ protected final void writeLong(long offset, long value) {
Platform.putLong(getBuffer(), offset, value);
}

// We need to take care of NaN and -0.0 in several places:
// 1. When comparing values, different NaNs should be treated as the same, and `-0.0` and
// `0.0` should be treated as the same.
// 2. In range partitioner, different NaNs should belong to the same partition, -0.0 and 0.0
Member: Do we already have a related test for case 2?

Contributor (Author): It turns out this is not a problem. The doc of RangePartitioning is misleading. I'm updating the doc at #23249.

Member: As this is not a problem, we should update the PR description too.
// should belong to the same partition.
// 3. In GROUP BY, different NaNs should belong to the same group, -0.0 and 0.0 should belong
// to the same group.
// 4. As join keys, different NaNs should be treated as the same, and `-0.0` and `0.0` should
// be treated as the same.
//
// Case 1 is fine, as we handle NaN and -0.0 well during comparison. For complex types, we
// recursively compare the fields/elements, so it's also fine.
//
// Case 2 is problematic, as the sorter of range partitioner uses prefix comparator, which
// thinks 0.0 > -0.0.
//
// Case 3 and 4 are problematic, as they compare `UnsafeRow` binary directly, and different NaNs
// have different binary representation, and the same thing happens for -0.0 and 0.0.
//
// Here we normalize NaN and -0.0, so that we don't need to care about NaN and -0.0 for
// `UnsafeRow`s created by `UnsafeProjection`. It fixes case 2, because the inputs to the sorter
// of the range partitioner are `UnsafeRow`s created by `UnsafeProjection`. It also fixes cases 3
// and 4, because Spark uses `UnsafeProjection` to extract join keys/grouping keys.
protected final void writeFloat(long offset, float value) {
if (Float.isNaN(value)) {
value = Float.NaN;
} else if (value == -0.0f) {
value = 0.0f;
}
Platform.putFloat(getBuffer(), offset, value);
}

// See comments for `writeFloat`.
protected final void writeDouble(long offset, double value) {
if (Double.isNaN(value)) {
value = Double.NaN;
} else if (value == -0.0d) {
value = 0.0d;
}
Platform.putDouble(getBuffer(), offset, value);
}
}
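The effect of the normalization in `writeFloat`/`writeDouble` can be illustrated outside Spark. Below is a minimal standalone sketch (not part of the PR; the class and method names are made up for illustration) showing why a byte-for-byte comparison disagrees with `==` for `-0.0` and NaN, and how collapsing them to canonical values repairs it.

```java
// Standalone sketch (not from the PR): shows why comparing float values by
// their raw bits -- as a binary comparison of UnsafeRow contents effectively
// does -- breaks for -0.0 and NaN, and how normalization repairs it.
public class NanNegZeroDemo {
    // Same shape as the normalization in the patched writeFloat: collapse
    // every NaN to the canonical Float.NaN and -0.0f to 0.0f before writing.
    static float normalize(float value) {
        if (Float.isNaN(value)) {
            value = Float.NaN;
        } else if (value == -0.0f) {
            value = 0.0f;
        }
        return value;
    }

    public static void main(String[] args) {
        // 0.0f and -0.0f are equal under ==, but their bit patterns differ,
        // so a byte-for-byte comparison (GROUP BY / join on binary rows)
        // would put them in different groups.
        System.out.println(0.0f == -0.0f);                                // true
        System.out.println(Float.floatToRawIntBits(0.0f)
            == Float.floatToRawIntBits(-0.0f));                           // false

        // After normalization the bit patterns agree.
        System.out.println(Float.floatToRawIntBits(normalize(-0.0f))
            == Float.floatToRawIntBits(normalize(0.0f)));                 // true

        // NaN has many bit patterns; isNaN is true for all of them, but the
        // raw bits can differ. Normalization collapses them all to Float.NaN.
        float oddNaN = Float.intBitsToFloat(0x7fc00001); // non-canonical NaN
        System.out.println(Float.isNaN(oddNaN));                          // true
        System.out.println(Float.floatToIntBits(normalize(oddNaN))
            == Float.floatToIntBits(normalize(Float.NaN)));               // true
    }
}
```

The same reasoning applies to `double` via `Double.doubleToRawLongBits`, which is why the patch duplicates the logic in `writeDouble`.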