Closed
27 changes: 21 additions & 6 deletions unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
@@ -107,12 +107,27 @@ public static void freeMemory(long address) {

   public static void copyMemory(
     Object src, long srcOffset, Object dst, long dstOffset, long length) {
-    while (length > 0) {
-      long size = Math.min(length, UNSAFE_COPY_THRESHOLD);
-      _UNSAFE.copyMemory(src, srcOffset, dst, dstOffset, size);
-      length -= size;
-      srcOffset += size;
-      dstOffset += size;
+    // Check if dstOffset is before or after srcOffset to determine if we should copy
+    // forward or backwards. This is necessary in case src and dst overlap.
+    if (dstOffset < srcOffset) {
Contributor

After some benchmarks, there is no performance difference between a chunked copy and a single _UNSAFE.copyMemory call. I agree with @srowen: a single call to _UNSAFE.copyMemory should be enough.

Member

I was getting at something different -- does _UNSAFE.copyMemory work reliably with overlapping memory regions? I hope so, I imagine so. But this method's correctness still depends on it, so it's probably worth unit-testing.
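
As an illustration of the kind of unit test suggested here, the sketch below compares an in-place copy against a reference that goes through a temporary array, which is the behavior `System.arraycopy` documents for same-array copies. The `CopyFn` interface, the `matchesReference` helper, and the `naiveForward` loop are hypothetical names introduced only for this example; nothing here calls `_UNSAFE.copyMemory`, so it says nothing about what Unsafe itself guarantees.

```java
import java.util.Arrays;

final class OverlapOracleSketch {
  /** Hypothetical stand-in for any in-place copy routine under test. */
  interface CopyFn {
    void copy(byte[] buf, int srcPos, int dstPos, int length);
  }

  /** Returns true if fn behaves as if the source were first copied to a temporary array. */
  static boolean matchesReference(CopyFn fn, int srcPos, int dstPos, int length) {
    byte[] actual = new byte[32];
    for (int i = 0; i < actual.length; i++) {
      actual[i] = (byte) i;
    }
    // Build the expected result through an explicit temporary copy.
    byte[] expected = actual.clone();
    byte[] tmp = Arrays.copyOfRange(expected, srcPos, srcPos + length);
    System.arraycopy(tmp, 0, expected, dstPos, length);

    fn.copy(actual, srcPos, dstPos, length);
    return Arrays.equals(expected, actual);
  }

  /** A byte-by-byte forward loop; not overlap-safe when dstPos is after srcPos. */
  static void naiveForward(byte[] buf, int srcPos, int dstPos, int length) {
    for (int i = 0; i < length; i++) {
      buf[dstPos + i] = buf[srcPos + i];
    }
  }

  public static void main(String[] args) {
    CopyFn safe = (b, s, d, n) -> System.arraycopy(b, s, b, d, n);
    System.out.println(matchesReference(safe, 0, 1, 8));                              // prints true
    System.out.println(matchesReference(OverlapOracleSketch::naiveForward, 1, 0, 8)); // prints true
    System.out.println(matchesReference(OverlapOracleSketch::naiveForward, 0, 1, 8)); // prints false
  }
}
```

The last case fails because the forward loop overwrites source bytes before it reads them, which is exactly the situation an overlap test needs to cover in both directions.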

Contributor Author

@davies I don't think it was chunked this way for perf. Looking at the comment for UNSAFE_COPY_THRESHOLD, it seems to be for some other purpose. I'm not familiar enough with this to judge whether we need it.

@srowen I'm looking to see if I can find docs on what Unsafe does but haven't found anything definitive yet. The test case you originally suggested is very similar to what I have, no?

Contributor

Hm, I don't think the block-by-block copy is done to improve performance. I believe it was done to allow safepoints when copying a very large region.

Member

I still have a question. Both dstOffset and srcOffset are offsets. They are not absolute addresses. Based on the values of dstOffset and srcOffset, we are unable to know if source is behind target, right?

Based on the code changes, I do not understand how it works. Could anybody give me a hint? Thank you!

Member

@davies Thank you for your explanation! I think the code changes may look confusing to people coming from C/C++. Should we add comments that capture your point, for better readability?

Member

@davies copyMemory is a general function. How can we know whether src and dst are the same object? For example, in the following code from putNewKey, the caller has to guarantee that it does not break our hidden assumptions when calling copyMemory, right?

I am just afraid the other coders might not realize we have such an assumption in this low-level general function copyMemory. Let me know if my concern is valid. Thanks!

      // --- Append the key and value data to the current data page --------------------------------
      final Object base = currentPage.getBaseObject();
      long offset = currentPage.getBaseOffset() + pageCursor;
      final long recordOffset = offset;
      Platform.putInt(base, offset, keyLength + valueLength + 4);
      Platform.putInt(base, offset + 4, keyLength);
      offset += 8;
      Platform.copyMemory(keyBase, keyOffset, base, offset, keyLength);
      offset += keyLength;
      Platform.copyMemory(valueBase, valueOffset, base, offset, valueLength);

Contributor

What are the "hidden assumptions"? src and dst can be the same object or not, and the offsets can overlap or not; we don't have any limitation for copyMemory, right?

Contributor

In the JVM, two objects (or memory blocks) can't overlap each other (assuming both offsets are in a valid range). The only case in which src and dst could overlap is when they are exactly the same value (pointer), so we only need to compare the offsets.

After this patch, we can use copyMemory() without worrying about whether src and dst overlap; it always works.
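
To make the direction argument concrete, here is a minimal sketch on a plain byte array. It is not the Spark code: it assumes a hypothetical chunk size of 4 and uses `System.arraycopy` (which is overlap-safe within a single call) as a stand-in for each per-chunk `_UNSAFE.copyMemory` call. The class and method names are made up for illustration.

```java
import java.util.Arrays;

final class ChunkedOverlapSketch {
  // Hypothetical tiny chunk size so the effect shows up on a 16-byte array.
  static final int CHUNK = 4;

  // Always walks front to back, like the loop this patch replaces.
  static void forwardOnly(byte[] buf, int srcPos, int dstPos, int length) {
    while (length > 0) {
      int size = Math.min(length, CHUNK);
      System.arraycopy(buf, srcPos, buf, dstPos, size);
      length -= size;
      srcPos += size;
      dstPos += size;
    }
  }

  // Picks the direction from the relative positions, mirroring the patched logic.
  static void directionAware(byte[] buf, int srcPos, int dstPos, int length) {
    if (dstPos < srcPos) {
      forwardOnly(buf, srcPos, dstPos, length);
      return;
    }
    // Start at the end and move chunk by chunk toward the front.
    srcPos += length;
    dstPos += length;
    while (length > 0) {
      int size = Math.min(length, CHUNK);
      srcPos -= size;
      dstPos -= size;
      System.arraycopy(buf, srcPos, buf, dstPos, size);
      length -= size;
    }
  }

  static byte[] sequence(int n) {
    byte[] data = new byte[n];
    for (int i = 0; i < n; i++) {
      data[i] = (byte) i;
    }
    return data;
  }

  public static void main(String[] args) {
    byte[] a = sequence(16);
    forwardOnly(a, 0, 1, 8);
    System.out.println(Arrays.toString(Arrays.copyOfRange(a, 1, 9)));
    // prints [0, 1, 2, 3, 3, 5, 6, 7]: the byte at index 4 was overwritten before it was read

    byte[] b = sequence(16);
    directionAware(b, 0, 1, 8);
    System.out.println(Arrays.toString(Arrays.copyOfRange(b, 1, 9)));
    // prints [0, 1, 2, 3, 4, 5, 6, 7]
  }
}
```

Even though each chunk is copied safely, the first forward chunk writes into the region the second chunk still has to read. Walking backward when the destination sits after the source avoids that.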

Member

uh, I see. Thank you very much!

+      while (length > 0) {
+        long size = Math.min(length, UNSAFE_COPY_THRESHOLD);
+        _UNSAFE.copyMemory(src, srcOffset, dst, dstOffset, size);
+        length -= size;
+        srcOffset += size;
+        dstOffset += size;
+      }
+    } else {
+      srcOffset += length;
+      dstOffset += length;
+      while (length > 0) {
+        long size = Math.min(length, UNSAFE_COPY_THRESHOLD);
+        srcOffset -= size;
+        dstOffset -= size;
+        _UNSAFE.copyMemory(src, srcOffset, dst, dstOffset, size);
+        length -= size;
+      }
+
     }
   }

@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+public class PlatformUtilSuite {
+
+  @Test
+  public void overlappingCopyMemory() {
+    byte[] data = new byte[3 * 1024 * 1024];
+    int size = 2 * 1024 * 1024;
+    for (int i = 0; i < data.length; ++i) {
+      data[i] = (byte)i;
+    }
+
+    Platform.copyMemory(data, Platform.BYTE_ARRAY_OFFSET, data, Platform.BYTE_ARRAY_OFFSET, size);
+    for (int i = 0; i < data.length; ++i) {
Member

Oh yes, this is a good test. I missed this earlier. Nits: casts usually have a space afterwards in Java, and i++ is more idiomatic for incrementing. I would not change it unless you have to make other changes.

Contributor Author

Old habits die hard :)

I feel pretty good that if this test case passes then Unsafe.copyMemory works. There is still the question of whether this behavior is unspecified and can differ between JVMs.

+      Assert.assertEquals((byte)i, data[i]);
+    }
+
+    Platform.copyMemory(
+      data,
+      Platform.BYTE_ARRAY_OFFSET + 1,
+      data,
+      Platform.BYTE_ARRAY_OFFSET,
+      size);
+    for (int i = 0; i < size; ++i) {
+      Assert.assertEquals((byte)(i + 1), data[i]);
+    }
+
+    for (int i = 0; i < data.length; ++i) {
+      data[i] = (byte)i;
+    }
+    Platform.copyMemory(
+      data,
+      Platform.BYTE_ARRAY_OFFSET,
+      data,
+      Platform.BYTE_ARRAY_OFFSET + 1,
+      size);
+    for (int i = 0; i < size; ++i) {
+      Assert.assertEquals((byte)i, data[i + 1]);
+    }
+  }
+}