[SPARK-25776][CORE]The disk write buffer size must be greater than 12 #22754
Conversation
Test build #97484 has finished for PR 22754 at commit
Good catch. My suggestion is to create a JIRA entry.
If we set this to 12,
@kiszk Thanks, I will create a JIRA.
Can we refine this comment to explain why more than 12 bytes are required?
For example, that the space used by the key prefix plus the record length is 4 + 8 bytes?
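Concretely, the 12 bytes are the 4-byte record length plus the 8-byte key prefix written ahead of the record data. A minimal Scala sketch of that header layout (the names and offsets here are illustrative, not the actual `UnsafeSorterSpillWriter` code):

```scala
import java.nio.ByteBuffer

object HeaderLayoutSketch {
  // Every spilled record is preceded by a fixed header in the disk write
  // buffer: the record length (Int, 4 bytes) followed by the key prefix
  // (Long, 8 bytes) -- 12 bytes before any record data fits.
  val HeaderBytes: Int = java.lang.Integer.BYTES + java.lang.Long.BYTES // 4 + 8 = 12

  def writeHeader(buffer: ByteBuffer, recordLength: Int, keyPrefix: Long): Unit = {
    buffer.putInt(recordLength) // 4 bytes
    buffer.putLong(keyPrefix)   // 8 bytes
  }

  def main(args: Array[String]): Unit = {
    val buf = ByteBuffer.allocate(16)
    writeHeader(buf, recordLength = 100, keyPrefix = 42L)
    println(s"header occupies ${buf.position()} bytes") // prints 12
  }
}
```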
OK, thanks.
core/src/main/scala/org/apache/spark/internal/config/package.scala
Thank you for your clarification.
Thank you for your review, I will update it. @kiszk
Force-pushed 59bf755 to a0d36c7
Test build #97689 has started for PR 22754 at commit
Test build #97693 has finished for PR 22754 at commit
Test build #97696 has finished for PR 22754 at commit
Test build #97758 has finished for PR 22754 at commit
Test build #97763 has finished for PR 22754 at commit
Test build #97776 has started for PR 22754 at commit
Test build #97770 has started for PR 22754 at commit
`assert` shouldn't be used to check arguments in public APIs. But despite its visibility, I'm not sure if this is really a public API.
I am not sure either, but I see many places (BitSetMethods.java, HeapMemoryAllocator.java, LongArray.java) that use it like this.
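For illustration, a minimal Scala sketch of the distinction being raised (the class and names are hypothetical; in Scala the analogous split is `require` for caller-supplied arguments vs `assert` for internal invariants):

```scala
// Sketch: require() validates arguments and always runs; assert() checks
// internal invariants and can be elided by the compiler (-Xelide-below).
class DiskWriteBufferSketch(bufferSize: Int) {
  // Argument validation on a (potentially) public entry point.
  require(bufferSize > 12, s"bufferSize must be greater than 12, got $bufferSize")

  private val buffer = new Array[Byte](bufferSize)

  def remainingAfterHeader: Int = {
    val remaining = buffer.length - 12
    assert(remaining > 0) // internal invariant, guaranteed by the require above
    remaining
  }
}
```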
Doesn't the updated `checkValue` of `spark.shuffle.spill.diskWriteBufferSize` already guarantee this?
Yes, it can guarantee this.
Here is why it must be greater than 12.
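For reference, a hedged sketch of what such a guard looks like in Spark's Scala config DSL (meant to sit in the `org.apache.spark.internal.config` package object, where `ConfigBuilder` is in scope; the doc string, bounds, and default below are illustrative, not a verbatim copy of the merged change):

```scala
import org.apache.spark.network.util.ByteUnit

// Illustrative sketch: values that cannot hold the 12-byte record header
// (4-byte record length + 8-byte key prefix) are rejected when the
// configuration is set, instead of failing later during a spill.
private[spark] val SHUFFLE_DISK_WRITE_BUFFER_SIZE =
  ConfigBuilder("spark.shuffle.spill.diskWriteBufferSize")
    .doc("The buffer size, in bytes, to use when writing the sorted records to an on-disk file.")
    .bytesConf(ByteUnit.BYTE)
    .checkValue(v => v > 12 && v <= Int.MaxValue,
      s"The buffer size must be greater than 12 and less than or equal to ${Int.MaxValue}.")
    .createWithDefault(1024 * 1024)
```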
If it is just for explanation, maybe you can put a comment where `diskWriteBufferSize` is defined, instead of an assert.
Where is the best place for this comment? I am neutral on this.
Force-pushed a0d36c7 to 6f8404b
Test build #97887 has finished for PR 22754 at commit
Test build #97903 has finished for PR 22754 at commit
You can remove [MINOR] from the title since there is a JIRA ticket now.
retest this please.
Test build #97916 has finished for PR 22754 at commit
Test build #97917 has finished for PR 22754 at commit
retest this please
Test build #97951 has finished for PR 22754 at commit
Test build #97996 has finished for PR 22754 at commit
Force-pushed b2ca621 to c97906c
nit: For a multi-line comment, the starting `/**` line should not contain any text.
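For example (illustrative only, not the actual diff):

```scala
/** Avoid: description text starts on the opening line
 * and continues here.
 */

/**
 * Preferred: the opening line holds only the comment delimiter;
 * the description begins on the following line.
 */
```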
Force-pushed c97906c to c883f4b
Test build #98157 has finished for PR 22754 at commit
Test build #98165 has finished for PR 22754 at commit
LGTM, pending Jenkins
retest this please
Test build #98434 has finished for PR 22754 at commit
Thanks! merging to master
Closes apache#22754 from 10110346/diskWriteBufferSize. Authored-by: liuxian <[email protected]> Signed-off-by: Kazuaki Ishizaki <[email protected]>
Hi, @kiszk.
## What changes were proposed in this pull request?
In `UnsafeSorterSpillWriter.java`, when we write a record to a spill file with `void write(Object baseObject, long baseOffset, int recordLength, long keyPrefix)`, `recordLength` and `keyPrefix` are written to the disk write buffer first, and they take 12 bytes, so the disk write buffer size must be greater than 12.
If `diskWriteBufferSize` is 10, it will print this exception info:
java.lang.ArrayIndexOutOfBoundsException: 10
at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.writeLongToBuffer (UnsafeSorterSpillWriter.java:91)
at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.write(UnsafeSorterSpillWriter.java:123)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spillIterator(UnsafeExternalSorter.java:498)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:222)
at org.apache.spark.memory.MemoryConsumer.spill(MemoryConsumer.java:65)
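To make the failure mode concrete, here is a self-contained Scala sketch (hypothetical names; it mimics the byte-at-a-time long write rather than copying the actual `UnsafeSorterSpillWriter` code) that overruns a 10-byte buffer exactly where the stack trace points:

```scala
object BufferOverrunSketch {
  // Mimics writeLongToBuffer: stores a long byte by byte at a given offset.
  def writeLongToBuffer(buffer: Array[Byte], offset: Int, v: Long): Unit = {
    var i = 0
    while (i < 8) {
      // Throws once offset + i reaches buffer.length.
      buffer(offset + i) = (v >>> (56 - 8 * i)).toByte
      i += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val buffer = new Array[Byte](10) // diskWriteBufferSize = 10
    // The 4-byte record length fits at offsets 0-3, but the 8-byte key
    // prefix written at offset 4 needs offsets 4-11:
    writeLongToBuffer(buffer, offset = 4, v = 42L)
    // throws java.lang.ArrayIndexOutOfBoundsException: 10
  }
}
```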
## How was this patch tested?
Existing UT in `UnsafeExternalSorterSuite`.