
Conversation

@10110346
Contributor

@10110346 10110346 commented Oct 17, 2018

What changes were proposed in this pull request?

In UnsafeSorterSpillWriter.java, when we write a record to a spill file with void write(Object baseObject, long baseOffset, int recordLength, long keyPrefix), recordLength and keyPrefix are written to the disk write buffer first and take 12 bytes, so the disk write buffer size must be greater than 12.

If diskWriteBufferSize is 10, it will print this exception info:

java.lang.ArrayIndexOutOfBoundsException: 10
at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.writeLongToBuffer (UnsafeSorterSpillWriter.java:91)
at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.write(UnsafeSorterSpillWriter.java:123)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spillIterator(UnsafeExternalSorter.java:498)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:222)
at org.apache.spark.memory.MemoryConsumer.spill(MemoryConsumer.java:65)
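For illustration, a minimal self-contained sketch (assumed names and byte order; not the actual Spark source) of why the buffer must hold the 12-byte header before any record data, reproducing the reported exception with a 10-byte buffer:

```java
// Sketch only: mirrors the idea of UnsafeSorterSpillWriter's write path, where
// the 4-byte record length and the 8-byte key prefix are copied into the disk
// write buffer before the record itself.
public final class WriteBufferLayoutSketch {

  // Hypothetical undersized buffer to demonstrate the failure (10 < 12).
  private static final int DISK_WRITE_BUFFER_SIZE = 10;
  private static final byte[] writeBuffer = new byte[DISK_WRITE_BUFFER_SIZE];

  // Writes the 4 bytes of an int into the buffer at the given offset.
  private static void writeIntToBuffer(int v, int offset) {
    for (int i = 0; i < 4; i++) {
      writeBuffer[offset + i] = (byte) (v >>> (8 * (3 - i)));
    }
  }

  // Writes the 8 bytes of a long into the buffer at the given offset.
  private static void writeLongToBuffer(long v, int offset) {
    for (int i = 0; i < 8; i++) {
      writeBuffer[offset + i] = (byte) (v >>> (8 * (7 - i)));
    }
  }

  public static void main(String[] args) {
    int recordLength = 32;
    long keyPrefix = 42L;
    writeIntToBuffer(recordLength, 0); // occupies bytes 0..3
    writeLongToBuffer(keyPrefix, 4);   // needs bytes 4..11 -> ArrayIndexOutOfBoundsException: 10
  }
}
```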

How was this patch tested?

Existing UT in UnsafeExternalSorterSuite

@10110346 10110346 changed the title from "[CORE][MINOR]The disk write buffer size must be greater than 12" to "[MINOR][CORE]The disk write buffer size must be greater than 12" on Oct 17, 2018
@SparkQA

SparkQA commented Oct 17, 2018

Test build #97484 has finished for PR 22754 at commit dfc4b65.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Oct 18, 2018

Good catch. My suggestion is to create a JIRA entry.
One question: can we set this property to 12?

@10110346
Contributor Author

If we set this to 12, freeSpaceInWriteBuffer will be 0, and the length passed to copyMemory will always be 0, so setting this property to 12 is not allowed.
https://github.com/apache/spark/blob/dfc4b65088602ee45e0babe22e64e205fab3e82b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillWriter.java#L128
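For illustration, a sketch of the copy loop linked above (an assumed simplification with a hypothetical method name) showing why a value of exactly 12 leaves no room for record data:

```java
// Sketch only: simplified shape of the record-copy loop in the spill writer.
final class CopyLoopSketch {
  // With diskWriteBufferSize == 12, freeSpaceInWriteBuffer is 0, so toTransfer
  // is always 0 and dataRemaining never decreases: the loop makes no progress.
  static void copyRecordSketch(byte[] record, int diskWriteBufferSize) {
    final int headerBytes = 4 + 8;  // recordLength (int) + keyPrefix (long)
    final int freeSpaceInWriteBuffer = diskWriteBufferSize - headerBytes;
    int dataRemaining = record.length;
    while (dataRemaining > 0) {
      final int toTransfer = Math.min(freeSpaceInWriteBuffer, dataRemaining);
      // (the real code copies toTransfer bytes into the write buffer here and
      // flushes the buffer to the spill file when it is full)
      dataRemaining -= toTransfer;
    }
  }
}
```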

@10110346
Contributor Author

@kiszk Thanks, I will create a JIRA.

@10110346 10110346 changed the title from "[MINOR][CORE]The disk write buffer size must be greater than 12" to "[SPARK-25776][CORE][MINOR]The disk write buffer size must be greater than 12" on Oct 19, 2018
Member

Can we refine this comment to explain why more than 12 bytes are required?
For example, that the key prefix and the record length alone already take 8 + 4 bytes?
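One possible wording for such a refined comment (an illustration only, not the text that was merged):

```java
/**
 * The buffer size must be greater than 12, because the record length
 * (4 bytes) and the key prefix (8 bytes) are written into the disk write
 * buffer ahead of the record itself and already take 12 bytes.
 */
```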

Contributor Author

OK, thanks.

@kiszk
Member

kiszk commented Oct 21, 2018

Thank you for your clarification.

@10110346
Contributor Author

Thank you for your review; I will update it. @kiszk

@10110346 10110346 force-pushed the diskWriteBufferSize branch 2 times, most recently from 59bf755 to a0d36c7 on October 22, 2018 02:05
@SparkQA

SparkQA commented Oct 22, 2018

Test build #97689 has started for PR 22754 at commit a0d36c7.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97693 has finished for PR 22754 at commit a0d36c7.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97696 has finished for PR 22754 at commit dfc4b65.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97758 has finished for PR 22754 at commit a0d36c7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97763 has finished for PR 22754 at commit dfc4b65.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97776 has started for PR 22754 at commit dfc4b65.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97770 has started for PR 22754 at commit a0d36c7.

Member

assert shouldn't be used to check arguments in public APIs. But despite its visibility, I'm not sure whether this is really a public API.

Contributor Author

I am not sure either, but I see many places (BitSetMethods.java, HeapMemoryAllocator.java, LongArray.java) that use it like this.
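For context, a hedged illustration of the two validation styles under discussion (hypothetical class and method names; neither snippet is the code that was merged):

```java
// Sketch only: contrasting the two ways to validate the buffer size.
final class DiskWriteBufferSizeChecks {

  // assert: only enforced when the JVM runs with -ea, so an invalid value can
  // slip through in production; usually discouraged for public-API arguments.
  static void checkWithAssert(int diskWriteBufferSize) {
    assert diskWriteBufferSize > 12 :
        "diskWriteBufferSize must be greater than 12, but was " + diskWriteBufferSize;
  }

  // Explicit precondition: always enforced, regardless of JVM flags.
  static void checkWithPrecondition(int diskWriteBufferSize) {
    if (diskWriteBufferSize <= 12) {
      throw new IllegalArgumentException(
          "diskWriteBufferSize must be greater than 12, but was " + diskWriteBufferSize);
    }
  }
}
```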

Member

Doesn't the updated checkValue of spark.shuffle.spill.diskWriteBufferSize already guarantee this?

Contributor Author

Yes, it can guarantee this.
The assert here is meant to explain why it must be greater than 12.

Member

@viirya viirya Oct 23, 2018

If it is just for explanation, maybe you can put a comment where diskWriteBufferSize is defined, instead of an assert.

Member

Where is the best place for this comment? I am neutral on this.

@10110346 10110346 force-pushed the diskWriteBufferSize branch from a0d36c7 to 6f8404b on October 23, 2018 01:24
@SparkQA

SparkQA commented Oct 23, 2018

Test build #97887 has finished for PR 22754 at commit 6f8404b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 23, 2018

Test build #97903 has finished for PR 22754 at commit 6f8404b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Oct 23, 2018

You can remove [MINOR] from the title since there is a JIRA ticket now.

@viirya
Member

viirya commented Oct 23, 2018

retest this please.

@10110346 10110346 changed the title from "[SPARK-25776][CORE][MINOR]The disk write buffer size must be greater than 12" to "[SPARK-25776][CORE]The disk write buffer size must be greater than 12" on Oct 23, 2018
@SparkQA

SparkQA commented Oct 23, 2018

Test build #97916 has finished for PR 22754 at commit 6f8404b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 23, 2018

Test build #97917 has finished for PR 22754 at commit 6f8404b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

@10110346 10110346 left a comment

retest this please

@SparkQA

SparkQA commented Oct 24, 2018

Test build #97951 has finished for PR 22754 at commit 6f8404b.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 25, 2018

Test build #97996 has finished for PR 22754 at commit b2ca621.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@10110346 10110346 force-pushed the diskWriteBufferSize branch from b2ca621 to c97906c on October 28, 2018 07:24
Member

nit: For a multi-line comment, the starting /** line should not contain text.
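An illustration of the style nit, with hypothetical comment text:

```java
// Discouraged: text on the opening line of a multi-line comment.
/** The buffer size must be greater than 12, because ...
 */

// Preferred: the opening /** line carries no text.
/**
 * The buffer size must be greater than 12, because ...
 */
```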

@10110346 10110346 force-pushed the diskWriteBufferSize branch from c97906c to c883f4b on October 28, 2018 08:48
@SparkQA

SparkQA commented Oct 28, 2018

Test build #98157 has finished for PR 22754 at commit c97906c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 28, 2018

Test build #98165 has finished for PR 22754 at commit c883f4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Nov 4, 2018

LGTM, pending Jenkins

@kiszk
Member

kiszk commented Nov 4, 2018

retest this please

@SparkQA

SparkQA commented Nov 4, 2018

Test build #98434 has finished for PR 22754 at commit c883f4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 6c9e5ac Nov 4, 2018
@kiszk
Member

kiszk commented Nov 4, 2018

Thanks! merging to master

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

In `UnsafeSorterSpillWriter.java`, when we write a record to a spill file with `void write(Object baseObject, long baseOffset, int recordLength, long keyPrefix)`, `recordLength` and `keyPrefix` are written to the disk write buffer first and take 12 bytes, so the disk write buffer size must be greater than 12.

If `diskWriteBufferSize` is 10, it will print this exception info:

_java.lang.ArrayIndexOutOfBoundsException: 10
   at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.writeLongToBuffer (UnsafeSorterSpillWriter.java:91)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.write(UnsafeSorterSpillWriter.java:123)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spillIterator(UnsafeExternalSorter.java:498)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:222)
	at org.apache.spark.memory.MemoryConsumer.spill(MemoryConsumer.java:65)_

## How was this patch tested?
Existing UT in `UnsafeExternalSorterSuite`

Closes apache#22754 from 10110346/diskWriteBufferSize.

Authored-by: liuxian <[email protected]>
Signed-off-by: Kazuaki Ishizaki <[email protected]>
@dongjoon-hyun
Member

Hi, @kiszk.
Can we have this in branch-2.4 too, since Spark 2.4.x is the last version of the 2.x line, which we should support for a while?

zzcclp added a commit to zzcclp/spark that referenced this pull request Sep 20, 2019
…han 12 apache#22754

In UnsafeSorterSpillWriter.java, when we write a record to a spill file with void write(Object baseObject, long baseOffset, int recordLength, long keyPrefix), recordLength and keyPrefix are written to the disk write buffer first and take 12 bytes, so the disk write buffer size must be greater than 12.

If diskWriteBufferSize is 10, it will print this exception info:

java.lang.ArrayIndexOutOfBoundsException: 10
at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.writeLongToBuffer (UnsafeSorterSpillWriter.java:91)
at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.write(UnsafeSorterSpillWriter.java:123)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spillIterator(UnsafeExternalSorter.java:498)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:222)
at org.apache.spark.memory.MemoryConsumer.spill(MemoryConsumer.java:65)
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
Ref: LIHADOOP-42707

In `UnsafeSorterSpillWriter.java`, when we write a record to a spill file with `void write(Object baseObject, long baseOffset, int recordLength, long keyPrefix)`, `recordLength` and `keyPrefix` are written to the disk write buffer first and take 12 bytes, so the disk write buffer size must be greater than 12.

If `diskWriteBufferSize` is 10, it will print this exception info:

_java.lang.ArrayIndexOutOfBoundsException: 10
   at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.writeLongToBuffer (UnsafeSorterSpillWriter.java:91)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.write(UnsafeSorterSpillWriter.java:123)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spillIterator(UnsafeExternalSorter.java:498)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:222)
	at org.apache.spark.memory.MemoryConsumer.spill(MemoryConsumer.java:65)_

Existing UT in `UnsafeExternalSorterSuite`

Closes apache#22754 from 10110346/diskWriteBufferSize.

Authored-by: liuxian <[email protected]>
Signed-off-by: Kazuaki Ishizaki <[email protected]>
(cherry picked from commit 6c9e5ac)

RB=1518191
BUG=LIHADOOP-42707
G=superfriends-reviewers
R=fli,mshen,yezhou,edlu
A=fli