Conversation
| stopOnEmptyBlock = true; | ||
| } | ||
|
|
||
| void setStopOnEmptyBlock(boolean stop) { |
There was a problem hiding this comment.
Shouldn't this be a public method?
| public int available() throws IOException { | ||
| if (o == originalLen) { | ||
| refill(); | ||
| } |
There was a problem hiding this comment.
I think typically available() just returns the remaining bytes in the buffer, and it should not call a blocking method like refill(), though it is not strictly prohibited. Do you have any reason you want to call refill() in available()?
There was a problem hiding this comment.
I did not remember the the reason, maybe it's OK to not call refill() here.
| } catch (EOFException e) { | ||
| finished = true; | ||
| return; | ||
| } |
There was a problem hiding this comment.
You should re-throw EOFException if stopOnEmptyBlock is true.
|
@davies If this feature is really needed in Spark, I'll merge it for the next release, but could you comment on my questions before that? Thanks. |
|
@odaira I do not have bandwidth to work on this patch, could you fork this one? |
|
Thanks! |
In Apache Spark, we have a fast path to merge two compressed shuffle files together by concat the bytes directly (without decompress and compress again), this rely on that the decompressor to handle concated streams.
The BlockStream uses an empty block as the end of a stream, this PR change to skip the empty block until the end of stream. (also added a flag for this behavior change, having old behavior by default).