Skip to content

Conversation

@vanzin
Copy link
Contributor

@vanzin vanzin commented Jan 2, 2020

This bug manifested itself when another stream would potentially make a call
to NioBufferedFileInputStream.read() after it had reached EOF in the wrapped
stream. In that case, the refill() code would clear the output buffer the
first time EOF was found, leaving it in a readable state for subsequent
read() calls. If any of those calls were made, bad data would be returned.

By flipping the buffer before returning, even in the EOF case, you get the
correct behavior in subsequent calls. I picked that approach to avoid keeping
more state in this class, although it means calling the underlying stream
even after EOF (which is fine, but perhaps a little more expensive).

This showed up (at least) when using encryption, because the commons-crypto
StreamInput class does not track EOF internally, leaving it for the wrapped
stream to behave correctly.

Tested with added unit test + slightly modified test case attached to SPARK-18105.

…leInputStream.

This bug manifested itself when another stream would potentially make a call
to NioBufferedFileInputStream.read() after it had reached EOF in the wrapped
stream. In that case, the refill() code would clear the output buffer the
first time EOF was found, leaving it in a readable state for subsequent
read() calls. If any of those calls were made, bad data would be returned.

By flipping the buffer before returning, even in the EOF case, you get the
correct behavior in subsequent calls. I picked that approach to avoid keeping
more state in this class, although it means calling the underlying stream
even after EOF (which is fine, but perhaps a little more expensive).

This showed up (at least) when using encryption, because the commons-crypto
StreamInput class does not track EOF internally, leaving it for the wrapped
stream to behave correctly.

Tested with added unit test + slightly modified test case attached to SPARK-18105.
@SparkQA
Copy link

SparkQA commented Jan 2, 2020

Test build #116056 has finished for PR 27084 at commit 0489fca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented Jan 2, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jan 3, 2020

Test build #116061 has finished for PR 27084 at commit 0489fca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (!byteBuffer.hasRemaining()) {
byteBuffer.clear();
int nRead = 0;
while (nRead == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the comment says it checks the weather :-).

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HyukjinKwon
Copy link
Member

Merged to master and branch-2.4.

HyukjinKwon pushed a commit that referenced this pull request Jan 3, 2020
…leInputStream

This bug manifested itself when another stream would potentially make a call
to NioBufferedFileInputStream.read() after it had reached EOF in the wrapped
stream. In that case, the refill() code would clear the output buffer the
first time EOF was found, leaving it in a readable state for subsequent
read() calls. If any of those calls were made, bad data would be returned.

By flipping the buffer before returning, even in the EOF case, you get the
correct behavior in subsequent calls. I picked that approach to avoid keeping
more state in this class, although it means calling the underlying stream
even after EOF (which is fine, but perhaps a little more expensive).

This showed up (at least) when using encryption, because the commons-crypto
StreamInput class does not track EOF internally, leaving it for the wrapped
stream to behave correctly.

Tested with added unit test + slightly modified test case attached to SPARK-18105.

Closes #27084 from vanzin/SPARK-30225.

Authored-by: Marcelo Vanzin <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
@vanzin vanzin deleted the SPARK-30225 branch January 7, 2020 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants