Skip to content
This repository was archived by the owner on Dec 2, 2025. It is now read-only.

Comments

Read concatenated byte stream in LZ4BlockInputStream#105

Merged
odaira merged 4 commits intolz4:masterfrom
maropu:ConcatByteStream
Jun 30, 2017
Merged

Read concatenated byte stream in LZ4BlockInputStream#105
odaira merged 4 commits intolz4:masterfrom
maropu:ConcatByteStream

Conversation

@maropu
Copy link
Contributor

@maropu maropu commented Jun 26, 2017

This pr fixed issues when reading concatenated byte stream in LZ4BlockInputStream.
This comes from #76.

@odaira
Copy link
Member

odaira commented Jun 27, 2017

Thanks for your contribution!
Doesn't this change the default behavior when reading an empty block? Would you have any good reason for the change? As you might know, #76 kept the default behavior and added setStopOnEmptyBlock() to control the behavior.

@maropu
Copy link
Contributor Author

maropu commented Jun 28, 2017

ok, I'll update soon.

@maropu
Copy link
Contributor Author

maropu commented Jun 28, 2017

ok, fixed. check again? Thanks!

stopOnEmptyBlock = true;
}

protected void setStopOnEmptyBlock(boolean stopOnEmptyBlock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be protected, or public? I guess the use case you assume is to extend this class to enable the new behavior, but isn't it simpler to just make setStopOnEmptyBlock() public so that users can just call it if they want the new behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reconsider this and I think how about setting this in the constructor? I couldn't imagine users change this flag during the decompression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It makes more sense to have new constructors.


private void refill() throws IOException {
readFully(compressedBuffer, HEADER_LENGTH);
if (finished || o < originalLen) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check necessary? I don't find any execution path on which this check is evaluated to true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok. my bad. I forgot removing this. Thanks.

* @param stopOnEmptyBlock whether read is stopped on an empty block
*/
public LZ4BlockInputStream(InputStream in, LZ4FastDecompressor decompressor, Checksum checksum) {
public LZ4BlockInputStream(InputStream in, LZ4FastDecompressor decompressor, Checksum checksum, Boolean stopOnEmptyBlock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not so familiar with the latest Java best practices, but is there any reason for Boolean, not boolean?


/**
* Create a new instance using {@link XXHash32} for checksuming.
* @see #LZ4BlockInputStream(InputStream, LZ4FastDecompressor, Checksum, Boolean)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


/**
* Create a new instance using {@link XXHash32} for checksuming.
* @see #LZ4BlockInputStream(InputStream, LZ4FastDecompressor, Checksum, Boolean)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

* @see #LZ4BlockInputStream(InputStream, LZ4FastDecompressor, Checksum, Boolean)
* @see StreamingXXHash32#asChecksum()
*/
public LZ4BlockInputStream(InputStream in, Boolean stopOnEmptyBlock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@odaira odaira merged commit 6480bd9 into lz4:master Jun 30, 2017
@odaira
Copy link
Member

odaira commented Jun 30, 2017

Thanks a lot for your contribution!

@maropu
Copy link
Contributor Author

maropu commented Jul 1, 2017

Thanks! Do we have a new release in near future? Any blocker?

asfgit pushed a commit to apache/spark that referenced this pull request Aug 9, 2017
## What changes were proposed in this pull request?
This pr updated `lz4-java` to the latest (v1.4.0) and removed custom `LZ4BlockInputStream`. We currently use custom `LZ4BlockInputStream` to read concatenated byte stream in shuffle. But, this functionality has been implemented in the latest lz4-java (lz4/lz4-java#105). So, we might update the latest to remove the custom `LZ4BlockInputStream`.

Major diffs between the latest release and v1.3.0 in the master are as follows (lz4/lz4-java@62f7547...6d4693f);
- fixed NPE in XXHashFactory similarly
- Don't place resources in default package to support shading
- Fixes ByteBuffer methods failing to apply arrayOffset() for array-backed
- Try to load lz4-java from java.library.path, then fallback to bundled
- Add ppc64le binary
- Add s390x JNI binding
- Add basic LZ4 Frame v1.5.0 support
- enable aarch64 support for lz4-java
- Allow unsafeInstance() for ppc64le archiecture
- Add unsafeInstance support for AArch64
- Support 64-bit JNI build on Solaris
- Avoid over-allocating a buffer
- Allow EndMark to be incompressible for LZ4FrameInputStream.
- Concat byte stream

## How was this patch tested?
Existing tests.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes #18883 from maropu/SPARK-21276.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants