Add support for JNI based decompression for zstd files #13704
highker merged 1 commit into prestodb:master from gauravkm:zstd
highker left a comment
- Coding style comments
- Could you remove binary files like sample_zstd? Just generate them randomly as temporary files and read them back.
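A minimal sketch of the temporary-file pattern suggested here, using only the JDK. The class and method names are hypothetical and not part of the patch, and random bytes are only a placeholder: a real ORC test would need a valid ORC file, not arbitrary data.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical helper, not part of the patch.
class TemporaryFileRoundTrip
{
    // Fills a buffer with reproducible pseudo-random bytes.
    static byte[] randomBytes(int length, long seed)
    {
        byte[] data = new byte[length];
        new Random(seed).nextBytes(data);
        return data;
    }

    // Writes the data to a temporary file, reads it back, and always deletes the file.
    static byte[] writeAndReadBack(byte[] data)
    {
        try {
            Path file = Files.createTempFile("sample", ".bin");
            try {
                Files.write(file, data);
                return Files.readAllBytes(file);
            }
            finally {
                Files.deleteIfExists(file);
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing is checked into the repository this way; the file exists only for the duration of the test.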
Could you follow our developer guideline (https://github.com/prestodb/presto/wiki/Presto-Development-Guidelines#formatting) and clean up this patch including:
- Spell out every word (e.g., src -> source)
- Provide meaningful names to variables (e.g., i -> length)
- We don't use ArrayList, LinkedList, and other Java native containers unless necessary (e.g., mutability or nulls). Instead, we use Guava ImmutableSet/Map/List
Also, we enforce one parameter per line
These are really ORC files. It is not possible to generate them randomly. I can rename them to, say, sample_orc.
Gaurav, you can generate ORC files on the fly. See com.facebook.presto.orc.OrcTester#writeOrcColumnHive
Will put up a separate diff if required. For some reason, this would not allow IntelliJ IDEA to compile the code.
Do we wanna test on Zstd.isError(size)?
Yes, we should. I missed this part. I am working on making this change. Thanks for looking at it again. Will make the other changes as well.
This reuses aircompressor. So two options maybe:
- Share the same codepath with OrcZstdDecompressor and pass in the jni flag to decide which decompressor (jni or airlift) to use within OrcZstdDecompressor, or
- Keep this class and remove aircompressor dependency.
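The first option could look roughly like the sketch below. It is an illustration under assumptions, not the code in this patch: the class name, the nested interface, and the backend label are hypothetical, and both branches use a byte-copying placeholder where the real code would call zstd-jni or aircompressor.

```java
// Hypothetical sketch: one class that picks its backend from a flag.
class FlaggedZstdDecompressor
{
    // Hypothetical interface with six arguments: input range, output range.
    interface Decompressor
    {
        int decompress(byte[] input, int inputOffset, int inputLength, byte[] output, int outputOffset, int maxOutputLength);
    }

    private final String backend;
    private final Decompressor decompressor;

    FlaggedZstdDecompressor(boolean zstdJniDecompressionEnabled)
    {
        if (zstdJniDecompressionEnabled) {
            backend = "jni";
            // Placeholder: the real branch would delegate to the JNI binding.
            decompressor = FlaggedZstdDecompressor::copyPlaceholder;
        }
        else {
            backend = "airlift";
            // Placeholder: the real branch would delegate to aircompressor.
            decompressor = FlaggedZstdDecompressor::copyPlaceholder;
        }
    }

    String backend()
    {
        return backend;
    }

    int decompress(byte[] input, int inputOffset, int inputLength, byte[] output, int outputOffset, int maxOutputLength)
    {
        return decompressor.decompress(input, inputOffset, inputLength, output, outputOffset, maxOutputLength);
    }

    // Stand-in for a real codec: copies bytes unchanged and reports the count.
    private static int copyPlaceholder(byte[] input, int inputOffset, int inputLength, byte[] output, int outputOffset, int maxOutputLength)
    {
        if (inputLength > maxOutputLength) {
            throw new IllegalArgumentException("Output buffer too small");
        }
        System.arraycopy(input, inputOffset, output, outputOffset, inputLength);
        return inputLength;
    }
}
```

Callers see a single class either way, so the flag stays an implementation detail of the decompressor.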
nit: we don't usually use final on methods. (Also, as a side note: we don't use final for temporary variables.)
Maybe just call it createTestingReaderOptions
Also call this function createTestingReaderOptions and it should delegate to the one below with zstdJniDecompressionEnabled = false.
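The delegation being asked for might look like this sketch. TestingReaderOptions is a hypothetical stand-in for whatever options type the test helper actually returns; only the method name and the flag name come from the comments above.

```java
// Hypothetical stand-in for the real reader options type.
class TestingReaderOptions
{
    private final boolean zstdJniDecompressionEnabled;

    private TestingReaderOptions(boolean zstdJniDecompressionEnabled)
    {
        this.zstdJniDecompressionEnabled = zstdJniDecompressionEnabled;
    }

    boolean isZstdJniDecompressionEnabled()
    {
        return zstdJniDecompressionEnabled;
    }

    // No-argument overload: delegates with the flag turned off.
    static TestingReaderOptions createTestingReaderOptions()
    {
        return createTestingReaderOptions(false);
    }

    // Full overload: the only place that constructs the options.
    static TestingReaderOptions createTestingReaderOptions(boolean zstdJniDecompressionEnabled)
    {
        return new TestingReaderOptions(zstdJniDecompressionEnabled);
    }
}
```

Existing tests keep calling the no-argument overload, and only the new JNI tests pass true.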
move this closer to the other writeOrcColumnPresto
Shall we assert that the data are equal? Size equality seems a bit weak.
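To illustrate why the size check is weak, here is a small sketch with hypothetical helper names. A buffer with one flipped byte still has the right length, so only the content comparison catches it.

```java
import java.util.Arrays;

// Hypothetical helpers contrasting the two kinds of check.
class DataEqualityCheck
{
    // Weak check: compares lengths only.
    static boolean sameSize(byte[] expected, byte[] actual)
    {
        return expected.length == actual.length;
    }

    // Strong check: compares lengths and every byte.
    static boolean sameData(byte[] expected, byte[] actual)
    {
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args)
    {
        byte[] expected = {10, 20, 30, 40};
        byte[] corrupted = expected.clone();
        corrupted[2] = 99; // same length, different content

        System.out.println("sameSize: " + sameSize(expected, corrupted));
        System.out.println("sameData: " + sameData(expected, corrupted));
    }
}
```

In a test this would typically become an equality assertion on the decompressed bytes instead of on their length.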
nit
    this.decompressor = (input, inputOffset, inputLength, output, outputOffset, maxOutputLength) -> {
        long size = Zstd.decompressByteArray(output, 0, maxOutputLength, input, inputOffset, inputLength);
        if (Zstd.isError(size)) {
            throw new RuntimeException(Zstd.getErrorName(size));
        }
        return toIntExact(size);
    };
I am not quite sure if we should throw RuntimeException or MalformedInputException.
Currently decompress catches the MalformedInputException (which is a runtime exception) and rethrows it as OrcCorruptionException, which is a checked exception.
If we throw a plain RuntimeException, then this behavior changes.
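The behavior difference can be sketched as follows. The nested exception classes and the single-argument Decompressor interface are simplified, hypothetical stand-ins for the real types; the point is only which exceptions the catch block translates.

```java
// Hypothetical sketch of the catch-and-rethrow behavior described above.
class ExceptionTranslation
{
    // Stand-in for the unchecked exception thrown on bad input.
    static class MalformedInputException extends RuntimeException
    {
        MalformedInputException(String message)
        {
            super(message);
        }
    }

    // Stand-in for the checked exception reported to callers.
    static class OrcCorruptionException extends Exception
    {
        OrcCorruptionException(String message, Throwable cause)
        {
            super(message, cause);
        }
    }

    interface Decompressor
    {
        int decompress(byte[] input);
    }

    // Only MalformedInputException is translated; other runtime exceptions pass through.
    static int decompress(Decompressor decompressor, byte[] input)
            throws OrcCorruptionException
    {
        try {
            return decompressor.decompress(input);
        }
        catch (MalformedInputException e) {
            throw new OrcCorruptionException("Invalid compressed stream", e);
        }
    }

    // Reports what a caller would observe for a given decompressor.
    static String outcome(Decompressor decompressor, byte[] input)
    {
        try {
            decompress(decompressor, input);
            return "ok";
        }
        catch (OrcCorruptionException e) {
            return "translated";
        }
        catch (RuntimeException e) {
            return "escaped";
        }
    }
}
```

With this shape, a decompressor that throws a plain RuntimeException bypasses the translation, which is the change in behavior the comment is worried about.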
checkState here: zstdJniDecompressionEnabled is true only if compression is ZSTD
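One reading of this request, sketched with JDK-only code: the checkState helper below is a hand-written stand-in for a precondition utility, and the CompressionKind enum and validate method are hypothetical simplifications, not the types in the patch.

```java
// Hypothetical sketch of the requested state check.
class ZstdJniStateCheck
{
    // Simplified stand-in for the real compression enum.
    enum CompressionKind
    {
        NONE, ZLIB, SNAPPY, LZ4, ZSTD
    }

    // Hand-written stand-in for a precondition helper.
    static void checkState(boolean expression, String message)
    {
        if (!expression) {
            throw new IllegalStateException(message);
        }
    }

    // Assumed reading: the JNI flag may only be set when the compression is ZSTD.
    static void validate(CompressionKind compression, boolean zstdJniDecompressionEnabled)
    {
        checkState(
                !zstdJniDecompressionEnabled || compression == CompressionKind.ZSTD,
                "zstdJniDecompressionEnabled is set but compression is " + compression);
    }
}
```

If the flag is meant to be a global setting that other codecs simply ignore, the check would be relaxed or dropped.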
Initial results and estimates point to 10% improvement in CPU by using JNI.
The feature is controlled by a flag that is false by default.
I tried to split the pull requests but there is no way to stack them across different forks.