Add support for JNI based decompression for zstd files by gauravkm · Pull Request #13704 · prestodb/presto

gauravkm · 2019-11-14T22:46:49Z

Initial results and estimates point to a 10% CPU improvement from using JNI.

The feature is controlled by a flag that is false by default.

I tried to split this into multiple pull requests, but there is no way to stack them across different forks.

highker

Coding style comments
Could you remove binary files like sample_zstd? Just generate them randomly as temporary files and read them back.

highker · 2019-11-15T06:40:54Z

presto-orc/src/test/java/com/facebook/presto/orc/BenchmarkZstdJniDecompression.java

Could you follow our developer guidelines (https://github.com/prestodb/presto/wiki/Presto-Development-Guidelines#formatting) and clean up this patch, including:

Spell out every word (e.g., src -> source)

Give variables meaningful names (e.g., i -> length)

We don't use ArrayList, LinkedList, or other standard Java collections unless necessary (e.g., for mutability or nulls). Instead, we use Guava's ImmutableSet/Map/List.

highker · 2019-11-15T06:41:07Z

presto-orc/src/test/java/com/facebook/presto/orc/TestJniZstdDecompression.java

Let's use the DataSize type.

highker · 2019-11-15T06:42:36Z

presto-orc/src/test/java/com/facebook/presto/orc/TestJniZstdDecompression.java

Also, we enforce one parameter per line

gauravkm · 2019-11-16T00:47:26Z

Could you remove binary files like sample_zstd? Just generate them randomly as temporary files and read them back.

These are really ORC files, so it is not possible to generate them randomly. I can rename them to, say, sample_orc.

mbasmanova · 2019-11-16T02:54:25Z

@gauravkm

These are really ORC files, so it is not possible to generate them randomly. I can rename them to, say, sample_orc.

Gaurav, you can generate ORC files on the fly. See com.facebook.presto.orc.OrcTester#writeOrcColumnHive

highker

some comments

highker · 2019-11-19T04:37:04Z

presto-main/src/main/java/com/facebook/presto/transaction/InMemoryTransactionManager.java

Unrelated change?

I will put up a separate diff if required. For some reason, IntelliJ IDEA would not compile the code without this change.

highker · 2019-11-19T04:37:24Z

presto-orc/src/main/java/com/facebook/presto/orc/OrcDecompressor.java

else { is redundant
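Assuming the if branch in question ends in a return (the diff itself is not reproduced here, so this is an assumption), the nit amounts to dropping the else and letting the fallback follow directly. The class, method names, and string values below are hypothetical.

```java
final class RedundantElseExample
{
    private RedundantElseExample() {}

    // Before: the else adds a level of nesting but no behavior,
    // because the if branch always returns.
    static String chooseWithElse(boolean jniEnabled)
    {
        if (jniEnabled) {
            return "jni";
        }
        else {
            return "java";
        }
    }

    // After: identical behavior, flatter control flow.
    static String choose(boolean jniEnabled)
    {
        if (jniEnabled) {
            return "jni";
        }
        return "java";
    }
}
```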

highker · 2019-11-19T04:37:47Z

presto-orc/src/main/java/com/facebook/presto/orc/zstd/ZstdJniDecompressor.java

spell out dst

highker · 2019-11-19T04:37:51Z

presto-orc/src/main/java/com/facebook/presto/orc/zstd/ZstdJniDecompressor.java

remove this.

highker · 2019-11-19T04:37:59Z

presto-orc/src/main/java/com/facebook/presto/orc/zstd/ZstdJniDecompressor.java

Add a line break after }

highker · 2019-11-19T04:39:16Z

presto-orc/src/test/java/com/facebook/presto/orc/BenchmarkBatchStreamReadersWithZstd.java

highker · 2019-11-19T04:39:55Z

presto-orc/src/test/java/com/facebook/presto/orc/BenchmarkZstdJniDecompression.java

spell out src

highker · 2019-11-19T04:40:26Z

presto-orc/src/test/java/com/facebook/presto/orc/TestZstdJniDecompression.java

one param per line

highker · 2019-11-19T04:46:04Z

presto-orc/src/main/java/com/facebook/presto/orc/zstd/ZstdJniDecompressor.java

Do we want to check Zstd.isError(size)?

Yes, we should; I missed this part. I am working on this change now. Thanks for looking at it again. I will make the other changes as well.

highker · 2019-11-19T04:50:03Z

presto-orc/src/main/java/com/facebook/presto/orc/zstd/ZstdJniDecompressor.java

This reuses aircompressor, so there are maybe two options:

Share the same code path with OrcZstdDecompressor and pass in the JNI flag to decide which decompressor (JNI or Airlift) to use within OrcZstdDecompressor, or

Keep this class and remove the aircompressor dependency.
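Option 1 might look roughly like the sketch below: the flag is read once in the constructor to pick an implementation, and every call afterwards goes through the same code path with no per-call branching. RawDecompressor and FlagSelectedDecompressor are hypothetical names; a real version would plug in the zstd-jni binding and aircompressor's ZstdDecompressor, which are not shown here.

```java
// Hypothetical per-call contract shared by the JNI and pure-Java implementations.
interface RawDecompressor
{
    int decompress(
            byte[] input,
            int inputOffset,
            int inputLength,
            byte[] output,
            int outputOffset,
            int maxOutputLength);
}

final class FlagSelectedDecompressor
{
    private final RawDecompressor decompressor;

    FlagSelectedDecompressor(
            boolean jniDecompressionEnabled,
            RawDecompressor jniImplementation,
            RawDecompressor javaImplementation)
    {
        // Decide once, at construction time, which implementation to use.
        this.decompressor = jniDecompressionEnabled ? jniImplementation : javaImplementation;
    }

    int decompress(byte[] input, int offset, int length, byte[] output)
    {
        // Single shared code path regardless of which implementation was chosen.
        return decompressor.decompress(input, offset, length, output, 0, output.length);
    }
}
```

Keeping the choice inside one class means callers such as the ORC reader never need to know which implementation is active; they only pass the flag through.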

highker

minor comments; otherwise lgtm

highker · 2019-11-19T06:24:01Z

presto-orc/src/test/java/com/facebook/presto/orc/BenchmarkBatchStreamReadersWithZstd.java

nit: we don't usually use final on methods. (Also, as a side note: we don't use final for temporary variables.)

highker · 2019-11-19T06:24:07Z

presto-orc/src/test/java/com/facebook/presto/orc/BenchmarkBatchStreamReadersWithZstd.java

same, remove final

highker · 2019-11-19T06:25:06Z

presto-orc/src/test/java/com/facebook/presto/orc/BenchmarkZstdJniDecompression.java

highker · 2019-11-19T06:25:29Z

presto-orc/src/test/java/com/facebook/presto/orc/BenchmarkZstdJniDecompression.java

Make all of them final.

highker · 2019-11-19T06:27:14Z

presto-orc/src/test/java/com/facebook/presto/orc/OrcReaderTestingUtils.java

Maybe just call it createTestingReaderOptions

highker · 2019-11-19T06:27:49Z

presto-orc/src/test/java/com/facebook/presto/orc/OrcReaderTestingUtils.java

Also, name this function createTestingReaderOptions and have it delegate to the overload below with zstdJniDecompressionEnabled = false.
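A minimal sketch of the suggested overload pair, using a hypothetical, simplified options class (the real OrcReaderOptions carries several more settings, which are omitted here). The no-argument method keeps existing tests on the default, pure-Java path; only tests that opt in pass true.

```java
// Hypothetical, simplified stand-in for the reader options used in tests.
final class TestingReaderOptions
{
    private final boolean zstdJniDecompressionEnabled;

    TestingReaderOptions(boolean zstdJniDecompressionEnabled)
    {
        this.zstdJniDecompressionEnabled = zstdJniDecompressionEnabled;
    }

    boolean isZstdJniDecompressionEnabled()
    {
        return zstdJniDecompressionEnabled;
    }
}

final class ReaderTestingUtils
{
    private ReaderTestingUtils() {}

    // Default overload: delegates with JNI decompression disabled.
    static TestingReaderOptions createTestingReaderOptions()
    {
        return createTestingReaderOptions(false);
    }

    // Explicit overload for tests that exercise the JNI path.
    static TestingReaderOptions createTestingReaderOptions(boolean zstdJniDecompressionEnabled)
    {
        return new TestingReaderOptions(zstdJniDecompressionEnabled);
    }
}
```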

highker · 2019-11-19T06:28:50Z

presto-orc/src/test/java/com/facebook/presto/orc/OrcTester.java

move this closer to the other writeOrcColumnPresto

highker · 2019-11-19T06:30:26Z

presto-orc/src/test/java/com/facebook/presto/orc/TestZstdJniDecompression.java

Shall we assert that the data is equal? Size equality seems a bit weak.
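A content-level assertion could look like the sketch below. Zstd is not part of the JDK, so java.util.zip's Deflater and Inflater stand in for the zstd compressor and the JNI decompressor here; that substitution is purely for illustration. The point is the final Arrays.equals comparison, which also rejects output that has the right length but the wrong bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class RoundTripCheck
{
    private RoundTripCheck() {}

    // Stand-in compressor (DEFLATE, not zstd).
    static byte[] compress(byte[] data)
    {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int length = deflater.deflate(chunk);
            compressed.write(chunk, 0, length);
        }
        deflater.end();
        return compressed.toByteArray();
    }

    // Stand-in decompressor; uncompressedLength plays the role of the size stored in the stream.
    static byte[] decompress(byte[] compressed, int uncompressedLength)
            throws DataFormatException
    {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] output = new byte[uncompressedLength];
        int offset = 0;
        while (offset < output.length && !inflater.finished()) {
            int length = inflater.inflate(output, offset, output.length - offset);
            if (length == 0) {
                break; // no progress: truncated or dictionary-based stream
            }
            offset += length;
        }
        inflater.end();
        return Arrays.copyOf(output, offset);
    }

    static void assertDecompressedEquals(byte[] expected, byte[] actual)
    {
        // A length-only check would accept output of the right size with the wrong bytes.
        if (!Arrays.equals(expected, actual)) {
            throw new AssertionError("decompressed data differs from the original");
        }
    }
}
```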

highker · 2019-11-19T06:33:41Z

presto-orc/src/main/java/com/facebook/presto/orc/OrcZstdDecompressor.java

nit

this.decompressor = (input, inputOffset, inputLength, output, outputOffset, maxOutputLength) -> {
    // write at outputOffset rather than a hard-coded 0 so the lambda honors its own parameter
    long size = Zstd.decompressByteArray(output, outputOffset, maxOutputLength, input, inputOffset, inputLength);
    if (Zstd.isError(size)) {
        throw new RuntimeException(Zstd.getErrorName(size));
    }
    return toIntExact(size);
};

I am not quite sure whether we should throw RuntimeException or MalformedInputException.

Currently, decompress catches the MalformedInputException (which is a runtime exception) and rethrows it as an OrcCorruptionException, which is a checked exception.

If we throw a RuntimeException instead, this behavior changes.
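The behavioral difference can be illustrated with simplified local stand-ins for the two exception types (aircompressor's unchecked MalformedInputException and Presto's checked OrcCorruptionException); these are not the real classes, and their constructors are simplified. Only the caught type is translated: any other RuntimeException thrown by the decompression call escapes unchanged, so callers expecting an OrcCorruptionException for corrupt data would no longer get one. DecompressionCall and ExceptionTranslationSketch are hypothetical names.

```java
import java.io.IOException;

// Simplified stand-in for aircompressor's unchecked MalformedInputException.
class MalformedInputException
        extends RuntimeException
{
    MalformedInputException(String message)
    {
        super(message);
    }
}

// Simplified stand-in for Presto's checked OrcCorruptionException.
class OrcCorruptionException
        extends IOException
{
    OrcCorruptionException(Throwable cause, String message)
    {
        super(message, cause);
    }
}

// Hypothetical hook representing the JNI or pure-Java decompression call.
interface DecompressionCall
{
    int run();
}

final class ExceptionTranslationSketch
{
    private ExceptionTranslationSketch() {}

    static int decompress(DecompressionCall call)
            throws OrcCorruptionException
    {
        // Only MalformedInputException is translated into the checked ORC error;
        // any other RuntimeException propagates to the caller as-is.
        try {
            return call.run();
        }
        catch (MalformedInputException e) {
            throw new OrcCorruptionException(e, "Invalid compressed stream");
        }
    }
}
```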

highker · 2019-11-19T06:42:12Z

presto-orc/src/main/java/com/facebook/presto/orc/OrcDecompressor.java

Check state here: zstdJniDecompressionEnabled should be true only if the compression is ZSTD.

Actually no; no need to check state

highker

well done!

gauravkm requested a review from highker November 14, 2019 22:47

highker reviewed Nov 15, 2019

gauravkm assigned highker and unassigned highker Nov 18, 2019

highker mentioned this pull request Nov 18, 2019

Wrap OrcReader parameters into OrcReaderOptions #13697

Merged

gauravkm requested a review from highker November 19, 2019 01:27

highker reviewed Nov 19, 2019

Add JNI decompressor for zstd files

a5d364a

highker approved these changes Nov 19, 2019

highker merged commit 3d2f2a1 into prestodb:master Nov 19, 2019

gauravkm deleted the zstd branch November 20, 2019 02:25

leiqingc mentioned this pull request Dec 10, 2019

[DO NOT MERGE] Add release notes for 0.230 #13836

Closed

This was referenced Dec 10, 2019

[DO NOT MERGE] Add release notes for 0.230 #13837

Closed

[DO NOT MERGE] Add release notes for 0.230 #13840

Closed

caithagoras mentioned this pull request Dec 18, 2019

Add release notes for 0.230 #13876

Closed

This was referenced Jan 2, 2020

Add release notes for 0.230 #13913

Closed

Add release notes for 0.230 #13915

Merged
