Skip to content

Add support for JNI based decompression for zstd files#13704

Merged
highker merged 1 commit intoprestodb:masterfrom
gauravkm:zstd
Nov 19, 2019
Merged

Add support for JNI based decompression for zstd files#13704
highker merged 1 commit intoprestodb:masterfrom
gauravkm:zstd

Conversation

@gauravkm
Copy link
Contributor

Initial results and estimates point to 10% improvement in CPU by using JNI.

The feature is controlled by a flag, that is false by default.

I tried to split the pull requests but there is no way to stack them across different forks.

@gauravkm gauravkm requested a review from highker November 14, 2019 22:47
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Coding style comments
  • Could you remove binary files like sample_zstd? Just randomly generate them with as temporary files and read them back

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you follow our developer guideline (https://github.com/prestodb/presto/wiki/Presto-Development-Guidelines#formatting) and clean up this patch including:

  • Spell out every word (e.g., src -> source)
  • Provide meaningful names to variables (e.g., i -> length)
  • We don't use ArrayList, LinkedList, and other Java native containers unless necessary (e.g., mutability or nulls). Instead, we use Guava ImmutableSet/Map/List

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use DataSize type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(y)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we enforce one parameter per line

@gauravkm
Copy link
Contributor Author

Could you remove binary files like sample_zstd? Just randomly generate them with as temporary files and read them back

This are really ORC files. Not possible to generate them randomly. I can rename them to say sample_orc

@mbasmanova
Copy link
Contributor

@gauravkm

This are really ORC files. Not possible to generate them randomly. I can rename them to say sample_orc

Gaurav, you can generate ORC files on the fly. See com.facebook.presto.orc.OrcTester#writeOrcColumnHive

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will put a separate diff if required. For some reason, this would not allow Idea to compile the code.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else { is redundant

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spell out dst

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break a new line after }

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spell out src

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one param per line

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we wanna test on Zstd.isError(size)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should I missed this part. I am working on making this change. Thanks for looking at it again. Will make the other changes as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reuses aircompressor. So two options maybe:

  1. Share the same codepath with OrcZstdDecompressor and pass in the jni flag to decide which decompressor (jni or airlift) to use within OrcZstdDecompressor, or
  2. Keep this class and remove aircompressor dependency.

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments; otherwise lgtm

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we don't usually use final on method. (Also, as a side note: we don't use final for tmp variables)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, remove final

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put final to all of them

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just call it createTestingReaderOptions

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also call this function createTestingReaderOptions and it should delegate to the one below with zstdJniDecompressionEnabled = false.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this closer to the other writeOrcColumnPresto

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we assert on the data equal. Size equality seems a bit weak.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

this.decompressor = (input, inputOffset, inputLength, output, outputOffset, maxOutputLength) -> {
    long size = Zstd.decompressByteArray(output, 0, maxOutputLength, input, inputOffset, inputLength);
    if (Zstd.isError(size)) {
        throw new RuntimeException(Zstd.getErrorName(size));
    }
    return toIntExact(size);
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure if we should throw RuntimeException or MalformedInputException.

Currently decompress catches the MalformInputException (which is a runtime exception) and rethrows it as OrcCorruptException which is a checked exception.

If we throw Runtime exception, then this behavior changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check state here zstdJniDecompressionEnabled is true only if compression is ZSTD

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually no; no need to check state

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants