@dongjoon-hyun dongjoon-hyun commented Mar 1, 2021

SPARK-34479 added ZSTD support to the Avro data source.

ZSTD is a highly tunable compression codec. This PR aims to support additional ZSTD options.

  • avro.mapred.zstd.level
  • avro.mapred.zstd.bufferpool
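
As a rough illustration of how these two keys might be consumed, here is a minimal sketch that reads them from a plain `Map<String, String>` standing in for a job configuration. The class name `ZstdAvroOptions`, the helper methods, and the fallback values (level 3, buffer pool disabled) are assumptions made for this example; they are not taken from this PR. Only the two key names come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: not part of Avro. It only illustrates reading the two keys.
final class ZstdAvroOptions {
    // Key names as listed in the PR description.
    static final String LEVEL_KEY = "avro.mapred.zstd.level";
    static final String BUFFERPOOL_KEY = "avro.mapred.zstd.bufferpool";

    // Assumed fallbacks for this sketch; check the Avro source for the real defaults.
    static final int ASSUMED_DEFAULT_LEVEL = 3;
    static final boolean ASSUMED_DEFAULT_BUFFERPOOL = false;

    private ZstdAvroOptions() {
    }

    // Returns the configured compression level, or the assumed default if unset.
    static int level(Map<String, String> conf) {
        String raw = conf.get(LEVEL_KEY);
        return raw == null ? ASSUMED_DEFAULT_LEVEL : Integer.parseInt(raw.trim());
    }

    // Returns whether the buffer pool is requested, or the assumed default if unset.
    static boolean useBufferPool(Map<String, String> conf) {
        String raw = conf.get(BUFFERPOOL_KEY);
        return raw == null ? ASSUMED_DEFAULT_BUFFERPOOL : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        // A plain map stands in for a Hadoop/Spark job configuration here.
        Map<String, String> conf = new HashMap<>();
        conf.put(LEVEL_KEY, "9");
        conf.put(BUFFERPOOL_KEY, "true");

        System.out.println("level=" + level(conf) + ", bufferPool=" + useBufferPool(conf));
    }
}
```

In a real job the same keys would presumably be set on the job's configuration object rather than on a map; the lookup-with-fallback pattern is the part this sketch is meant to show.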

ZSTD JNI bufferpool is a new feature and is supported by SPARK-34340/PARQUET-1973, too.

For benchmark results of the new buffer pool management, please see the Apache Spark ZStandardBenchmark results.

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

github-actions bot added the "Java" label (Pull Requests for Java binding) on Mar 1, 2021

dongjoon-hyun commented Mar 1, 2021

cc @iemejia , @Fokko , @wangyum

I hope we can give Spark users control over these settings.

@iemejia iemejia left a comment

LGTM. Thanks, @dongjoon-hyun; nice to see both the BufferPool and Level options now available in Avro too!

@iemejia iemejia merged commit cff3bda into apache:master Mar 1, 2021
@dongjoon-hyun

Thank you so much, @iemejia and @wangyum !

@dongjoon-hyun dongjoon-hyun deleted the AVRO-3060 branch March 2, 2021 08:11