@hililiwei
Contributor

List the values supported by `write.parquet.compression-codec` in its description: uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.

@github-actions github-actions bot added the docs label Jan 13, 2022
@hililiwei hililiwei changed the title from "Doc: Add enumeration supported by write.parquet.compression-codec" to "Docs: Add enumeration supported by write.parquet.compression-codec" Jan 13, 2022
  | write.parquet.page-size-bytes | 1048576 (1 MB) | Parquet page size |
  | write.parquet.dict-size-bytes | 2097152 (2 MB) | Parquet dictionary page size |
- | write.parquet.compression-codec | gzip | Parquet compression codec |
+ | write.parquet.compression-codec | gzip | Parquet compression codec; uncompressed, snappy, gzip, lzo, brotli, lz4, zstd. Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires `BrotliCodec` to be installed. |
Contributor

Maybe this should be a link to Parquet docs instead?

This doesn't fit in a table, so we will need to move it or shorten it. I think making it a link fixes both problems.

Contributor Author

updated

@hililiwei hililiwei requested a review from rdblue January 18, 2022 13:04
  | write.parquet.page-size-bytes | 1048576 (1 MB) | Parquet page size |
  | write.parquet.dict-size-bytes | 2097152 (2 MB) | Parquet dictionary page size |
- | write.parquet.compression-codec | gzip | Parquet compression codec |
+ | write.parquet.compression-codec | gzip | Parquet compression codec; uncompressed, snappy, gzip, etc. For more options: [CompressionCodecName](https://github.com/apache/parquet-mr/blob/parquet-1.12.x/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/CompressionCodecName.java) |
Contributor

Rather than adding significantly more text to a table, let's make "compression codec" the link text. Also, this can simply list the other codec names and still be shorter than what is here. Including "etc" takes nearly as much space as "zstd", so let's just list these: zstd, brotli, lz4, gzip, snappy, uncompressed.

@hililiwei hililiwei changed the title from "Docs: Add enumeration supported by write.parquet.compression-codec" to "Doc: Add values supported by parquet\avro compression codec" Jan 19, 2022
@hililiwei hililiwei marked this pull request as draft January 19, 2022 05:53
@hililiwei hililiwei marked this pull request as ready for review January 19, 2022 05:56
@hililiwei hililiwei requested a review from rdblue January 19, 2022 10:22
@rdblue rdblue merged commit 0c80aab into apache:master Jan 19, 2022
@rdblue
Contributor

rdblue commented Jan 19, 2022

Thanks, @hililiwei! Good to have those listed.
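
For readers who want to check a codec value before writing it into table properties, here is a minimal sketch in Python. The codec names come from the PR description above and `gzip` is the default shown in the table row; the helper names `normalize_codec` and `table_properties` are hypothetical and not part of Iceberg or Parquet, and lowercasing the input is a choice made for this sketch.

```python
# Codec names as listed in the PR description for write.parquet.compression-codec.
PARQUET_CODECS = ("uncompressed", "snappy", "gzip", "lzo", "brotli", "lz4", "zstd")

# Property key and default value as shown in the documentation table row.
CODEC_PROPERTY = "write.parquet.compression-codec"
DEFAULT_CODEC = "gzip"


def normalize_codec(value: str) -> str:
    """Hypothetical helper: return the codec name in lowercase, or raise if unknown."""
    name = value.strip().lower()
    if name not in PARQUET_CODECS:
        raise ValueError(
            f"unsupported codec {value!r}; expected one of: {', '.join(PARQUET_CODECS)}"
        )
    return name


def table_properties(codec: str = DEFAULT_CODEC) -> dict:
    """Hypothetical helper: build a properties mapping with a validated codec."""
    return {CODEC_PROPERTY: normalize_codec(codec)}


if __name__ == "__main__":
    print(table_properties("zstd"))
```

Note that a name passing this check only means it appears in the list; as the first draft of the table row pointed out, some codecs such as `brotli` may still need an extra codec library installed at runtime.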
