Doc: Add values supported by parquet\avro compression codec #3892
site/docs/configuration.md
Outdated
  | write.parquet.page-size-bytes | 1048576 (1 MB) | Parquet page size |
  | write.parquet.dict-size-bytes | 2097152 (2 MB) | Parquet dictionary page size |
- | write.parquet.compression-codec | gzip | Parquet compression codec |
+ | write.parquet.compression-codec | gzip | Parquet compression codec; uncompressed, snappy, gzip, lzo, brotli, lz4, zstd. Note that `zstd` requires `ZStandardCodec` to be installed before Hadoop 2.9.0, `brotli` requires `BrotliCodec` to be installed. |
Maybe this should be a link to Parquet docs instead?
This doesn't fit in a table, so we will need to move it or shorten it. I think making it a link fixes both problems.
updated
site/docs/configuration.md
Outdated
  | write.parquet.page-size-bytes | 1048576 (1 MB) | Parquet page size |
  | write.parquet.dict-size-bytes | 2097152 (2 MB) | Parquet dictionary page size |
- | write.parquet.compression-codec | gzip | Parquet compression codec |
+ | write.parquet.compression-codec | gzip | Parquet compression codec; uncompressed, snappy, gzip, etc. For more options: [CompressionCodecName](https://github.com/apache/parquet-mr/blob/parquet-1.12.x/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/CompressionCodecName.java) |
Rather than adding significantly more text to a table, let's make "compression codec" the link text. Also, this can simply list the other codec names and still be smaller than what is here. Including "etc" takes nearly as much space as "zstd" so let's just list these: zstd, brotli, lz4, gzip, snappy, uncompressed.
Thanks, @hililiwei! Good to have those listed.
* apache/iceberg#3723
* apache/iceberg#3732
* apache/iceberg#3749
* apache/iceberg#3766
* apache/iceberg#3787
* apache/iceberg#3796
* apache/iceberg#3809
* apache/iceberg#3820
* apache/iceberg#3878
* apache/iceberg#3890
* apache/iceberg#3892
* apache/iceberg#3944
* apache/iceberg#3976
* apache/iceberg#3993
* apache/iceberg#3996
* apache/iceberg#4008
* apache/iceberg#3758 and 3856
* apache/iceberg#3761
* apache/iceberg#2062
* apache/iceberg#3422
* remove restriction related to legacy parquet file list
List the values supported by `write.parquet.compression-codec` in its description: uncompressed, snappy, gzip, lzo, brotli, lz4, zstd
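
For illustration only, here is a minimal sketch of how a caller might check a codec name against the values listed above before setting the table property. The helper name `parquet_codec_property` is hypothetical and not part of Iceberg or Parquet, the codec tuple is copied from this description rather than read from `CompressionCodecName`, and the trimming and lower-casing of the input is an assumption of the sketch.

```python
# Codec names copied from the PR description above (assumption: this list is
# illustrative and may not match the codecs available in a given deployment).
SUPPORTED_PARQUET_CODECS = (
    "uncompressed", "snappy", "gzip", "lzo", "brotli", "lz4", "zstd",
)

PARQUET_CODEC_KEY = "write.parquet.compression-codec"


def parquet_codec_property(codec):
    """Hypothetical helper: return a {property: value} mapping for a listed codec.

    Raises ValueError when the name is not in SUPPORTED_PARQUET_CODECS.
    """
    # Normalizing whitespace and case is a choice made for this sketch only.
    normalized = codec.strip().lower()
    if normalized not in SUPPORTED_PARQUET_CODECS:
        raise ValueError(
            "unsupported codec %r; expected one of: %s"
            % (codec, ", ".join(SUPPORTED_PARQUET_CODECS))
        )
    return {PARQUET_CODEC_KEY: normalized}


if __name__ == "__main__":
    # Build the property mapping one would pass along as table properties.
    print(parquet_codec_property("zstd"))
```

The returned mapping only names the property and value; whether a codec such as `zstd` or `brotli` actually works still depends on the codec classes installed in the runtime, as noted in the first revision of the diff.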