-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Support setting compression_codec table property for Iceberg #25755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support setting compression_codec table property for Iceberg #25755
Conversation
36c996a to
b5e8d55
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to the iceberg parquet compression handling bug? If so, we'll want an inline comment explaining what's going on here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify what is the bug being referring to here ? I responded at #24851 (comment) to the concern raised there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we let Iceberg write the table property write.parquet.compression-codec to zstd by default for us, then this PR will introduce some non-backward-compatible behavior. Suppose we have a user that didn't set write.parquet.compression-codec in their table property, if they are setting write.parquet.compression-codec to snappy in their iceberg config/iceberg session property, when they write a new file, on a previous version of Trino, the compression-codec being used would be snappy. But because this change will try to also read compression-codec from table properties when writing a file, this time the compression-codec being used would be zstd instead of snappy, which is not backward compatible. So we want to disable Iceberg from setting this table property if that's not being set by user.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for specifying the write_compression and compression_level table properties for the Iceberg connector and updates related tests and compression property validations.
- Adds new DDL support and property validations for write_compression and compression_level.
- Updates tests to verify new expected compression property values.
- Adjusts compression property handling across table creation, metadata, and file writer implementations.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java | Updated expected values in JSON assertions. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergSystemTables.java | Modified test expectations for column_sizes and compression properties. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/AbstractTrinoCatalog.java | Revised table properties creation and transaction handling. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java | Introduced new methods for compression level and codec extraction. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java | Added new table properties and validation for compression settings. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java | Updated table property and compression handling during metadata updates. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java | Adjusted writer creation to incorporate storage properties for compression. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergAvroFileWriter.java | Updated Avro writer to accept and use a compression level. |
Comments suppressed due to low confidence (1)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/AbstractTrinoCatalog.java:897
- The method 'createTableProperties' is expected to return a Map<String, String>, but it is returning a Transaction object. Please update the return value to the actual table properties map or adjust the method signature accordingly to ensure type consistency.
return txn;
ffaf8b7 to
4d27522
Compare
| public static final String LOCATION_PROPERTY = "location"; | ||
| public static final String FORMAT_VERSION_PROPERTY = "format_version"; | ||
|
|
||
| public static final String WRITE_COMPRESSION = "write_compression"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Property name should be compression_codec to maintain consistency with equivalent iceberg table property and the connector session property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of precedence should be: session property > table property > catalog config property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change the property name. But I don't understand why session property should have a higher precedence over table property, if a user set table property for a particular table, it should have higher precedence compared to the default one from session property right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raunaqmorarka - I would tend to agree with @SongChujun that the precedence order should prefer the table defined codec over the session property (when present) since the table definition seems more specific. I guess you could make an argument that the order should be: session property (if present, but default value is empty and not the default value from catalog config), table property (if present), catalog config default value. Is that what you're suggesting here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually session property has precedence when it is set, because that is meant to allow users to override configured behaviour on a per query level. Per query write operation is more granular that per table, so if the session property exists it should get higher precedence.
I haven't been able to find a similar scenario in the code base where something is a session and catalog property while also being a table property. I think trying to be both a session property and table property at the same time is making this code and resulting behaviour quite complex, even though it is a seemingly simple thing.
I think we should consider dropping the existing session property entirely and just having table property and catalog config. I can't think of a great use case for being able to override compression per query for a table. Getting rid of the session property should simplify things here.
| return (IcebergFileFormat) tableProperties.get(FILE_FORMAT_PROPERTY); | ||
| } | ||
|
|
||
| public static Optional<HiveCompressionCodec> getHiveCompressionCodec(Map<String, Object> inputProperties) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getCompressionCodec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will update the name
| } | ||
| return catalog.newCreateTableTransaction(session, schemaTableName, schema, partitionSpec, sortOrder, Optional.ofNullable(tableLocation), createTableProperties(session, tableMetadata, allowedExtraProperties)); | ||
|
|
||
| // If user doesn't set compression-codec for parquet, we need to remove write.parquet.compression-codec property, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If user doesn't set anything, our default for iceberg.compression-codec catalog config property should apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already replied in a previous comment.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify what is the bug being referring to here ? I responded at #24851 (comment) to the concern raised there.
4d27522 to
4c0b43f
Compare
5d80c4b to
e53d72e
Compare
|
This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack. |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java
Outdated
Show resolved
Hide resolved
| public static final String LOCATION_PROPERTY = "location"; | ||
| public static final String FORMAT_VERSION_PROPERTY = "format_version"; | ||
|
|
||
| public static final String WRITE_COMPRESSION = "write_compression"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raunaqmorarka - I would tend to agree with @SongChujun that the precedence order should prefer the table defined codec over the session property (when present) since the table definition seems more specific. I guess you could make an argument that the order should be: session property (if present, but default value is empty and not the default value from catalog config), table property (if present), catalog config default value. Is that what you're suggesting here?
753c41b to
ce188eb
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
raunaqmorarka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm overall
| } | ||
|
|
||
| IcebergFileFormat oldFileFormat = getFileFormat(icebergTable.properties()); | ||
| IcebergFileFormat newFileFormat = getFileFormat(icebergTable.properties()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newFileFormat = oldFileFormat ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it here.
| .map(column -> toTrinoType(column.type(), typeManager)) | ||
| .collect(toImmutableList()); | ||
|
|
||
| HiveCompressionCodec compressionCodec = getHiveCompressionCodec(IcebergFileFormat.AVRO, storageProperties) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: static import AVRO here and in the other files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed all IcebergFileFormat.[Parquet, Orc, Avro] to [Parquet, Orc, Avro]
ce188eb to
f0318e7
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTableProperties.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
...g/src/main/java/io/trino/plugin/iceberg/catalog/snowflake/IcebergSnowflakeCatalogModule.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
3c73d7c to
873a2b1
Compare
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
7e5f587 to
423b13d
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java
Outdated
Show resolved
Hide resolved
423b13d to
f25b1b4
Compare
4e03a75 to
9ff9c3d
Compare
pettyjamesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @SongChujun and @ebyhr!
Makes compression_codec a table property and drops support for compression_codec session property for simplicity
9ff9c3d to
237f0a6
Compare
This PR makes compression_codec a table property and drops support for compression_codec iceberg session property for simplicity.
Users are able to run the following command now to specify the compression_codec to create a new table with compression_codec set to ZSTD.
Users are able able to change the compression_codec via statement
Alter Table Set Properties.Example
The write_compression users specify will replace the session variable
iceberg.compression-codec. If the user change file_format without changing write_compression, it will inherit the write_compression set for the original file format, if the write_compression for the previous file format is set.If the user is trying to set a write_compression that is inconsistent with the file format, the system will throw an exception.
The compatibility matrix for write_compression and file format is the following
Description
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: