Enable reading Iceberg v3 failing on all unimplemented features#27786
Enable reading Iceberg v3 failing on all unimplemented features#27786dain merged 1 commit intotrinodb:masterfrom
Conversation
|
For something as central as table format compatibility, we should have tests with some other system (Spark?) reading tables produced by Trino. |
| } | ||
|
|
||
| // TODO: Remove when Iceberg v3 is fully supported | ||
| private static void validateTableForTrino(BaseTable table, Optional<Long> tableSnapshotId) |
There was a problem hiding this comment.
If the table is upgraded from v2 to v3, all read operations on the table will be blocked, even for snapshots that do not include any v3 data
There was a problem hiding this comment.
There is a test showing that insert and read works. Are you seeing something I'm not?
There was a problem hiding this comment.
I see that you are explicitly checking for v3 features (for example, default value) and rejecting queries against v3 tables. There is a possible scenario where another engine upgrades a table from v2 to v3, performs inserts and/or updates (including deletes), and then enables v3-specific features such as default value
In this case, what is the expected behavior when Trino queries such a table? how should this behave when time travel is used to query a snapshot from before upgrade?
There was a problem hiding this comment.
Yes. That is the point of this PR. Today, there are no v3 tables allowed at all, so any query against them will fail. With this PR you can create, insert and read v3 tables, and most other things fail. If you have defaults the table can not be used, but remember it can't be used today.
This is an iterative PR. Allow what works to be used and explicitly fail on everything else. Then additional PRs will implement the remaining features one at a time.
cae6b76 to
2b0318c
Compare
There was a problem hiding this comment.
Pull request overview
This PR enables support for creating Iceberg format version 3 tables, upgrading v2 tables to v3, and inserting data into v3 tables. The implementation intentionally limits v3 support to these basic operations while explicitly rejecting unsupported v3 features to prevent spec violations.
Key changes:
- Updated maximum supported format version from 2 to 3 while keeping default at 2
- Added validation logic to reject unsupported v3 features (deletion vectors, column defaults, encryption)
- Implemented version checks for row-level operations (DELETE, UPDATE, MERGE) and table procedures (OPTIMIZE, ADD_FILES)
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java | Comprehensive test suite covering v3 table creation, upgrades, inserts, lineage metadata validation, and rejection of unsupported features |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java | Updated error message to reflect new maximum supported format version (3 instead of 2) |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java | Added validation to reject PUFFIN deletion vectors during split creation |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java | Added validateTableForTrino method to reject unsupported v3 features, refactored version checks for table procedures, enhanced verifyTableVersionForUpdate to block v3 table mutations |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergConfig.java | Increased FORMAT_VERSION_SUPPORT_MAX to 3 while maintaining default format version at 2 for backward compatibility |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Outdated
Show resolved
Hide resolved
90cc94c to
aba25c7
Compare
| .map(table::snapshot) | ||
| .orElse(table.currentSnapshot()); | ||
| if (snapshot == null) { | ||
| // the snapshot does not exist, this is an error that will be handled elsewhere |
There was a problem hiding this comment.
That's not an error
spark-sql (default)> create table t1 (data integer) using iceberg tblproperties ('format-version' = 3);
Spark does not create a snapshot when doing CREATE TABLE
|
Related failues |
ebyhr
left a comment
There was a problem hiding this comment.
There is a bug about column defaults. The INSERT statement in the below test doesn't throw an exception.
@Test
void testWriteDefault()
{
String tableName = "tmp_v3_defaults_src_" + randomNameSuffix();
assertUpdate("CREATE TABLE " + tableName + " (id INTEGER, data INTEGER) WITH (format_version = 3, format = 'ORC')");
assertUpdate("INSERT INTO " + tableName + " VALUES (1, 10)", 1);
Table icebergTable = loadTable(tableName);
icebergTable.updateSchema()
.updateColumnDefault("data", Expressions.lit(42))
.commit();
assertQueryFails(
"INSERT INTO " + tableName + " (id) VALUES (2)",
".*Iceberg v3 column default values are not supported.*");
assertUpdate("DROP TABLE " + tableName);
}
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Show resolved
Hide resolved
| Table icebergTable = new HadoopTables(new Configuration(false)).create( | ||
| schema, | ||
| PartitionSpec.unpartitioned(), | ||
| SortOrder.unsorted(), | ||
| ImmutableMap.of( | ||
| "format-version", "3", | ||
| "write.format.default", "ORC"), | ||
| hadoopTableLocation.toString()); |
There was a problem hiding this comment.
We don't need to use HadoopTables in most tests. We can use loadTable method for existing tables, or TrinoCatalog (there's IcebergTestUtils#getTrinoCatalog) for new tables.
There was a problem hiding this comment.
testV3RejectsColumnDefaults and testV3RejectsColumnWriteDefaults require HadoopTables because they need to create tables with v3 column default features that cannot be created through Trino SQL. These tests verify Trino correctly rejects v3 features when reading tables created by other engines.
There was a problem hiding this comment.
That isn't true. You can create such tables like this:
catalog.newCreateTableTransaction(
SESSION,
schemaTableName,
new Schema(
Types.NestedField.optional("id")
.withId(1)
.ofType(Types.IntegerType.get())
.withInitialDefault(Expressions.lit(42))
.build()),
PartitionSpec.unpartitioned(),
SortOrder.unsorted(),
Optional.ofNullable(catalog.defaultTableLocation(SESSION, schemaTableName)),
ImmutableMap.of("format-version", "3"))
.commitTransaction();
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergConfig.java
Show resolved
Hide resolved
2473ab1 to
b08b997
Compare
| } | ||
|
|
||
| Schema schema = metadata.schemasById().get(snapshot.schemaId()); | ||
| if (schema == null) { |
There was a problem hiding this comment.
Under what circumstances did you encounter this? I would not expect it to be null
There was a problem hiding this comment.
The snapshot can be null for an empty table that has been created but has no data inserted yet. As findinpath noted above, Spark (and other engines) don't create a snapshot when doing CREATE TABLE. The null check prevents NPE when validating an empty table.
There was a problem hiding this comment.
@findinpath @dain If the table was created by Spark and is empty, then an empty snapshot is expected. However, reaching this line indicates that a snapshot exists but has no schema, which should be an invalid state. I do not see a valid scenario where this can happen. I would suggest either removing this code path or explicitly failing fast by throwing an exception. thoughts?
cc @ebyhr
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
Show resolved
Hide resolved
a5cf8a5 to
ab96194
Compare
All usage of any unimplemented v3 feature results in a failure.
Cherry pick of trinodb/trino#27786 Co-authored-by: Dain Sundstrom <dain@iq80.com>
Cherry pick of trinodb/trino#27786 Co-authored-by: Dain Sundstrom <dain@iq80.com> ## Description Add initial support for Iceberg table format version 3 while constraining unsupported features and row-level operations to safe, explicitly validated paths. **New Features:** 1. Allow creating Iceberg tables with format version 3 and inserting/querying data from them, including partitioned tables. 2. Support upgrading existing Iceberg format version 2 tables to format version 3. **Enhancements:** 1. Introduce version guardrails for Iceberg operations, including explicit maximum supported table format and maximum format version for row-level operations. 2. Validate Iceberg v3 tables for unsupported features such as column default values and table encryption before executing writes or inserts. 3. Add validation to reject use of PUFFIN-based deletion vectors that are not yet supported. 4. Improve error handling for Iceberg update and delete operations by using specific PrestoException errors and clearer messages when format/version constraints are violated. 5. Prevent OPTIMIZE (rewrite_data_files) from running on Iceberg tables with format versions above the supported threshold. ## Test Plan Add TestIcebergV3 integration test suite covering creation, upgrade, insert, query, and partitioning for v3 tables, as well as rejection of unsupported delete, update, merge, and OPTIMIZE operations on v3 tables. ## Release Notes ``` == RELEASE NOTES == Iceberg Connector Changes * Add support for creating Iceberg tables with format-version = '3'. * Add reading from Iceberg V3 tables, including partitioned tables. * Add INSERT operations into Iceberg V3 tables. * Add support for upgrading existing V2 tables to V3 using the Iceberg API. ``` ## Summary by Sourcery Add guarded initial support for Iceberg table format version 3 while constraining unsupported features and row-level operations to fail fast with clear errors. New Features: - Enable creating, reading from, and inserting into Iceberg tables with format version 3, including partitioned tables. Enhancements: - Introduce format-version guardrails for Iceberg operations, including a maximum supported table format and a maximum format version for row-level operations. - Validate Iceberg v3 tables for unsupported features such as column default values, table encryption, and PUFFIN-based deletion vectors before executing reads or writes. - Tighten validation and error handling for Iceberg update, delete, merge, and OPTIMIZE (rewrite_data_files) operations on tables with unsupported format versions. Tests: - Add TestIcebergV3 integration suite covering supported v3 operations and expected failures for unsupported delete, update, merge, OPTIMIZE, encryption, and deletion vector features. Co-authored-by: Dain Sundstrom <dain@iq80.com>
Description
Add support for creating Iceberg format version 3 tables, upgrading v2 tables to v3, and inserting into v3 tables.
This change intentionally does not implement Iceberg v3 features beyond allowing v3 metadata and validating that inserts produce the required row-lineage metadata (as observed through the Iceberg library).
To avoid spec violations while v3 support is incomplete, the connector now explicitly rejects v3 features that are not yet supported. The goal is to safely unlock v3 table creation and incremental adoption, while making unsupported behavior fail fast and predictably.
Unsupported v3 features that now throw
NOT_SUPPORTEDinclude:DELETE,UPDATE,MERGEOPTIMIZEon v3 tablesadd_files/add_files_from_tableprocedures on v3 tablesinitial-default,write-default)encryption-keys/ snapshotkey-id)Tests:
Add
TestIcebergV3to cover:nextRowId,firstRowId,dataSequenceNumber)Release notes
(X) Release notes are required, with the following suggested text: