Enable reading Iceberg v3 failing on all unimplemented features #27786

Merged
dain merged 1 commit into trinodb:master from dain:iceberg-v3
Jan 10, 2026

Conversation

@dain
Member

@dain dain commented Dec 30, 2025

Description

Add support for creating Iceberg format version 3 tables, upgrading v2 tables to v3, and inserting into v3 tables.

This change intentionally does not implement Iceberg v3 features beyond allowing v3 metadata and validating that inserts produce the required row-lineage metadata (as observed through the Iceberg library).

To avoid spec violations while v3 support is incomplete, the connector now explicitly rejects v3 features that are not yet supported. The goal is to safely unlock v3 table creation and incremental adoption, while making unsupported behavior fail fast and predictably.

Unsupported v3 features that now throw NOT_SUPPORTED include:

  • Row-level mutations on v3 tables: DELETE, UPDATE, MERGE
  • OPTIMIZE on v3 tables
  • add_files / add_files_from_table procedures on v3 tables
  • Deletion vectors (PUFFIN delete files)
  • Column default values (initial-default, write-default)
  • Iceberg table encryption (encryption-keys / snapshot key-id)
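
The fail-fast pattern described above can be sketched in plain Java. This is a hypothetical illustration only: the class, method, and message below are invented for this sketch, and the actual connector throws `TrinoException` with the `NOT_SUPPORTED` error code rather than `UnsupportedOperationException`.

```java
public class FormatVersionGuard {
    // Highest format version the connector currently knows how to mutate safely
    static final int MAX_WRITE_FORMAT_VERSION = 2;

    // Reject row-level mutations on tables whose format version exceeds
    // the maximum supported for writes, so unsupported v3 operations
    // fail fast and predictably instead of violating the spec
    static void checkMutationSupported(int formatVersion, String operation) {
        if (formatVersion > MAX_WRITE_FORMAT_VERSION) {
            throw new UnsupportedOperationException(
                    operation + " is not supported for Iceberg table format version " + formatVersion);
        }
    }

    public static void main(String[] args) {
        checkMutationSupported(2, "DELETE"); // v2 tables pass the guard
        try {
            checkMutationSupported(3, "DELETE"); // v3 tables fail fast
        }
        catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage()); // prints "DELETE is not supported for Iceberg table format version 3"
        }
    }
}
```

The same guard shape applies to OPTIMIZE and the `add_files` procedures; only the operation name and the error message differ.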

Tests:

  • Add TestIcebergV3 to cover:

    • create v3 tables and upgrade v2→v3
    • inserts into v3 tables produce required lineage metadata (nextRowId, firstRowId, dataSequenceNumber)
    • unsupported v3 features fail with clear exceptions

Release notes

[x] Release notes are required, with the following suggested text:

## Iceberg
* Allow creating Iceberg format version 3 tables, upgrading v2 tables to v3, and inserting into v3 tables. Unsupported v3 features are explicitly rejected.

@cla-bot cla-bot bot added the cla-signed label Dec 30, 2025
@github-actions github-actions bot added the iceberg Iceberg connector label Dec 30, 2025
@dain dain requested review from ebyhr, electrum, findepi and raunaqmorarka and removed request for findepi December 30, 2025 08:09
@dain dain mentioned this pull request Dec 30, 2025
@findepi
Member

findepi commented Dec 30, 2025

For something as central as table format compatibility, we should have tests with some other system (Spark?) reading tables produced by Trino.

}

// TODO: Remove when Iceberg v3 is fully supported
private static void validateTableForTrino(BaseTable table, Optional<Long> tableSnapshotId)
Contributor

If the table is upgraded from v2 to v3, all read operations on the table will be blocked, even for snapshots that do not include any v3 data

Member Author

There is a test showing that insert and read works. Are you seeing something I'm not?

Contributor

I see that you are explicitly checking for v3 features (for example, default value) and rejecting queries against v3 tables. There is a possible scenario where another engine upgrades a table from v2 to v3, performs inserts and/or updates (including deletes), and then enables v3-specific features such as default value

In this case, what is the expected behavior when Trino queries such a table? How should this behave when time travel is used to query a snapshot from before the upgrade?

Member Author

Yes. That is the point of this PR. Today, no v3 tables are allowed at all, so any query against them will fail. With this PR you can create, insert into, and read v3 tables, while most other operations fail. If the table has defaults it cannot be used, but remember it can't be used today either.

This is an iterative PR. Allow what works to be used and explicitly fail on everything else. Then additional PRs will implement the remaining features one at a time.

@dain dain force-pushed the iceberg-v3 branch 2 times, most recently from cae6b76 to 2b0318c Compare January 2, 2026 22:27
@electrum electrum requested a review from Copilot January 7, 2026 00:22
Copilot AI left a comment

Pull request overview

This PR enables support for creating Iceberg format version 3 tables, upgrading v2 tables to v3, and inserting data into v3 tables. The implementation intentionally limits v3 support to these basic operations while explicitly rejecting unsupported v3 features to prevent spec violations.

Key changes:

  • Updated maximum supported format version from 2 to 3 while keeping default at 2
  • Added validation logic to reject unsupported v3 features (deletion vectors, column defaults, encryption)
  • Implemented version checks for row-level operations (DELETE, UPDATE, MERGE) and table procedures (OPTIMIZE, ADD_FILES)

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java: comprehensive test suite covering v3 table creation, upgrades, inserts, lineage metadata validation, and rejection of unsupported features
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java: updated error message to reflect the new maximum supported format version (3 instead of 2)
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java: added validation to reject PUFFIN deletion vectors during split creation
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java: added validateTableForTrino method to reject unsupported v3 features, refactored version checks for table procedures, and enhanced verifyTableVersionForUpdate to block v3 table mutations
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergConfig.java: increased FORMAT_VERSION_SUPPORT_MAX to 3 while keeping the default format version at 2 for backward compatibility


@dain dain force-pushed the iceberg-v3 branch 3 times, most recently from 90cc94c to aba25c7 Compare January 7, 2026 02:43
.map(table::snapshot)
.orElse(table.currentSnapshot());
if (snapshot == null) {
// the snapshot does not exist, this is an error that will be handled elsewhere
Contributor

That's not an error

spark-sql (default)> create table t1 (data integer) using iceberg tblproperties ('format-version' = 3);

Spark does not create a snapshot when doing CREATE TABLE

@findinpath
Contributor

Related failures:

Error:    TestIcebergV3.testV3RejectsColumnDefaults:283->AbstractTestQueryFramework.assertUpdate:411->AbstractTestQueryFramework.assertUpdate:416 » QueryFailed Iceberg v3 column default values are not supported
Error:    TestIcebergV3.testV3RejectsColumnWriteDefaults:330->AbstractTestQueryFramework.assertUpdate:411->AbstractTestQueryFramework.assertUpdate:416 » QueryFailed Iceberg v3 column default values are not supported
Error:    TestIcebergV3.testV3RejectsDeletionVectorsPuffinDeleteFile:480 » FileSystem /tmp/TrinoTest13055050090781735228/iceberg_data/hadoop_v3_dv_qdovndcpa6: failed to delete one or more files; see suppressed exceptions for details
Error:    TestIcebergV3.testV3RejectsEncryptionKeys:416->AbstractTestQueryFramework.assertUpdate:411->AbstractTestQueryFramework.assertUpdate:416 » QueryFailed Iceberg table encryption is not supported

Member

@ebyhr ebyhr left a comment

There is a bug about column defaults. The INSERT statement in the below test doesn't throw an exception.

    @Test
    void testWriteDefault()
    {
        String tableName = "tmp_v3_defaults_src_" + randomNameSuffix();
        assertUpdate("CREATE TABLE " + tableName + " (id INTEGER, data INTEGER) WITH (format_version = 3, format = 'ORC')");
        assertUpdate("INSERT INTO " + tableName + " VALUES (1, 10)", 1);

        Table icebergTable = loadTable(tableName);
        icebergTable.updateSchema()
                .updateColumnDefault("data", Expressions.lit(42))
                .commit();

        assertQueryFails(
                "INSERT INTO " + tableName + " (id) VALUES (2)",
                ".*Iceberg v3 column default values are not supported.*");

        assertUpdate("DROP TABLE " + tableName);
    }

Comment on lines +437 to +444
Table icebergTable = new HadoopTables(new Configuration(false)).create(
        schema,
        PartitionSpec.unpartitioned(),
        SortOrder.unsorted(),
        ImmutableMap.of(
                "format-version", "3",
                "write.format.default", "ORC"),
        hadoopTableLocation.toString());
Member

We don't need to use HadoopTables in most tests. We can use loadTable method for existing tables, or TrinoCatalog (there's IcebergTestUtils#getTrinoCatalog) for new tables.

Member Author

testV3RejectsColumnDefaults and testV3RejectsColumnWriteDefaults require HadoopTables because they need to create tables with v3 column default features that cannot be created through Trino SQL. These tests verify Trino correctly rejects v3 features when reading tables created by other engines.

Member

That isn't true. You can create such tables like this:

        catalog.newCreateTableTransaction(
                        SESSION,
                        schemaTableName,
                        new Schema(
                                Types.NestedField.optional("id")
                                        .withId(1)
                                        .ofType(Types.IntegerType.get())
                                        .withInitialDefault(Expressions.lit(42))
                                        .build()),
                        PartitionSpec.unpartitioned(),
                        SortOrder.unsorted(),
                        Optional.ofNullable(catalog.defaultTableLocation(SESSION, schemaTableName)),
                        ImmutableMap.of("format-version", "3"))
                .commitTransaction();

@dain dain force-pushed the iceberg-v3 branch 2 times, most recently from 2473ab1 to b08b997 Compare January 8, 2026 22:21
@github-actions github-actions bot added the docs label Jan 8, 2026
}

Schema schema = metadata.schemasById().get(snapshot.schemaId());
if (schema == null) {
Contributor

Under what circumstances did you encounter this? I would not expect it to be null

Member Author

The snapshot can be null for an empty table that has been created but has no data inserted yet. As findinpath noted above, Spark (and other engines) don't create a snapshot when doing CREATE TABLE. The null check prevents NPE when validating an empty table.
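
The empty-table scenario above can be sketched with a minimal plain-Java analogue. The record and method names here are hypothetical stand-ins for Iceberg's `Table.snapshot(long)` and `Table.currentSnapshot()`; this is an illustration of the null-handling, not the connector's actual code.

```java
import java.util.Map;
import java.util.Optional;

public class SnapshotResolution {
    // Hypothetical stand-in for Iceberg's Snapshot
    record Snapshot(long id, int schemaId) {}

    // Resolve the snapshot to validate: an explicit time-travel id takes
    // precedence; otherwise fall back to the current snapshot, which is
    // null for a table that was created but never written to
    static Optional<Snapshot> resolve(Optional<Long> snapshotId, Map<Long, Snapshot> snapshotsById, Snapshot current) {
        if (snapshotId.isPresent()) {
            // a missing explicit snapshot is an error handled elsewhere
            return Optional.ofNullable(snapshotsById.get(snapshotId.get()));
        }
        return Optional.ofNullable(current);
    }

    public static void main(String[] args) {
        // Empty table created by another engine: no snapshot, nothing to validate
        System.out.println(resolve(Optional.empty(), Map.of(), null).isEmpty()); // prints true
    }
}
```

When resolution yields an empty Optional, validation simply has nothing to check, which avoids the NPE on freshly created tables.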

Contributor

@findinpath @dain If the table was created by Spark and is empty, then a missing snapshot is expected. However, reaching this line indicates that a snapshot exists but has no schema, which should be an invalid state. I do not see a valid scenario where this can happen. I would suggest either removing this code path or explicitly failing fast by throwing an exception. Thoughts?
cc @ebyhr

@dain dain force-pushed the iceberg-v3 branch 2 times, most recently from a5cf8a5 to ab96194 Compare January 10, 2026 07:03
All usage of any unimplemented v3 feature results in a failure.
@dain dain merged commit bb65065 into trinodb:master Jan 10, 2026
55 checks passed
@dain dain deleted the iceberg-v3 branch January 10, 2026 22:30
@github-actions github-actions bot added this to the 480 milestone Jan 10, 2026
Joe-Abraham added a commit to Joe-Abraham/presto that referenced this pull request Mar 2, 2026
Cherry pick of trinodb/trino#27786
Co-authored-by: Dain Sundstrom <dain@iq80.com>
Joe-Abraham added a commit to prestodb/presto that referenced this pull request Mar 2, 2026
Cherry pick of trinodb/trino#27786
Co-authored-by: Dain Sundstrom <dain@iq80.com>

## Description
Add initial support for Iceberg table format version 3 while
constraining unsupported features and row-level operations to safe,
explicitly validated paths.

**New Features:**

1. Allow creating Iceberg tables with format version 3 and
inserting/querying data from them, including partitioned tables.
2. Support upgrading existing Iceberg format version 2 tables to format
version 3.

**Enhancements:**

1. Introduce version guardrails for Iceberg operations, including
explicit maximum supported table format and maximum format version for
row-level operations.
2. Validate Iceberg v3 tables for unsupported features such as column
default values and table encryption before executing writes or inserts.
3. Add validation to reject use of PUFFIN-based deletion vectors that
are not yet supported.
4. Improve error handling for Iceberg update and delete operations by
using specific PrestoException errors and clearer messages when
format/version constraints are violated.
5. Prevent OPTIMIZE (rewrite_data_files) from running on Iceberg tables
with format versions above the supported threshold.

## Test Plan
Add TestIcebergV3 integration test suite covering creation, upgrade,
insert, query, and partitioning for v3 tables, as well as rejection of
unsupported delete, update, merge, and OPTIMIZE operations on v3 tables.

## Release Notes
```
== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for creating Iceberg tables with format-version = '3'.
* Add reading from Iceberg V3 tables, including partitioned tables.
* Add INSERT operations into Iceberg V3 tables.
* Add support for upgrading existing V2 tables to V3 using the Iceberg API.
```

## Summary by Sourcery

Add guarded initial support for Iceberg table format version 3 while
constraining unsupported features and row-level operations to fail fast
with clear errors.

New Features:
- Enable creating, reading from, and inserting into Iceberg tables with
format version 3, including partitioned tables.

Enhancements:
- Introduce format-version guardrails for Iceberg operations, including
a maximum supported table format and a maximum format version for
row-level operations.
- Validate Iceberg v3 tables for unsupported features such as column
default values, table encryption, and PUFFIN-based deletion vectors
before executing reads or writes.
- Tighten validation and error handling for Iceberg update, delete,
merge, and OPTIMIZE (rewrite_data_files) operations on tables with
unsupported format versions.

Tests:
- Add TestIcebergV3 integration suite covering supported v3 operations
and expected failures for unsupported delete, update, merge, OPTIMIZE,
encryption, and deletion vector features.

Co-authored-by: Dain Sundstrom <dain@iq80.com>

7 participants