
feat(plugin-iceberg): Add support for mutating an Iceberg branch #27147

Merged
agrawalreetika merged 2 commits into prestodb:master from agrawalreetika:iceberg-branch-mutation on Mar 5, 2026

Conversation

Member

@agrawalreetika agrawalreetika commented Feb 15, 2026

Description

Add support for mutating an Iceberg branch

Motivation and Context

Resolves #22030

Impact

Resolves #22030

Add support for mutating an Iceberg branch based on the syntax discussed here

INSERT INTO "orders.branch_audit_branch" VALUES (1, 'Product A', 100.00);

UPDATE "orders.branch_audit_branch" SET price = 120.00 WHERE id = 1;

DELETE FROM "orders.branch_audit_branch" WHERE id = 2;

Test Plan

Added

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Summary by Sourcery

Add branch-aware mutation support to the Iceberg connector, enabling INSERT/UPDATE/DELETE operations on specific Iceberg branches via extended table naming.

New Features:

  • Support addressing Iceberg branches in table names using a branch-qualified naming pattern for mutations.
  • Allow INSERT operations to append data to a specific Iceberg branch instead of the main table.
  • Allow UPDATE and DELETE operations to modify data on a specific Iceberg branch with appropriate branch routing and validation.

Enhancements:

  • Improve error messaging and validation when committing Iceberg updates by including branch context and verifying that referenced branches exist.

Tests:

  • Add TestIcebergBranchMutations covering inserts, updates, deletes, multi-step mutations, complex predicates, branch isolation, and INSERT ... SELECT into Iceberg branches.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for mutating an Iceberg branch

@agrawalreetika agrawalreetika self-assigned this Feb 15, 2026
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Feb 15, 2026
@prestodb-ci prestodb-ci requested review from a team, ScrapCodes and anandamideShakyan and removed request for a team February 15, 2026 08:25
Contributor

sourcery-ai bot commented Feb 15, 2026

Reviewer's Guide

Adds Iceberg branch-aware mutation support (INSERT/UPDATE/DELETE) by extending table name parsing to include branch identifiers, routing write/row-delta operations to specific branches, adjusting conflict validation for branch operations, and introducing comprehensive tests for branch mutations.

Sequence diagram for branch-aware Iceberg INSERT/UPDATE/DELETE mutations

sequenceDiagram
    actor User
    participant PrestoPlanner
    participant IcebergAbstractMetadata
    participant IcebergTable
    participant Transaction
    participant AppendFiles
    participant RowDelta

    User->>PrestoPlanner: SQL INSERT/UPDATE/DELETE on orders.branch_audit_branch
    PrestoPlanner->>IcebergAbstractMetadata: getTableHandle(schemaTableName)
    IcebergAbstractMetadata->>IcebergTableName: from(name)
    IcebergTableName-->>IcebergAbstractMetadata: tableName, snapshotId, branchName

    PrestoPlanner->>IcebergAbstractMetadata: beginIcebergTableInsert / beginUpdate / beginDelete
    IcebergAbstractMetadata->>IcebergTable: getIcebergTable(schemaTableName)

    alt branchName present
        IcebergAbstractMetadata->>IcebergTable: refs().get(branch)
        IcebergTable-->>IcebergAbstractMetadata: SnapshotRef
        IcebergAbstractMetadata-->>IcebergAbstractMetadata: validate branchRef.isBranch()
        IcebergAbstractMetadata->>IcebergTable: newTransaction()
        IcebergTable-->>IcebergAbstractMetadata: Transaction

        opt INSERT path
            IcebergAbstractMetadata->>Transaction: newAppend()
            Transaction-->>IcebergAbstractMetadata: AppendFiles
            IcebergAbstractMetadata->>AppendFiles: toBranch(branchName)
        end

        opt UPDATE/DELETE path
            IcebergAbstractMetadata->>Transaction: newRowDelta()
            Transaction-->>IcebergAbstractMetadata: RowDelta
            IcebergAbstractMetadata->>RowDelta: toBranch(branchName)
            IcebergAbstractMetadata-->>IcebergAbstractMetadata: skip conflict validation
        end

        IcebergAbstractMetadata->>Transaction: commitTransaction()
    else no branchName
        IcebergAbstractMetadata->>IcebergTable: newTransaction()
        IcebergTable-->>IcebergAbstractMetadata: Transaction

        opt INSERT path
            IcebergAbstractMetadata->>Transaction: newAppend()
            Transaction-->>IcebergAbstractMetadata: AppendFiles
        end

        opt UPDATE/DELETE path
            IcebergAbstractMetadata->>Transaction: newRowDelta()
            Transaction-->>IcebergAbstractMetadata: RowDelta
            IcebergAbstractMetadata-->>IcebergAbstractMetadata: perform full conflict validation
        end

        IcebergAbstractMetadata->>Transaction: commitTransaction()
    end

    Transaction-->>PrestoPlanner: commit result / error
    PrestoPlanner-->>User: query success or PrestoException with branch in message
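
The branch/no-branch split in the sequence diagram above can be sketched in plain Java. This is an illustration only: `BranchRoutingSketch`, `appendSteps`, and `rowDeltaSteps` are hypothetical names, and the strings merely label the calls shown in the diagram rather than invoking the Iceberg API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Illustrative only: records the order of calls from the diagram as strings.
// None of these names exist in the connector.
class BranchRoutingSketch
{
    // INSERT path: the append is redirected only when a branch name is present.
    static List<String> appendSteps(Optional<String> branchName)
    {
        List<String> steps = new ArrayList<>();
        steps.add("newAppend");
        if (branchName.isPresent()) {
            steps.add("toBranch(" + branchName.get() + ")");
        }
        steps.add("commitTransaction");
        return steps;
    }

    // UPDATE/DELETE path: branch writes are redirected and skip conflict
    // validation; writes without a branch keep the full conflict validation.
    static List<String> rowDeltaSteps(Optional<String> branchName)
    {
        List<String> steps = new ArrayList<>();
        steps.add("newRowDelta");
        if (branchName.isPresent()) {
            steps.add("toBranch(" + branchName.get() + ")");
        }
        else {
            steps.add("full conflict validation");
        }
        steps.add("commitTransaction");
        return steps;
    }

    public static void main(String[] args)
    {
        System.out.println(appendSteps(Optional.of("audit_branch")));
        System.out.println(rowDeltaSteps(Optional.of("audit_branch")));
        System.out.println(rowDeltaSteps(Optional.empty()));
    }
}
```

The sketch leaves out the branch existence check that the diagram places before `newTransaction()`.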

Updated class diagram for IcebergTableName and branch-aware metadata operations

classDiagram
    class IcebergTableName {
        -String tableName
        -IcebergTableType icebergTableType
        -Optional~Long~ snapshotId
        -Optional~String~ branchName
        -Optional~Long~ changelogEndSnapshot
        +IcebergTableName(tableName:String, icebergTableType:IcebergTableType, snapshotId:Optional~Long~, branchName:Optional~String~, changelogEndSnapshot:Optional~Long~)
        +String getTableName()
        +IcebergTableType getTableType()
        +Optional~Long~ getSnapshotId()
        +Optional~String~ getBranchName()
        +Optional~Long~ getChangelogEndSnapshot()
        +static IcebergTableName from(name:String)
    }

    class IcebergTableHandle {
        -String schemaName
        -IcebergTableName icebergTableName
        +IcebergTableName getIcebergTableName()
        +String getSchemaName()
    }

    class IcebergAbstractMetadata {
        -Transaction transaction
        +ConnectorInsertTableHandle beginIcebergTableInsert(session:ConnectorSession, table:IcebergTableHandle, icebergTable:Table)
        +Optional~ConnectorOutputMetadata~ finishInsert(session:ConnectorSession, insertHandle:IcebergInsertTableHandle, fragments:Collection~Slice~)
        +Optional~ConnectorOutputMetadata~ finishWrite(session:ConnectorSession, writableTableHandle:IcebergWritableTableHandle, fragments:Collection~Slice~, operationType:OperationType)
        +ConnectorDeleteTableHandle beginDelete(session:ConnectorSession, tableHandle:ConnectorTableHandle)
        +Optional~ConnectorOutputMetadata~ finishDeleteWithOutput(session:ConnectorSession, handle:IcebergDeleteTableHandle, fragments:Collection~Slice~)
        +ConnectorTableHandle beginUpdate(session:ConnectorSession, tableHandle:ConnectorTableHandle, updatedColumns:List~ColumnHandle~)
    }

    class IcebergOutputTableHandle {
        -String schemaName
        -IcebergTableName tableName
        +IcebergTableName getTableName()
    }

    class IcebergEqualityDeleteAsJoin {
        +TableScanNode createDeletesTableScan(mapping:ImmutableMap~VariableReferenceExpression,ColumnHandle~, icebergTableHandle:IcebergTableHandle)
        +TableScanNode createNewRoot(node:TableScanNode, icebergTableHandle:IcebergTableHandle)
    }

    IcebergTableHandle --> IcebergTableName : uses
    IcebergOutputTableHandle --> IcebergTableName : uses
    IcebergAbstractMetadata --> IcebergTableHandle : reads branchName via getIcebergTableName()
    IcebergEqualityDeleteAsJoin --> IcebergTableHandle : reads branch-aware IcebergTableName
    IcebergEqualityDeleteAsJoin --> IcebergTableName : constructs new instances with branchName
    IcebergAbstractMetadata ..> IcebergTableName : parses branch-qualified table names via from()

File-Level Changes

Change: Add branch-awareness to Iceberg table naming and propagation through table handles and optimizer plans.
Details:
  • Extend IcebergTableName grammar to parse optional .branch_<name> suffix and store branchName as an Optional field
  • Update IcebergTableName constructor, JSON properties, and factory method to include branchName
  • Propagate branchName when constructing IcebergTableName instances in metadata, equality-delete optimizer paths, and tests
Files:
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergTableName.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergHiveMetadata.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergNativeMetadata.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergEqualityDeleteAsJoin.java
  • presto-iceberg/src/test/java/com/facebook/presto/iceberg/hive/TestRenameTableOnFragileFileSystem.java

Change: Route INSERT/UPDATE/DELETE operations to specific Iceberg branches and adjust validation semantics for branch operations.
Details:
  • On beginInsert/beginDelete/beginUpdate, validate that a referenced branch exists and is a proper branch via Iceberg refs; otherwise raise NOT_FOUND
  • For INSERT finishInsert, direct AppendFiles to a branch using transaction.newAppend().toBranch(branchName) when a branch is specified and improve commit error message to include branch context
  • For UPDATE/DELETE finishWrite and finishDeleteWithOutput, send RowDelta operations to the specified branch with rowDelta.toBranch(branchName)
  • Skip serialize-level conflicting data-file/delete-file validations when operating on a branch to allow sequential branch mutations while preserving existing validations for mainline tables
Files:
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

Change: Introduce tests covering Iceberg branch mutation behavior and isolation semantics.
Details:
  • Add TestIcebergBranchMutations integration test class using IcebergQueryRunner with Hive catalog
  • Cover INSERT/UPDATE/DELETE against branch-qualified tables, multiple sequential mutations on a branch, column-list inserts, complex WHERE clauses, branch isolation between two branches, and INSERT INTO branch FROM SELECT
  • Use helper methods to create/drop test tables and assert branch contents vs main table contents via FOR SYSTEM_VERSION AS OF
Files:
  • presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergBranchMutations.java

Assessment against linked issues

Issue Objective Addressed Explanation
#22030 Enable specifying an Iceberg branch as the target for mutations (INSERT, UPDATE, DELETE) in Presto's Iceberg connector.
#22030 Wire branch selection through the Iceberg metadata/handle layer (including validation that the branch exists) and add tests ensuring correct behavior and isolation of branch mutations.


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 5 issues, and left some high level feedback:

  • In finishWrite, the previously present validateFromSnapshot call on RowDelta was removed entirely; if this validation is still required for non-branch writes, consider reintroducing it (e.g., outside or in parallel to the new branch handling) to avoid changing snapshot semantics unintentionally.
  • Branch existence validation logic (refs().get(branch) + type check) is duplicated in multiple places (beginIcebergTableInsert, beginDelete, beginUpdate); consider extracting a small helper to centralize this check and keep future behavior changes consistent.
  • In finishWrite, all conflict validation (validateNoConflictingDataFiles, validateDeletedFiles, validateNoConflictingDeleteFiles) is skipped for any branch write; if only certain branch scenarios need relaxed constraints (e.g., specific sequential mutations), consider narrowing the condition so branch operations still get as much correctness validation as possible.
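
The shared helper suggested in the second point could look roughly like the sketch below. It is dependency-free on purpose: `RefKind` and the `Map` are stand-ins for the table's named references, `requireBranch` is a hypothetical name, and the error message format is an assumption rather than the connector's actual text.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins only: RefKind and the Map mimic the shape of a table's
// named references; they are not the Iceberg or Presto types.
class BranchRefCheckSketch
{
    enum RefKind { BRANCH, TAG }

    // Hypothetical helper: one place for the "exists and is a branch" check that
    // the insert, delete, and update entry points would otherwise each repeat.
    static void requireBranch(Map<String, RefKind> refs, String tableName, String branchName)
    {
        RefKind kind = refs.get(branchName);
        // A missing ref (null) and a tag both fail this comparison.
        if (kind != RefKind.BRANCH) {
            // The review describes a NOT_FOUND PrestoException here; a plain
            // exception keeps this sketch free of external dependencies.
            throw new IllegalArgumentException(
                    "Branch '" + branchName + "' does not exist in table '" + tableName + "'");
        }
    }

    public static void main(String[] args)
    {
        Map<String, RefKind> refs = new HashMap<>();
        refs.put("audit_branch", RefKind.BRANCH);
        refs.put("audit_tag", RefKind.TAG);

        requireBranch(refs, "orders", "audit_branch"); // passes silently
        try {
            requireBranch(refs, "orders", "audit_tag");
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Centralizing the check this way means a later change, such as a different error code for tags, would need to be made in only one place.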
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergTableName.java:45-46` </location>
<code_context>
 {
     private static final Pattern TABLE_PATTERN = Pattern.compile("" +
-            "(?<table>[^$@]+)" +
+            "(?<table>[^$@.]+)" +
+            "(?:\\.branch_(?<branch>[^$@]+))?" +
             "(?:@(?<ver1>[0-9]+))?" +
             "(?:\\$(?<type>[^@]+)(?:@(?<ver2>[0-9]+))?)?");
</code_context>

<issue_to_address>
**issue (bug_risk):** New table name pattern forbids dots and reserves `.branch_` in the table segment, which changes parsing semantics.

The updated pattern now rejects dots in `table` and treats `.branch_...` as a special suffix, so existing table names containing `.` or `.branch_` will be parsed differently or not match at all. If those names exist in your deployment, this is a breaking change. To reserve `.branch_` more safely, consider keeping the original `table` pattern and adding explicit validation/error handling when encountering ambiguous names.
</issue_to_address>
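
To make the parsing change in Comment 1 concrete, here is a minimal stand-alone sketch that copies the old and new patterns from the diff and compares how they treat a few names. `BranchNamePatternSketch` and its helper methods are hypothetical and are not part of `IcebergTableName`; the sketch exercises only the regular expressions, not any validation the class may add on top.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a stand-alone copy of the two patterns quoted in the diff,
// not the connector's IcebergTableName class.
class BranchNamePatternSketch
{
    // Pattern before the change: dots are allowed inside the table segment.
    static final Pattern OLD_PATTERN = Pattern.compile("" +
            "(?<table>[^$@]+)" +
            "(?:@(?<ver1>[0-9]+))?" +
            "(?:\\$(?<type>[^@]+)(?:@(?<ver2>[0-9]+))?)?");

    // Pattern after the change: dots are excluded from the table segment and
    // an optional ".branch_<name>" suffix is captured.
    static final Pattern NEW_PATTERN = Pattern.compile("" +
            "(?<table>[^$@.]+)" +
            "(?:\\.branch_(?<branch>[^$@]+))?" +
            "(?:@(?<ver1>[0-9]+))?" +
            "(?:\\$(?<type>[^@]+)(?:@(?<ver2>[0-9]+))?)?");

    // Returns the captured table segment, or empty if the whole name does not match.
    static Optional<String> table(Pattern pattern, String name)
    {
        Matcher matcher = pattern.matcher(name);
        return matcher.matches() ? Optional.of(matcher.group("table")) : Optional.empty();
    }

    // Returns the branch captured by the new pattern, or empty if there is none.
    static Optional<String> branch(String name)
    {
        Matcher matcher = NEW_PATTERN.matcher(name);
        return matcher.matches() ? Optional.ofNullable(matcher.group("branch")) : Optional.empty();
    }

    public static void main(String[] args)
    {
        for (String name : new String[] {"orders", "orders.branch_audit_branch", "my.table"}) {
            System.out.println(name
                    + " | old table=" + table(OLD_PATTERN, name)
                    + " | new table=" + table(NEW_PATTERN, name)
                    + " | new branch=" + branch(name));
        }
    }
}
```

With these patterns, `my.table` matches the old pattern as a single table segment but does not match the new one at all, while `orders.branch_audit_branch` splits into table `orders` and branch `audit_branch` under the new pattern.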

### Comment 2
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergBranchMutations.java:75` </location>
<code_context>
+    }
+
+    @Test
+    public void testInsertIntoBranch()
+    {
+        String tableName = "test_insert_branch";
</code_context>

<issue_to_address>
**suggestion (testing):** Add negative tests for mutations on non-existent or non-branch refs

The new logic validates that the qualified branch name (e.g., `foo.branch_audit_branch`) exists and throws a `NOT_FOUND` `PrestoException` otherwise. Please add negative tests (using `assertQueryFails`) that cover these error paths, e.g.:

- `INSERT INTO "test_insert_branch.branch_non_existing" ...` fails with `NOT_FOUND` and mentions the branch and table.
- `UPDATE "test_insert_branch.branch_<tag_name>" ...` where `<tag_name>` is a tag (non-branch ref) also fails with `NOT_FOUND`.

This will ensure the error behavior and messages for invalid refs are locked in for insert/update/delete paths.

Suggested implementation:

```java
    @Test
    public void testInsertIntoBranch()
    {
        String tableName = "test_insert_branch";
        createTable(tableName);
        try {
            assertUpdate(session, "ALTER TABLE " + tableName + " CREATE BRANCH 'audit_branch'");
            assertQuery(session, "SELECT count(*) FROM \"" + tableName + "$refs\" WHERE name = 'audit_branch' AND type = 'BRANCH'", "VALUES 1");
            assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'audit_branch'", "VALUES 2");
            // Insert into branch
            assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_audit_branch\" VALUES (3, 'Charlie', 300), (4, 'David', 400)", 2);
            assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'audit_branch'", "VALUES 4");
            // Verify main table still has only original data
            assertQuery(session, "SELECT count(*) FROM " + tableName, "VALUES 2");

            // Negative tests: non-existent branch ref
            assertQueryFails(
                    session,
                    "INSERT INTO \"" + tableName + ".branch_non_existing\" VALUES (5, 'Eve', 500)",
                    ".*branch_non_existing.*" + tableName + ".*");
            assertQueryFails(
                    session,
                    "UPDATE \"" + tableName + ".branch_non_existing\" SET value = 999 WHERE id = 1",
                    ".*branch_non_existing.*" + tableName + ".*");
            assertQueryFails(
                    session,
                    "DELETE FROM \"" + tableName + ".branch_non_existing\" WHERE id = 1",
                    ".*branch_non_existing.*" + tableName + ".*");

            // Create a tag that points to the branch; mutations against the tag ref should also fail
            assertUpdate(session, "ALTER TABLE " + tableName + " CREATE TAG 'audit_tag' AS 'audit_branch'");
            assertQuery(session, "SELECT count(*) FROM \"" + tableName + "$refs\" WHERE name = 'audit_tag' AND type = 'TAG'", "VALUES 1");
            assertQueryFails(
                    session,
                    "INSERT INTO \"" + tableName + ".branch_audit_tag\" VALUES (6, 'Frank', 600)",
                    ".*branch_audit_tag.*" + tableName + ".*");
            assertQueryFails(
                    session,
                    "UPDATE \"" + tableName + ".branch_audit_tag\" SET value = 1000 WHERE id = 1",
                    ".*branch_audit_tag.*" + tableName + ".*");
            assertQueryFails(
                    session,
                    "DELETE FROM \"" + tableName + ".branch_audit_tag\" WHERE id = 1",
                    ".*branch_audit_tag.*" + tableName + ".*");

```

1. These changes assume `assertQueryFails(Session, String, String)` is already available in this test base class (as in other Presto tests). If the overload is different (e.g., no `session` argument), adjust the calls accordingly.
2. The regex expectations are intentionally loose and only assert that the failing error message mentions both the ref name and the table (as requested). If your implementation also includes `NOT_FOUND` or a more specific prefix, you may want to tighten the patterns (e.g., `"(?s).*NOT_FOUND.*branch_non_existing.*" + tableName + ".*"`).
</issue_to_address>

### Comment 3
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergBranchMutations.java:136` </location>
<code_context>
+    }
+
+    @Test
+    public void testMultipleMutationsOnBranch()
+    {
+        String tableName = "test_multiple_mutations_branch";
</code_context>

<issue_to_address>
**suggestion (testing):** Cover the branch-specific conflict validation behaviour (skipped validation on branches)

This test does multiple mutations on a branch, but it doesn’t clearly cover a case that would previously have failed conflict validation (e.g., under SERIALIZABLE delete isolation).

Please either extend this test or add a new one that:
- Configures `DELETE_ISOLATION_LEVEL = 'serializable'` (table or session), and
- Executes a sequence of mutations on the same rows on a branch that would previously have triggered conflict validation, asserting they now succeed.

That will directly verify the behaviour change behind skipping `validateNoConflictingDataFiles` and delete conflict checks on branches.

Suggested implementation:

```java
    @Test
    public void testMultipleMutationsOnBranch()
    {
        String tableName = "test_multiple_mutations_branch";
        createTable(tableName);
        try {
            // Configure delete isolation to serializable so that this sequence of mutations
            // would be subject to conflict validation on the main branch
            assertUpdate(session, "ALTER TABLE " + tableName + " SET PROPERTIES delete_isolation_level = 'serializable'");

            assertUpdate(session, "ALTER TABLE " + tableName + " CREATE BRANCH 'multi_branch'");

            // Perform multiple conflicting operations on the same logical rows on the branch.
            // This used to trigger conflict validation (e.g., validateNoConflictingDataFiles and
            // serializable delete checks) but should now succeed on branches.
            assertUpdate(session, "UPDATE \"" + tableName + ".branch_multi_branch\" SET value = 150 WHERE id = 1", 1);
            assertUpdate(session, "DELETE FROM \"" + tableName + ".branch_multi_branch\" WHERE id = 1", 1);
            assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_multi_branch\" VALUES (1, 'Alice', 200)", 1);
            assertUpdate(session, "DELETE FROM \"" + tableName + ".branch_multi_branch\" WHERE id = 2", 1);
            assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_multi_branch\" VALUES (3, 'Charlie', 300)", 1);

            // Verify final state in branch: row 1 has been rewritten, row 2 deleted, row 3 inserted
            assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'multi_branch'", "VALUES 2");
            assertQuery(
                    session,
                    "SELECT id, name, value FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'multi_branch' ORDER BY id",
                    "VALUES (1, 'Alice', 200), (3, 'Charlie', 300)");

```

If the Iceberg connector in this codebase uses a different mechanism for configuring delete isolation (for example, a session property like `SET SESSION iceberg_delete_isolation_level = 'serializable'` instead of a table property `delete_isolation_level`), adjust the `ALTER TABLE ... SET PROPERTIES` statement accordingly.

The intent is:
1. Ensure `DELETE_ISOLATION_LEVEL` (for the Iceberg table used in this test) is set to `serializable`.
2. Run multiple mutations on the same logical rows *on the branch* so that, on the main branch, this would previously have tripped serializable delete conflict validation, but now passes on branches because `validateNoConflictingDataFiles` and delete conflict checks are skipped there.
</issue_to_address>

### Comment 4
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergBranchMutations.java:63` </location>
<code_context>
+        assertUpdate(session, format("DROP SCHEMA IF EXISTS %s", TEST_SCHEMA));
+    }
+
+    private void createTable(String tableName)
+    {
+        assertUpdate(session, "CREATE TABLE IF NOT EXISTS " + tableName + " (id BIGINT, name VARCHAR, value INTEGER) WITH (format = 'PARQUET')");
</code_context>

<issue_to_address>
**suggestion (testing):** Add unit-style tests for IcebergTableName parsing with branch-qualified names

These tests give good end-to-end coverage for branch mutations, but the new `IcebergTableName` parsing for `table.branch_<name>` is only tested indirectly. Please add focused unit tests (in the existing `IcebergTableName`/metadata tests or a new small test) that:

- Verify `IcebergTableName.from("orders.branch_audit_branch")` parses table and branch correctly with no snapshot.
- Cover allowed/forbidden combinations with table types/versions (e.g. `orders.branch_audit_branch$history`, invalid `orders.branch_audit_branch@123` if that should be rejected).
- Assert JSON round-trip preserves `branchName`.

These can use direct constructor/from-string calls, without running queries, to complement the end-to-end tests.

Suggested implementation:

```java
    @Test
    public void testInsertIntoBranch()

    @Test
    public void testBranchQualifiedTableNameParsing()
    {
        IcebergTableName tableName = IcebergTableName.from("orders.branch_audit_branch");

        assertEquals(tableName.getTableName(), "orders");
        assertTrue(tableName.getBranchName().isPresent());
        assertEquals(tableName.getBranchName().get(), "audit_branch");
        assertFalse(tableName.getSnapshotId().isPresent());
    }

    @Test
    public void testBranchQualifiedTableNameWithHistoryTableIsAllowed()
    {
        IcebergTableName tableName = IcebergTableName.from("orders.branch_audit_branch$history");

        assertEquals(tableName.getTableName(), "orders");
        assertTrue(tableName.getBranchName().isPresent());
        assertEquals(tableName.getBranchName().get(), "audit_branch");
        // history table suffix should not introduce a snapshot id
        assertFalse(tableName.getSnapshotId().isPresent());
    }

    @Test(expectedExceptions = IllegalArgumentException.class)
    public void testBranchQualifiedTableNameWithSnapshotIsRejected()
    {
        // Branch-qualified table names should not allow an additional snapshot id qualifier.
        IcebergTableName.from("orders.branch_audit_branch@123");
    }

    @Test
    public void testBranchQualifiedTableNameJsonRoundTrip()
    {
        IcebergTableName original = IcebergTableName.from("orders.branch_audit_branch");

        JsonCodec<IcebergTableName> codec = JsonCodec.jsonCodec(IcebergTableName.class);
        String json = codec.toJson(original);
        IcebergTableName roundTripped = codec.fromJson(json);

        assertEquals(roundTripped.getTableName(), "orders");
        assertEquals(roundTripped.getBranchName(), original.getBranchName());
        assertEquals(roundTripped.getSnapshotId(), original.getSnapshotId());
        assertEquals(roundTripped.getTableType(), original.getTableType());
    }

```

To integrate these tests cleanly:
1. Ensure the class has the necessary imports:
   - `import io.airlift.json.JsonCodec;`
   - `import static org.testng.Assert.assertEquals;`
   - `import static org.testng.Assert.assertFalse;`
   - `import static org.testng.Assert.assertTrue;`
2. Place the new test methods as *separate* methods at class scope (after the existing branch mutation tests and before the final closing brace of the class). The edit block above may need to be adapted so that the new tests are not inserted between the `@Test` annotation and the `testInsertIntoBranch` method signature.
3. If the `IcebergTableName` API differs (e.g., different method names or no `getTableType()`), adjust the assertions accordingly:
   - Use whatever accessors exist for table name, branch name, and snapshot id.
   - If JSON round-trip is done via a shared `JsonCodec<IcebergTableName>` instance in another test class or in this class, reuse that instead of creating a new codec locally.
4. If branch-qualified names with `$history` are not considered valid in your implementation, change `testBranchQualifiedTableNameWithHistoryTableIsAllowed` to expect an exception instead, so that it reflects the actual allowed/forbidden combinations.
</issue_to_address>

### Comment 5
<location> `presto-docs/src/main/sphinx/connector/iceberg.rst:2246` </location>
<code_context>
+Iceberg supports performing INSERT, UPDATE, and DELETE operations directly on branches,
+allowing you to make changes to a branch without affecting the main table or other branches.
+
+To perform mutations on a branch, use the quoted identifier syntax ``"table.branch_branchname"``.
+The quotes are required to prevent the SQL parser from interpreting the dot as a schema.table separator.
+
</code_context>

<issue_to_address>
**suggestion (typo):** Clarify or correct the placeholder identifier `"table.branch_branchname"`.

The placeholder name is ambiguous and looks like it has a duplicated `branch_`. To better convey the pattern and align with examples like `"orders.branch_audit_branch"`, consider something like `"table.branch_<branch_name>"` or `"table.branch_branch_name"`.

```suggestion
To perform mutations on a branch, use the quoted identifier syntax ``"table.branch_<branch_name>"`` (for example, ``"orders.branch_audit_branch"``).
```
</issue_to_address>
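
As a small illustration of the quoted-identifier syntax discussed in Comment 5, the sketch below builds the `"table.branch_<branch_name>"` target for use in a SQL string, much as the tests concatenate it by hand. `BranchIdentifierSketch` and `branchTarget` are hypothetical names and are not part of the connector or its test utilities.

```java
// Illustrative only: builds the double-quoted, branch-qualified identifier
// described in the documentation snippet. Not part of the connector.
class BranchIdentifierSketch
{
    // Returns "<table>.branch_<branch>" wrapped in double quotes so that the SQL
    // parser does not read the dot as a schema.table separator.
    static String branchTarget(String tableName, String branchName)
    {
        String raw = tableName + ".branch_" + branchName;
        // Double any embedded double quote, the standard SQL escape inside a
        // quoted identifier.
        return "\"" + raw.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args)
    {
        String target = branchTarget("orders", "audit_branch");
        System.out.println("INSERT INTO " + target + " VALUES (1, 'Product A', 100.00)");
    }
}
```

For `orders` and `audit_branch` this yields `"orders.branch_audit_branch"`, the same identifier used in the PR description's examples.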


}

@Test
public void testInsertIntoBranch()

suggestion (testing): Add negative tests for mutations on non-existent or non-branch refs

The new logic validates that the qualified branch name (e.g., foo.branch_audit_branch) exists and throws a NOT_FOUND PrestoException otherwise. Please add negative tests (using assertQueryFails) that cover these error paths, e.g.:

  • INSERT INTO "test_insert_branch.branch_non_existing" ... fails with NOT_FOUND and mentions the branch and table.
  • UPDATE "test_insert_branch.branch_<tag_name>" ... where <tag_name> is a tag (non-branch ref) also fails with NOT_FOUND.

This will ensure the error behavior and messages for invalid refs are locked in for insert/update/delete paths.

Suggested implementation:

    @Test
    public void testInsertIntoBranch()
    {
        String tableName = "test_insert_branch";
        createTable(tableName);
        try {
            assertUpdate(session, "ALTER TABLE " + tableName + " CREATE BRANCH 'audit_branch'");
            assertQuery(session, "SELECT count(*) FROM \"" + tableName + "$refs\" WHERE name = 'audit_branch' AND type = 'BRANCH'", "VALUES 1");
            assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'audit_branch'", "VALUES 2");
            // Insert into branch
            assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_audit_branch\" VALUES (3, 'Charlie', 300), (4, 'David', 400)", 2);
            assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'audit_branch'", "VALUES 4");
            // Verify main table still has only original data
            assertQuery(session, "SELECT count(*) FROM " + tableName, "VALUES 2");

            // Negative tests: non-existent branch ref
            assertQueryFails(
                    session,
                    "INSERT INTO \"" + tableName + ".branch_non_existing\" VALUES (5, 'Eve', 500)",
                    ".*branch_non_existing.*" + tableName + ".*");
            assertQueryFails(
                    session,
                    "UPDATE \"" + tableName + ".branch_non_existing\" SET value = 999 WHERE id = 1",
                    ".*branch_non_existing.*" + tableName + ".*");
            assertQueryFails(
                    session,
                    "DELETE FROM \"" + tableName + ".branch_non_existing\" WHERE id = 1",
                    ".*branch_non_existing.*" + tableName + ".*");

            // Create a tag that points to the branch; mutations against the tag ref should also fail
            assertUpdate(session, "ALTER TABLE " + tableName + " CREATE TAG 'audit_tag' AS 'audit_branch'");
            assertQuery(session, "SELECT count(*) FROM \"" + tableName + "$refs\" WHERE name = 'audit_tag' AND type = 'TAG'", "VALUES 1");
            assertQueryFails(
                    session,
                    "INSERT INTO \"" + tableName + ".branch_audit_tag\" VALUES (6, 'Frank', 600)",
                    ".*branch_audit_tag.*" + tableName + ".*");
            assertQueryFails(
                    session,
                    "UPDATE \"" + tableName + ".branch_audit_tag\" SET value = 1000 WHERE id = 1",
                    ".*branch_audit_tag.*" + tableName + ".*");
            assertQueryFails(
                    session,
                    "DELETE FROM \"" + tableName + ".branch_audit_tag\" WHERE id = 1",
                    ".*branch_audit_tag.*" + tableName + ".*");

  1. These changes assume assertQueryFails(Session, String, String) is already available in this test base class (as in other Presto tests). If the overload is different (e.g., no session argument), adjust the calls accordingly.
  2. The regex expectations are intentionally loose and only assert that the failing error message mentions both the ref name and the table (as requested). If your implementation also includes NOT_FOUND or a more specific prefix, you may want to tighten the patterns (e.g., "(?s).*NOT_FOUND.*branch_non_existing.*" + tableName + ".*").
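
To see how loose these patterns really are, the following sketch checks the suggested regex against a few error messages. It assumes the failure assertion requires the regex to match the whole message. The messages are hypothetical, the connector's real wording may differ, and `ErrorPatternCheck` is an illustrative name only.

```java
import java.util.regex.Pattern;

final class ErrorPatternCheck
{
    // Full-match semantics: the regex must cover the entire message
    static boolean matches(String regex, String message)
    {
        return Pattern.compile(regex).matcher(message).matches();
    }

    public static void main(String[] args)
    {
        String tableName = "test_insert_branch";
        String loose = ".*branch_non_existing.*" + tableName + ".*";

        // Hypothetical messages, for illustration only
        String refThenTable = "Ref branch_non_existing not found in table test_insert_branch";
        String tableThenRef = "Branch not found: test_insert_branch.branch_non_existing";
        String multiLine = "Ref branch_non_existing not found\nin table test_insert_branch";

        System.out.println(matches(loose, refThenTable));       // true
        System.out.println(matches(loose, tableThenRef));       // false: table name appears only before the ref
        System.out.println(matches(loose, multiLine));          // false: '.' does not match a newline by default
        System.out.println(matches("(?s)" + loose, multiLine)); // true: DOTALL lets '.' cross the newline
    }
}
```

The pattern is order-sensitive: it only passes when the ref name appears before the table name in the message, so it is worth checking the actual wording before locking the regex in.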

}

@Test
public void testMultipleMutationsOnBranch()

suggestion (testing): Cover the branch-specific conflict validation behaviour (skipped validation on branches)

This test does multiple mutations on a branch, but it doesn’t clearly cover a case that would previously have failed conflict validation (e.g., under SERIALIZABLE delete isolation).

Please either extend this test or add a new one that:

  • Configures DELETE_ISOLATION_LEVEL = 'serializable' (table or session), and
  • Executes a sequence of mutations on the same rows on a branch that would previously have triggered conflict validation, asserting they now succeed.

That will directly verify the behaviour change behind skipping validateNoConflictingDataFiles and delete conflict checks on branches.

Suggested implementation:

    @Test
    public void testMultipleMutationsOnBranch()
    {
        String tableName = "test_multiple_mutations_branch";
        createTable(tableName);
        try {
            // Configure delete isolation to serializable so that this sequence of mutations
            // would be subject to conflict validation on the main branch
            assertUpdate(session, "ALTER TABLE " + tableName + " SET PROPERTIES delete_isolation_level = 'serializable'");

            assertUpdate(session, "ALTER TABLE " + tableName + " CREATE BRANCH 'multi_branch'");

            // Perform multiple conflicting operations on the same logical rows on the branch.
            // This used to trigger conflict validation (e.g., validateNoConflictingDataFiles and
            // serializable delete checks) but should now succeed on branches.
            assertUpdate(session, "UPDATE \"" + tableName + ".branch_multi_branch\" SET value = 150 WHERE id = 1", 1);
            assertUpdate(session, "DELETE FROM \"" + tableName + ".branch_multi_branch\" WHERE id = 1", 1);
            assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_multi_branch\" VALUES (1, 'Alice', 200)", 1);
            assertUpdate(session, "DELETE FROM \"" + tableName + ".branch_multi_branch\" WHERE id = 2", 1);
            assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_multi_branch\" VALUES (3, 'Charlie', 300)", 1);

            // Verify final state in branch: row 1 has been rewritten, row 2 deleted, row 3 inserted
            assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'multi_branch'", "VALUES 2");
            assertQuery(
                    session,
                    "SELECT id, name, value FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'multi_branch' ORDER BY id",
                    "VALUES (1, 'Alice', 200), (3, 'Charlie', 300)");

If the Iceberg connector in this codebase uses a different mechanism for configuring delete isolation (for example, a session property like SET SESSION iceberg_delete_isolation_level = 'serializable' instead of a table property delete_isolation_level), adjust the ALTER TABLE ... SET PROPERTIES statement accordingly.

The intent is:

  1. Ensure DELETE_ISOLATION_LEVEL (for the Iceberg table used in this test) is set to serializable.
  2. Run multiple mutations on the same logical rows on the branch so that, on the main branch, this would previously have tripped serializable delete conflict validation, but now passes on branches because validateNoConflictingDataFiles and delete conflict checks are skipped there.
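
One way to read the behaviour described above is as a simple gate on the target ref. The sketch below is an assumption about the shape of that rule, not the connector's code: `BranchConflictPolicy`, `IsolationLevel`, and `validatesDataFileConflicts` are hypothetical names, and the real logic may involve additional conditions.

```java
import java.util.Optional;

final class BranchConflictPolicy
{
    enum IsolationLevel { SNAPSHOT, SERIALIZABLE }

    // Hypothetical rule: only serializable writes to the main table validate
    // against concurrently added data files; writes routed to a branch skip it.
    static boolean validatesDataFileConflicts(Optional<String> branch, IsolationLevel level)
    {
        return !branch.isPresent() && level == IsolationLevel.SERIALIZABLE;
    }

    public static void main(String[] args)
    {
        // Main table, serializable: validation applies
        System.out.println(validatesDataFileConflicts(Optional.empty(), IsolationLevel.SERIALIZABLE));
        // Same isolation level, but the write targets a branch: validation is skipped
        System.out.println(validatesDataFileConflicts(Optional.of("multi_branch"), IsolationLevel.SERIALIZABLE));
    }
}
```

A test for the branch case is only meaningful if the same mutation sequence would have been validated on the main table, which is why the isolation level needs to be set explicitly.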

@agrawalreetika agrawalreetika force-pushed the iceberg-branch-mutation branch 2 times, most recently from c200794 to 73f305a Compare February 16, 2026 10:26
Contributor

@steveburnett steveburnett left a comment

Thank you for the documentation! As I mentioned in a comment, including comments in the code blocks explaining what the SQL is doing is great - really helpful for the reader.

Just a few nits.

@agrawalreetika agrawalreetika force-pushed the iceberg-branch-mutation branch 2 times, most recently from 251d81a to 6c3a621 Compare February 27, 2026 08:02
steveburnett previously approved these changes Feb 27, 2026
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

@agrawalreetika agrawalreetika force-pushed the iceberg-branch-mutation branch from 6c3a621 to 2b1814e Compare March 1, 2026 03:07
Member

@hantangwangd hantangwangd left a comment

Thanks for this feature. I'm wondering if we should also consider the behavior of test_table.branch_audit_branch in other operations besides insert/update/delete/merge? For example, the following queries would be confusing since we ignored the branch information in the table names:

select * from "test_table.branch_audit_branch";
select * from "test_table.branch_b1" for system_version as of 'branch_b2';

@agrawalreetika
Member Author

> Thanks for this feature. I'm wondering if we should also consider the behavior of test_table.branch_audit_branch in other operations besides insert/update/delete/merge? For example, the following queries would be confusing since we ignored the branch information in the table names:
>
> select * from "test_table.branch_audit_branch";
> select * from "test_table.branch_b1" for system_version as of 'branch_b2';

Thank you for your review @hantangwangd

  1. Could you point out which other mutation operations we should cover as part of this? So far I have covered the ones mentioned in the issue - Add support for mutating an Iceberg branch #22030 (comment)

  2. About the SELECT operation: Presto currently has only this syntax for querying a branch or tag - https://prestodb.io/docs/current/connector/iceberg.html#querying-branches-and-tags, but I think Spark supports this syntax as well - SELECT * FROM db.table.branch_test_branch. I think, as part of the extension, we can extend the querying behaviour in Presto as well for a better user experience and to avoid confusion. I think we could do that in a subsequent PR, WDYT?

@hantangwangd
Member

Hi @agrawalreetika,

> I think Spark supports this syntax as well - SELECT * FROM db.table.branch_test_branch. I think, as part of the extension, we can extend the querying behaviour in Presto as well for a better user experience and avoid confusion.

I agree, that sounds like a reasonable solution.

As for other operations, what I can think of so far is:

  • truncate/metadata delete: ignoring the branch specified in the table name in such statements seems to be clearly incorrect behavior. A straightforward example: when we execute delete from "test_table.branch_test_branch", the data on the main branch get deleted directly.
  • create view/materialized view: create view/mv as select * from "test_table.branch_test_branch" may also cause some issues. While the branch information is preserved in show create, the actual data is inconsistent. It seems we need to either make the specified branch take effect or directly disallow this type of definition in create view/mv statements.
  • DDL: for some types of DDL statements, it seems that specifying a table's branch doesn't make sense. Is it possible to simply disallow such usage?

Considering all of the above, I'm wondering whether it is possible to disallow this kind of definition by default and only support it where explicitly needed. What are your thoughts on this?
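
The "disallow by default" idea could be sketched as an allow-list of operations that understand branch-qualified names, with everything else rejected. This is a hypothetical illustration only: `BranchNamePolicy`, the `Operation` enum, and the set of allowed operations are assumptions based on the operations discussed in this thread, not the validation that was eventually added.

```java
import java.util.EnumSet;
import java.util.Set;

final class BranchNamePolicy
{
    enum Operation { INSERT, UPDATE, DELETE, SELECT, TRUNCATE, CREATE_VIEW, DDL }

    // Assumed allow-list: only the mutations named in the PR description
    private static final Set<Operation> BRANCH_AWARE =
            EnumSet.of(Operation.INSERT, Operation.UPDATE, Operation.DELETE);

    // Rejects a branch-qualified table name unless the operation is explicitly allowed
    static void check(Operation operation, String tableName)
    {
        boolean branchQualified = tableName.contains(".branch_");
        if (branchQualified && !BRANCH_AWARE.contains(operation)) {
            throw new UnsupportedOperationException(
                    operation + " does not support branch-qualified table name: " + tableName);
        }
    }

    public static void main(String[] args)
    {
        check(Operation.INSERT, "test_table.branch_test_branch"); // accepted
        check(Operation.CREATE_VIEW, "test_table");               // accepted: no branch in the name
        try {
            check(Operation.CREATE_VIEW, "test_table.branch_test_branch");
        }
        catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the default-deny shape is that a new statement type fails loudly on a branch-qualified name instead of silently operating on the main branch.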

@agrawalreetika
Member Author

agrawalreetika commented Mar 3, 2026

> As for other operations, what I can think of so far is:
>
>   • truncate/metadata delete: ignoring the branch specified in the table name in such statements seems to be clearly incorrect behavior. A straightforward example: when we execute delete from "test_table.branch_test_branch", the data on the main branch get deleted directly.
>   • create view/materialized view: create view/mv as select * from "test_table.branch_test_branch" may also cause some issues. While the branch information is preserved in show create, the actual data is inconsistent. It seems we need to either make the specified branch take effect or directly disallow this type of definition in create view/mv statements.
>   • DDL: for some types of DDL statements, it seems that specifying a table's branch doesn't make sense. Is it possible to simply disallow such usage?
>
> Considering all of the above, I'm wondering whether it is possible to disallow this kind of definition by default and only support it where explicitly needed. What are your thoughts on this?

Thanks for the details @hantangwangd
I agree that the table.branch_<branch_name> syntax is too permissive. I have added validation to reject it in various operations. Please take a look.

@agrawalreetika agrawalreetika force-pushed the iceberg-branch-mutation branch from 4a9932c to dd2a059 Compare March 3, 2026 20:11
Member

@hantangwangd hantangwangd left a comment

Left some comments, but overall looks good to me. Thank you for implementing this feature.

It's interesting that multiple different branches can evolve independently within a single Iceberg table transaction - in some scenarios, this could even serve as a kind of constrained, lightweight version of multi-table transactions. After this PR is merged, I will add some test cases to verify this functionality.

Member

@hantangwangd hantangwangd left a comment

Just one nit, otherwise looks good to me.

The test failure seems relevant.

@agrawalreetika agrawalreetika force-pushed the iceberg-branch-mutation branch from 1bb83a3 to 3fb476e Compare March 4, 2026 15:11
@agrawalreetika agrawalreetika force-pushed the iceberg-branch-mutation branch from 3fb476e to 4312f36 Compare March 4, 2026 17:58
Member

@hantangwangd hantangwangd left a comment

Thanks @agrawalreetika, lgtm!

Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

@agrawalreetika agrawalreetika merged commit e75f5a5 into prestodb:master Mar 5, 2026
125 of 128 checks passed
@agrawalreetika agrawalreetika deleted the iceberg-branch-mutation branch March 5, 2026 14:13
garimauttam pushed a commit to garimauttam/presto that referenced this pull request Mar 9, 2026
…stodb#27147)

## Description
Add support for mutating an Iceberg branch

## Motivation and Context
Resolves prestodb#22030

## Impact
Resolves prestodb#22030

Add support for mutating an Iceberg branch based on the syntax discussed
[here](prestodb#22030 (comment))

```
INSERT INTO "orders.branch_audit_branch" VALUES (1, 'Product A', 100.00);

UPDATE "orders.branch_audit_branch" SET price = 120.00 WHERE id = 1;

DELETE FROM "orders.branch_audit_branch" WHERE id = 2;
``` 

## Test Plan
Added

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF
Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or
higher (or obtained explicit TSC approval for lower scores).

## Summary by Sourcery

Add branch-aware mutation support to the Iceberg connector, enabling
INSERT/UPDATE/DELETE operations on specific Iceberg branches via
extended table naming.

New Features:
- Support addressing Iceberg branches in table names using a
branch-qualified naming pattern for mutations.
- Allow INSERT operations to append data to a specific Iceberg branch
instead of the main table.
- Allow UPDATE and DELETE operations to modify data on a specific
Iceberg branch with appropriate branch routing and validation.

Enhancements:
- Improve error messaging and validation when committing Iceberg updates
by including branch context and verifying that referenced branches
exist.

Tests:
- Add TestIcebergBranchMutations covering inserts, updates, deletes,
multi-step mutations, complex predicates, branch isolation, and INSERT
... SELECT into Iceberg branches.

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for mutating an Iceberg branch
```

Labels

from:IBM PR from IBM
