feat(plugin-iceberg): Add support for mutating an Iceberg branch #27147
agrawalreetika merged 2 commits into prestodb:master
Reviewer's Guide

Adds Iceberg branch-aware mutation support (INSERT/UPDATE/DELETE) by extending table name parsing to include branch identifiers, routing write/row-delta operations to specific branches, adjusting conflict validation for branch operations, and introducing comprehensive tests for branch mutations.

Sequence diagram for branch-aware Iceberg INSERT/UPDATE/DELETE mutations

sequenceDiagram
actor User
participant PrestoPlanner
participant IcebergAbstractMetadata
participant IcebergTable
participant Transaction
participant AppendFiles
participant RowDelta
User->>PrestoPlanner: SQL INSERT/UPDATE/DELETE on orders.branch_audit_branch
PrestoPlanner->>IcebergAbstractMetadata: getTableHandle(schemaTableName)
IcebergAbstractMetadata->>IcebergTableName: from(name)
IcebergTableName-->>IcebergAbstractMetadata: tableName, snapshotId, branchName
PrestoPlanner->>IcebergAbstractMetadata: beginIcebergTableInsert / beginUpdate / beginDelete
IcebergAbstractMetadata->>IcebergTable: getIcebergTable(schemaTableName)
alt branchName present
IcebergAbstractMetadata->>IcebergTable: refs().get(branch)
IcebergTable-->>IcebergAbstractMetadata: SnapshotRef
IcebergAbstractMetadata-->>IcebergAbstractMetadata: validate branchRef.isBranch()
IcebergAbstractMetadata->>IcebergTable: newTransaction()
IcebergTable-->>IcebergAbstractMetadata: Transaction
opt INSERT path
IcebergAbstractMetadata->>Transaction: newAppend()
Transaction-->>IcebergAbstractMetadata: AppendFiles
IcebergAbstractMetadata->>AppendFiles: toBranch(branchName)
end
opt UPDATE/DELETE path
IcebergAbstractMetadata->>Transaction: newRowDelta()
Transaction-->>IcebergAbstractMetadata: RowDelta
IcebergAbstractMetadata->>RowDelta: toBranch(branchName)
IcebergAbstractMetadata-->>IcebergAbstractMetadata: skip conflict validation
end
IcebergAbstractMetadata->>Transaction: commitTransaction()
else no branchName
IcebergAbstractMetadata->>IcebergTable: newTransaction()
IcebergTable-->>IcebergAbstractMetadata: Transaction
opt INSERT path
IcebergAbstractMetadata->>Transaction: newAppend()
Transaction-->>IcebergAbstractMetadata: AppendFiles
end
opt UPDATE/DELETE path
IcebergAbstractMetadata->>Transaction: newRowDelta()
Transaction-->>IcebergAbstractMetadata: RowDelta
IcebergAbstractMetadata-->>IcebergAbstractMetadata: perform full conflict validation
end
IcebergAbstractMetadata->>Transaction: commitTransaction()
end
Transaction-->>PrestoPlanner: commit result / error
PrestoPlanner-->>User: query success or PrestoException with branch in message
Updated class diagram for IcebergTableName and branch-aware metadata operations

classDiagram
class IcebergTableName {
-String tableName
-IcebergTableType icebergTableType
-Optional~Long~ snapshotId
-Optional~String~ branchName
-Optional~Long~ changelogEndSnapshot
+IcebergTableName(tableName:String, icebergTableType:IcebergTableType, snapshotId:Optional~Long~, branchName:Optional~String~, changelogEndSnapshot:Optional~Long~)
+String getTableName()
+IcebergTableType getTableType()
+Optional~Long~ getSnapshotId()
+Optional~String~ getBranchName()
+Optional~Long~ getChangelogEndSnapshot()
+static IcebergTableName from(name:String)
}
class IcebergTableHandle {
-String schemaName
-IcebergTableName icebergTableName
+IcebergTableName getIcebergTableName()
+String getSchemaName()
}
class IcebergAbstractMetadata {
-Transaction transaction
+ConnectorInsertTableHandle beginIcebergTableInsert(session:ConnectorSession, table:IcebergTableHandle, icebergTable:Table)
+Optional~ConnectorOutputMetadata~ finishInsert(session:ConnectorSession, insertHandle:IcebergInsertTableHandle, fragments:Collection~Slice~)
+Optional~ConnectorOutputMetadata~ finishWrite(session:ConnectorSession, writableTableHandle:IcebergWritableTableHandle, fragments:Collection~Slice~, operationType:OperationType)
+ConnectorDeleteTableHandle beginDelete(session:ConnectorSession, tableHandle:ConnectorTableHandle)
+Optional~ConnectorOutputMetadata~ finishDeleteWithOutput(session:ConnectorSession, handle:IcebergDeleteTableHandle, fragments:Collection~Slice~)
+ConnectorTableHandle beginUpdate(session:ConnectorSession, tableHandle:ConnectorTableHandle, updatedColumns:List~ColumnHandle~)
}
class IcebergOutputTableHandle {
-String schemaName
-IcebergTableName tableName
+IcebergTableName getTableName()
}
class IcebergEqualityDeleteAsJoin {
+TableScanNode createDeletesTableScan(mapping:ImmutableMap~VariableReferenceExpression,ColumnHandle~, icebergTableHandle:IcebergTableHandle)
+TableScanNode createNewRoot(node:TableScanNode, icebergTableHandle:IcebergTableHandle)
}
IcebergTableHandle --> IcebergTableName : uses
IcebergOutputTableHandle --> IcebergTableName : uses
IcebergAbstractMetadata --> IcebergTableHandle : reads branchName via getIcebergTableName()
IcebergEqualityDeleteAsJoin --> IcebergTableHandle : reads branch-aware IcebergTableName
IcebergEqualityDeleteAsJoin --> IcebergTableName : constructs new instances with branchName
IcebergAbstractMetadata ..> IcebergTableName : parses branch-qualified table names via from()
Hey - I've found 5 issues, and left some high level feedback:
- In `finishWrite`, the previously present `validateFromSnapshot` call on `RowDelta` was removed entirely; if this validation is still required for non-branch writes, consider reintroducing it (e.g., outside or in parallel to the new branch handling) to avoid changing snapshot semantics unintentionally.
- Branch existence validation logic (`refs().get(branch)` + type check) is duplicated in multiple places (`beginIcebergTableInsert`, `beginDelete`, `beginUpdate`); consider extracting a small helper to centralize this check and keep future behavior changes consistent.
- In `finishWrite`, all conflict validation (`validateNoConflictingDataFiles`, `validateDeletedFiles`, `validateNoConflictingDeleteFiles`) is skipped for any branch write; if only certain branch scenarios need relaxed constraints (e.g., specific sequential mutations), consider narrowing the condition so branch operations still get as much correctness validation as possible.
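For the point about duplicated branch validation, a minimal sketch of a single shared helper. It assumes a plain `Map` from ref name to a hypothetical `RefType` enum in place of Iceberg's `Table.refs()` and `SnapshotRef`, and uses `IllegalArgumentException` in place of a `NOT_FOUND` `PrestoException`. The class name, method name, and message format are all hypothetical, not the PR's code.

```java
import java.util.Map;

// Hypothetical helper centralizing the "ref exists and is a branch" check.
// RefType and the Map stand in for Iceberg's SnapshotRef and Table.refs();
// IllegalArgumentException stands in for a NOT_FOUND PrestoException.
final class BranchRefCheck
{
    enum RefType { BRANCH, TAG }

    private BranchRefCheck() {}

    static void requireBranch(Map<String, RefType> refs, String table, String branch)
    {
        // A missing ref (null) and a tag are both rejected, so insert, update,
        // and delete share one error path and one (assumed) message format.
        if (refs.get(branch) != RefType.BRANCH) {
            throw new IllegalArgumentException(
                    String.format("Branch %s does not exist for table %s", branch, table));
        }
    }
}
```

`beginIcebergTableInsert`, `beginDelete`, and `beginUpdate` could each call this one method, so any later change to the rule (for example, a different error code) happens in a single place.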
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergTableName.java:45-46` </location>
<code_context>
{
private static final Pattern TABLE_PATTERN = Pattern.compile("" +
- "(?<table>[^$@]+)" +
+ "(?<table>[^$@.]+)" +
+ "(?:\\.branch_(?<branch>[^$@]+))?" +
"(?:@(?<ver1>[0-9]+))?" +
"(?:\\$(?<type>[^@]+)(?:@(?<ver2>[0-9]+))?)?");
</code_context>
<issue_to_address>
**issue (bug_risk):** New table name pattern forbids dots and reserves `.branch_` in the table segment, which changes parsing semantics.
The updated pattern now rejects dots in `table` and treats `.branch_...` as a special suffix, so existing table names containing `.` or `.branch_` will be parsed differently or not match at all. If those names exist in your deployment, this is a breaking change. To reserve `.branch_` more safely, consider keeping the original `table` pattern and adding explicit validation/error handling when encountering ambiguous names.
</issue_to_address>
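To make the parsing change concrete, the sketch below runs the new `TABLE_PATTERN` from the diff in isolation. The `BranchPatternDemo` class and its `parse` helper, which returns `{table, branch}` or `null`, are illustrative only and are not the `IcebergTableName` API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Exercises the TABLE_PATTERN from the diff on its own; parse() is an
// illustrative helper, not part of IcebergTableName.
final class BranchPatternDemo
{
    private static final Pattern TABLE_PATTERN = Pattern.compile("" +
            "(?<table>[^$@.]+)" +
            "(?:\\.branch_(?<branch>[^$@]+))?" +
            "(?:@(?<ver1>[0-9]+))?" +
            "(?:\\$(?<type>[^@]+)(?:@(?<ver2>[0-9]+))?)?");

    private BranchPatternDemo() {}

    // Returns {table, branch} (branch is null when absent),
    // or null when the whole name does not match the pattern.
    static String[] parse(String name)
    {
        Matcher matcher = TABLE_PATTERN.matcher(name);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] {matcher.group("table"), matcher.group("branch")};
    }
}
```

With this pattern, `orders.branch_audit_branch` yields table `orders` and branch `audit_branch`, while a name such as `my.table` no longer matches at all, which is the breaking case the comment describes. The pattern also accepts `orders.branch_b1@123` (branch `b1`), so rejecting a branch combined with a snapshot suffix would have to happen after matching.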
### Comment 2
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergBranchMutations.java:75` </location>
<code_context>
+ }
+
+ @Test
+ public void testInsertIntoBranch()
+ {
+ String tableName = "test_insert_branch";
</code_context>
<issue_to_address>
**suggestion (testing):** Add negative tests for mutations on non-existent or non-branch refs
The new logic validates that the qualified branch name (e.g., `foo.branch_audit_branch`) exists and throws a `NOT_FOUND` `PrestoException` otherwise. Please add negative tests (using `assertQueryFails`) that cover these error paths, e.g.:
- `INSERT INTO "test_insert_branch.branch_non_existing" ...` fails with `NOT_FOUND` and mentions the branch and table.
- `UPDATE "test_insert_branch.branch_<tag_name>" ...` where `<tag_name>` is a tag (non-branch ref) also fails with `NOT_FOUND`.
This will ensure the error behavior and messages for invalid refs are locked in for insert/update/delete paths.
Suggested implementation:
```java
@Test
public void testInsertIntoBranch()
{
String tableName = "test_insert_branch";
createTable(tableName);
try {
assertUpdate(session, "ALTER TABLE " + tableName + " CREATE BRANCH 'audit_branch'");
assertQuery(session, "SELECT count(*) FROM \"" + tableName + "$refs\" WHERE name = 'audit_branch' AND type = 'BRANCH'", "VALUES 1");
assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'audit_branch'", "VALUES 2");
// Insert into branch
assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_audit_branch\" VALUES (3, 'Charlie', 300), (4, 'David', 400)", 2);
assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'audit_branch'", "VALUES 4");
// Verify main table still has only original data
assertQuery(session, "SELECT count(*) FROM " + tableName, "VALUES 2");
// Negative tests: non-existent branch ref
assertQueryFails(
session,
"INSERT INTO \"" + tableName + ".branch_non_existing\" VALUES (5, 'Eve', 500)",
".*branch_non_existing.*" + tableName + ".*");
assertQueryFails(
session,
"UPDATE \"" + tableName + ".branch_non_existing\" SET value = 999 WHERE id = 1",
".*branch_non_existing.*" + tableName + ".*");
assertQueryFails(
session,
"DELETE FROM \"" + tableName + ".branch_non_existing\" WHERE id = 1",
".*branch_non_existing.*" + tableName + ".*");
// Create a tag that points to the branch; mutations against the tag ref should also fail
assertUpdate(session, "ALTER TABLE " + tableName + " CREATE TAG 'audit_tag' AS 'audit_branch'");
assertQuery(session, "SELECT count(*) FROM \"" + tableName + "$refs\" WHERE name = 'audit_tag' AND type = 'TAG'", "VALUES 1");
assertQueryFails(
session,
"INSERT INTO \"" + tableName + ".branch_audit_tag\" VALUES (6, 'Frank', 600)",
".*branch_audit_tag.*" + tableName + ".*");
assertQueryFails(
session,
"UPDATE \"" + tableName + ".branch_audit_tag\" SET value = 1000 WHERE id = 1",
".*branch_audit_tag.*" + tableName + ".*");
assertQueryFails(
session,
"DELETE FROM \"" + tableName + ".branch_audit_tag\" WHERE id = 1",
".*branch_audit_tag.*" + tableName + ".*");
```
1. These changes assume `assertQueryFails(Session, String, String)` is already available in this test base class (as in other Presto tests). If the overload is different (e.g., no `session` argument), adjust the calls accordingly.
2. The regex expectations are intentionally loose and only assert that the failing error message mentions both the ref name and the table (as requested). If your implementation also includes `NOT_FOUND` or a more specific prefix, you may want to tighten the patterns (e.g., `"(?s).*NOT_FOUND.*branch_non_existing.*" + tableName + ".*"`).
</issue_to_address>
### Comment 3
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergBranchMutations.java:136` </location>
<code_context>
+ }
+
+ @Test
+ public void testMultipleMutationsOnBranch()
+ {
+ String tableName = "test_multiple_mutations_branch";
</code_context>
<issue_to_address>
**suggestion (testing):** Cover the branch-specific conflict validation behaviour (skipped validation on branches)
This test does multiple mutations on a branch, but it doesn’t clearly cover a case that would previously have failed conflict validation (e.g., under SERIALIZABLE delete isolation).
Please either extend this test or add a new one that:
- Configures `DELETE_ISOLATION_LEVEL = 'serializable'` (table or session), and
- Executes a sequence of mutations on the same rows on a branch that would previously have triggered conflict validation, asserting they now succeed.
That will directly verify the behaviour change behind skipping `validateNoConflictingDataFiles` and delete conflict checks on branches.
Suggested implementation:
```java
@Test
public void testMultipleMutationsOnBranch()
{
String tableName = "test_multiple_mutations_branch";
createTable(tableName);
try {
// Configure delete isolation to serializable so that this sequence of mutations
// would be subject to conflict validation on the main branch
assertUpdate(session, "ALTER TABLE " + tableName + " SET PROPERTIES delete_isolation_level = 'serializable'");
assertUpdate(session, "ALTER TABLE " + tableName + " CREATE BRANCH 'multi_branch'");
// Perform multiple conflicting operations on the same logical rows on the branch.
// This used to trigger conflict validation (e.g., validateNoConflictingDataFiles and
// serializable delete checks) but should now succeed on branches.
assertUpdate(session, "UPDATE \"" + tableName + ".branch_multi_branch\" SET value = 150 WHERE id = 1", 1);
assertUpdate(session, "DELETE FROM \"" + tableName + ".branch_multi_branch\" WHERE id = 1", 1);
assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_multi_branch\" VALUES (1, 'Alice', 200)", 1);
assertUpdate(session, "DELETE FROM \"" + tableName + ".branch_multi_branch\" WHERE id = 2", 1);
assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_multi_branch\" VALUES (3, 'Charlie', 300)", 1);
// Verify final state in branch: row 1 has been rewritten, row 2 deleted, row 3 inserted
assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'multi_branch'", "VALUES 2");
assertQuery(
session,
"SELECT id, name, value FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'multi_branch' ORDER BY id",
"VALUES (1, 'Alice', 200), (3, 'Charlie', 300)");
```
If the Iceberg connector in this codebase uses a different mechanism for configuring delete isolation (for example, a session property like `SET SESSION iceberg_delete_isolation_level = 'serializable'` instead of a table property `delete_isolation_level`), adjust the `ALTER TABLE ... SET PROPERTIES` statement accordingly.
The intent is:
1. Ensure `DELETE_ISOLATION_LEVEL` (for the Iceberg table used in this test) is set to `serializable`.
2. Run multiple mutations on the same logical rows *on the branch* so that, on the main branch, this would previously have tripped serializable delete conflict validation, but now passes on branches because `validateNoConflictingDataFiles` and delete conflict checks are skipped there.
</issue_to_address>
### Comment 4
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergBranchMutations.java:63` </location>
<code_context>
+ assertUpdate(session, format("DROP SCHEMA IF EXISTS %s", TEST_SCHEMA));
+ }
+
+ private void createTable(String tableName)
+ {
+ assertUpdate(session, "CREATE TABLE IF NOT EXISTS " + tableName + " (id BIGINT, name VARCHAR, value INTEGER) WITH (format = 'PARQUET')");
</code_context>
<issue_to_address>
**suggestion (testing):** Add unit-style tests for IcebergTableName parsing with branch-qualified names
These tests give good end-to-end coverage for branch mutations, but the new `IcebergTableName` parsing for `table.branch_<name>` is only tested indirectly. Please add focused unit tests (in the existing `IcebergTableName`/metadata tests or a new small test) that:
- Verify `IcebergTableName.from("orders.branch_audit_branch")` parses table and branch correctly with no snapshot.
- Cover allowed/forbidden combinations with table types/versions (e.g. `orders.branch_audit_branch$history`, invalid `orders.branch_audit_branch@123` if that should be rejected).
- Assert JSON round-trip preserves `branchName`.
These can use direct constructor/from-string calls, without running queries, to complement the end-to-end tests.
Suggested implementation:
```java
@Test
public void testBranchQualifiedTableNameParsing()
{
IcebergTableName tableName = IcebergTableName.from("orders.branch_audit_branch");
assertEquals(tableName.getTableName(), "orders");
assertTrue(tableName.getBranchName().isPresent());
assertEquals(tableName.getBranchName().get(), "audit_branch");
assertFalse(tableName.getSnapshotId().isPresent());
}
@Test
public void testBranchQualifiedTableNameWithHistoryTableIsAllowed()
{
IcebergTableName tableName = IcebergTableName.from("orders.branch_audit_branch$history");
assertEquals(tableName.getTableName(), "orders");
assertTrue(tableName.getBranchName().isPresent());
assertEquals(tableName.getBranchName().get(), "audit_branch");
// history table suffix should not introduce a snapshot id
assertFalse(tableName.getSnapshotId().isPresent());
}
@Test(expectedExceptions = IllegalArgumentException.class)
public void testBranchQualifiedTableNameWithSnapshotIsRejected()
{
// Branch-qualified table names should not allow an additional snapshot id qualifier.
IcebergTableName.from("orders.branch_audit_branch@123");
}
@Test
public void testBranchQualifiedTableNameJsonRoundTrip()
{
IcebergTableName original = IcebergTableName.from("orders.branch_audit_branch");
JsonCodec<IcebergTableName> codec = JsonCodec.jsonCodec(IcebergTableName.class);
String json = codec.toJson(original);
IcebergTableName roundTripped = codec.fromJson(json);
assertEquals(roundTripped.getTableName(), "orders");
assertEquals(roundTripped.getBranchName(), original.getBranchName());
assertEquals(roundTripped.getSnapshotId(), original.getSnapshotId());
assertEquals(roundTripped.getTableType(), original.getTableType());
}
```
To integrate these tests cleanly:
1. Ensure the class has the necessary imports:
- `import com.facebook.airlift.json.JsonCodec;`
- `import static org.testng.Assert.assertEquals;`
- `import static org.testng.Assert.assertFalse;`
- `import static org.testng.Assert.assertTrue;`
2. Place the new test methods as *separate* methods at class scope (after the existing branch mutation tests and before the final closing brace of the class). The edit block above may need to be adapted so that the new tests are not inserted between the `@Test` annotation and the `testInsertIntoBranch` method signature.
3. If the `IcebergTableName` API differs (e.g., different method names or no `getTableType()`), adjust the assertions accordingly:
- Use whatever accessors exist for table name, branch name, and snapshot id.
- If JSON round-trip is done via a shared `JsonCodec<IcebergTableName>` instance in another test class or in this class, reuse that instead of creating a new codec locally.
4. If branch-qualified names with `$history` are not considered valid in your implementation, change `testBranchQualifiedTableNameWithHistoryTableIsAllowed` to expect an exception instead, so that it reflects the actual allowed/forbidden combinations.
</issue_to_address>
### Comment 5
<location> `presto-docs/src/main/sphinx/connector/iceberg.rst:2246` </location>
<code_context>
+Iceberg supports performing INSERT, UPDATE, and DELETE operations directly on branches,
+allowing you to make changes to a branch without affecting the main table or other branches.
+
+To perform mutations on a branch, use the quoted identifier syntax ``"table.branch_branchname"``.
+The quotes are required to prevent the SQL parser from interpreting the dot as a schema.table separator.
+
</code_context>
<issue_to_address>
**suggestion (typo):** Clarify or correct the placeholder identifier `"table.branch_branchname"`.
The placeholder name is ambiguous and looks like it has a duplicated `branch_`. To better convey the pattern and align with examples like `"orders.branch_audit_branch"`, consider something like `"table.branch_<branch_name>"` or `"table.branch_branch_name"`.
```suggestion
To perform mutations on a branch, use the quoted identifier syntax ``"table.branch_<branch_name>"`` (for example, ``"orders.branch_audit_branch"``).
```
</issue_to_address>
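As an illustration of the `table.branch_<branch_name>` pattern, here is a hypothetical helper (not part of Presto) that builds the quoted identifier. It assumes standard SQL identifier quoting, where a double quote inside a quoted identifier is written twice.

```java
// Hypothetical helper: builds the quoted, branch-qualified identifier
// "table.branch_<branch_name>" used in the documentation examples.
final class BranchIdentifiers
{
    private BranchIdentifiers() {}

    static String quotedBranchTable(String table, String branch)
    {
        String raw = table + ".branch_" + branch;
        // Quoting keeps the parser from treating the dot as a schema.table
        // separator; embedded double quotes are doubled per SQL rules.
        return "\"" + raw.replace("\"", "\"\"") + "\"";
    }
}
```

For example, `quotedBranchTable("orders", "audit_branch")` returns `"orders.branch_audit_branch"`, the identifier used in statements such as `INSERT INTO "orders.branch_audit_branch" ...`.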
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergTableName.java (outdated, resolved)
| } | ||
|
|
||
| @Test | ||
| public void testInsertIntoBranch() |
There was a problem hiding this comment.
suggestion (testing): Add negative tests for mutations on non-existent or non-branch refs
The new logic validates that the qualified branch name (e.g., foo.branch_audit_branch) exists and throws a NOT_FOUND PrestoException otherwise. Please add negative tests (using assertQueryFails) that cover these error paths, e.g.:
INSERT INTO "test_insert_branch.branch_non_existing" ...fails withNOT_FOUNDand mentions the branch and table.UPDATE "test_insert_branch.branch_<tag_name>" ...where<tag_name>is a tag (non-branch ref) also fails withNOT_FOUND.
This will ensure the error behavior and messages for invalid refs are locked in for insert/update/delete paths.
Suggested implementation:
@Test
public void testInsertIntoBranch()
{
String tableName = "test_insert_branch";
createTable(tableName);
try {
assertUpdate(session, "ALTER TABLE " + tableName + " CREATE BRANCH 'audit_branch'");
assertQuery(session, "SELECT count(*) FROM \"" + tableName + "$refs\" WHERE name = 'audit_branch' AND type = 'BRANCH'", "VALUES 1");
assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'audit_branch'", "VALUES 2");
// Insert into branch
assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_audit_branch\" VALUES (3, 'Charlie', 300), (4, 'David', 400)", 2);
assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'audit_branch'", "VALUES 4");
// Verify main table still has only original data
assertQuery(session, "SELECT count(*) FROM " + tableName, "VALUES 2");
// Negative tests: non-existent branch ref
assertQueryFails(
session,
"INSERT INTO \"" + tableName + ".branch_non_existing\" VALUES (5, 'Eve', 500)",
".*branch_non_existing.*" + tableName + ".*");
assertQueryFails(
session,
"UPDATE \"" + tableName + ".branch_non_existing\" SET value = 999 WHERE id = 1",
".*branch_non_existing.*" + tableName + ".*");
assertQueryFails(
session,
"DELETE FROM \"" + tableName + ".branch_non_existing\" WHERE id = 1",
".*branch_non_existing.*" + tableName + ".*");
// Create a tag that points to the branch; mutations against the tag ref should also fail
assertUpdate(session, "ALTER TABLE " + tableName + " CREATE TAG 'audit_tag' AS 'audit_branch'");
assertQuery(session, "SELECT count(*) FROM \"" + tableName + "$refs\" WHERE name = 'audit_tag' AND type = 'TAG'", "VALUES 1");
assertQueryFails(
session,
"INSERT INTO \"" + tableName + ".branch_audit_tag\" VALUES (6, 'Frank', 600)",
".*branch_audit_tag.*" + tableName + ".*");
assertQueryFails(
session,
"UPDATE \"" + tableName + ".branch_audit_tag\" SET value = 1000 WHERE id = 1",
".*branch_audit_tag.*" + tableName + ".*");
assertQueryFails(
session,
"DELETE FROM \"" + tableName + ".branch_audit_tag\" WHERE id = 1",
".*branch_audit_tag.*" + tableName + ".*");- These changes assume
assertQueryFails(Session, String, String)is already available in this test base class (as in other Presto tests). If the overload is different (e.g., nosessionargument), adjust the calls accordingly. - The regex expectations are intentionally loose and only assert that the failing error message mentions both the ref name and the table (as requested). If your implementation also includes
NOT_FOUNDor a more specific prefix, you may want to tighten the patterns (e.g.,"(?s).*NOT_FOUND.*branch_non_existing.*" + tableName + ".*").
| } | ||
|
|
||
| @Test | ||
| public void testMultipleMutationsOnBranch() |
There was a problem hiding this comment.
suggestion (testing): Cover the branch-specific conflict validation behaviour (skipped validation on branches)
This test does multiple mutations on a branch, but it doesn’t clearly cover a case that would previously have failed conflict validation (e.g., under SERIALIZABLE delete isolation).
Please either extend this test or add a new one that:
- Configures
DELETE_ISOLATION_LEVEL = 'serializable'(table or session), and - Executes a sequence of mutations on the same rows on a branch that would previously have triggered conflict validation, asserting they now succeed.
That will directly verify the behaviour change behind skipping validateNoConflictingDataFiles and delete conflict checks on branches.
Suggested implementation:
@Test
public void testMultipleMutationsOnBranch()
{
String tableName = "test_multiple_mutations_branch";
createTable(tableName);
try {
// Configure delete isolation to serializable so that this sequence of mutations
// would be subject to conflict validation on the main branch
assertUpdate(session, "ALTER TABLE " + tableName + " SET PROPERTIES delete_isolation_level = 'serializable'");
assertUpdate(session, "ALTER TABLE " + tableName + " CREATE BRANCH 'multi_branch'");
// Perform multiple conflicting operations on the same logical rows on the branch.
// This used to trigger conflict validation (e.g., validateNoConflictingDataFiles and
// serializable delete checks) but should now succeed on branches.
assertUpdate(session, "UPDATE \"" + tableName + ".branch_multi_branch\" SET value = 150 WHERE id = 1", 1);
assertUpdate(session, "DELETE FROM \"" + tableName + ".branch_multi_branch\" WHERE id = 1", 1);
assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_multi_branch\" VALUES (1, 'Alice', 200)", 1);
assertUpdate(session, "DELETE FROM \"" + tableName + ".branch_multi_branch\" WHERE id = 2", 1);
assertUpdate(session, "INSERT INTO \"" + tableName + ".branch_multi_branch\" VALUES (3, 'Charlie', 300)", 1);
// Verify final state in branch: row 1 has been rewritten, row 2 deleted, row 3 inserted
assertQuery(session, "SELECT count(*) FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'multi_branch'", "VALUES 2");
assertQuery(
session,
"SELECT id, name, value FROM " + tableName + " FOR SYSTEM_VERSION AS OF 'multi_branch' ORDER BY id",
"VALUES (1, 'Alice', 200), (3, 'Charlie', 300)");If the Iceberg connector in this codebase uses a different mechanism for configuring delete isolation (for example, a session property like SET SESSION iceberg_delete_isolation_level = 'serializable' instead of a table property delete_isolation_level), adjust the ALTER TABLE ... SET PROPERTIES statement accordingly.
The intent is:
- Ensure
DELETE_ISOLATION_LEVEL(for the Iceberg table used in this test) is set toserializable. - Run multiple mutations on the same logical rows on the branch so that, on the main branch, this would previously have tripped serializable delete conflict validation, but now passes on branches because
validateNoConflictingDataFilesand delete conflict checks are skipped there.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergBranchMutations.java (resolved)
Force-pushed from c200794 to 73f305a.
steveburnett left a comment:
Thank you for the documentation! As I mentioned in a comment, including comments in the code blocks explaining what the SQL is doing is great - really helpful for the reader.
Just a few nits.
Force-pushed from 251d81a to 6c3a621.
steveburnett left a comment:
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
Force-pushed from 6c3a621 to 2b1814e.
hantangwangd left a comment:
Thanks for this feature. I'm wondering whether we should also consider the behavior of test_table.branch_audit_branch in operations other than insert/update/delete/merge. For example, the following queries would be confusing, because the branch information in the table names is ignored:
select * from "test_table.branch_audit_branch";
select * from "test_table.branch_b1" for system_version as of 'branch_b2';
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java (outdated, resolved)
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergBranchMutations.java (resolved)
Thank you for your review @hantangwangd
Hi @agrawalreetika,
I agree, that sounds like a reasonable solution. As for other operations, what I can think of so far is:
Considering all of the above, I'm wondering whether it is possible to disallow this kind of definition by default, and only support it where explicitly needed. What are your thoughts about this?
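One way to read the "disallow by default" suggestion, sketched under assumptions: a branch suffix is accepted only for an explicit allow-list of operations (here INSERT, UPDATE, DELETE, and MERGE, the operations named earlier in this thread) and rejected everywhere else, such as a plain SELECT. The enum, class, and exception type are hypothetical and not necessarily what the PR ended up doing.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical policy: "table.branch_<name>" is rejected unless the
// operation is on an explicit allow-list of branch-aware operations.
final class BranchSuffixPolicy
{
    enum Operation { SELECT, INSERT, UPDATE, DELETE, MERGE }

    // Assumed allow-list of operations that honor the branch suffix.
    private static final Set<Operation> BRANCH_AWARE =
            EnumSet.of(Operation.INSERT, Operation.UPDATE, Operation.DELETE, Operation.MERGE);

    private BranchSuffixPolicy() {}

    static void check(Operation operation, String table, Optional<String> branch)
    {
        // Names without a branch suffix are always allowed; a suffix is an
        // error unless the operation is explicitly branch-aware.
        if (branch.isPresent() && !BRANCH_AWARE.contains(operation)) {
            throw new UnsupportedOperationException(String.format(
                    "%s does not support branch-qualified table %s.branch_%s",
                    operation, table, branch.get()));
        }
    }
}
```

Failing loudly here avoids the confusing case where `select * from "test_table.branch_audit_branch"` silently reads the main branch.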
Force-pushed from 2b1814e to 4a9932c.
Thanks for the details @hantangwangd
Force-pushed from 4a9932c to dd2a059.
Left some comments, but overall looks good to me. Thank you for implementing this feature.
It's interesting that multiple different branches can evolve independently within a single Iceberg table transaction - in some scenarios, this could even serve as a kind of constrained, lightweight version of multi-table transactions. After this PR is merged, I will add some test cases to verify this functionality.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java (outdated, resolved)
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java (resolved)
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java (outdated, resolved)
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java (outdated, resolved)
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java (outdated, resolved)
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java (outdated, resolved)
Force-pushed from dd2a059 to 1bb83a3.
hantangwangd left a comment:
Just one nit, otherwise looks good to me.
The test failure seems relevant.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java (outdated, resolved)
Force-pushed from 1bb83a3 to 3fb476e.
Force-pushed from 3fb476e to 4312f36.
hantangwangd left a comment:
Thanks @agrawalreetika, lgtm!
steveburnett left a comment:
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
…stodb#27147)

## Description
Add support for mutating an Iceberg branch

## Motivation and Context
Resolves prestodb#22030

## Impact
Resolves prestodb#22030

Add support for mutating an Iceberg branch based on the syntax discussed [here](prestodb#22030 (comment))

```
INSERT INTO "orders.branch_audit_branch" VALUES (1, 'Product A', 100.00);
UPDATE "orders.branch_audit_branch" SET price = 120.00 WHERE id = 1;
DELETE FROM "orders.branch_audit_branch" WHERE id = 2;
```

## Test Plan
Added

## Contributor checklist
- [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

## Summary by Sourcery
Add branch-aware mutation support to the Iceberg connector, enabling INSERT/UPDATE/DELETE operations on specific Iceberg branches via extended table naming.

New Features:
- Support addressing Iceberg branches in table names using a branch-qualified naming pattern for mutations.
- Allow INSERT operations to append data to a specific Iceberg branch instead of the main table.
- Allow UPDATE and DELETE operations to modify data on a specific Iceberg branch with appropriate branch routing and validation.

Enhancements:
- Improve error messaging and validation when committing Iceberg updates by including branch context and verifying that referenced branches exist.

Tests:
- Add TestIcebergBranchMutations covering inserts, updates, deletes, multi-step mutations, complex predicates, branch isolation, and INSERT ... SELECT into Iceberg branches.

## Release Notes
Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below.

```
== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for mutating an Iceberg branch
```