fix(plugin-iceberg): Disable metadata deletion on varbinary columns#27050
Merged
hantangwangd merged 1 commit intoprestodb:masterfrom Jan 30, 2026
Merged
Conversation
Contributor
Reviewer's guide (collapsed on small PRs)Reviewer's GuideDisables Iceberg column constraint enforcement for VARBINARY partition columns to avoid incorrect partition metadata, and strengthens the varbinary-partitioned table test to cover different insert orders and updated partition expectations. Class diagram for IcebergPlanOptimizer varbinary constraint guard and varbinary partition testsclassDiagram
class IcebergPlanOptimizer {
+static boolean canEnforceColumnConstraintInSpecs(IcebergColumnHandle columnHandle, Set~Integer~ partitionSpecIds, IcebergTable table, Domain domain, ConnectorSession session)
}
class IcebergColumnHandle {
+Type getType()
}
class IcebergTable {
+Map~Integer, IcebergPartitionSpec~ specs()
}
class IcebergPartitionSpec {
+int specId()
}
class Domain
class ConnectorSession
class Type
IcebergPlanOptimizer --> IcebergColumnHandle : uses
IcebergPlanOptimizer --> IcebergTable : uses
IcebergPlanOptimizer --> Domain : uses
IcebergPlanOptimizer --> ConnectorSession : uses
IcebergColumnHandle --> Type : uses
note for IcebergPlanOptimizer "New behavior: canEnforceColumnConstraintInSpecs returns false when columnHandle.getType() is VARBINARY before checking specs()"
class IcebergDistributedTestBase {
+void testPartitionedByVarbinaryType(String insertOrder)
+Object[][] dataProviderForPartitionedByVarbinaryType()
}
IcebergDistributedTestBase ..> IcebergTable : creates and verifies
IcebergDistributedTestBase ..> IcebergPlanOptimizer : indirectly exercises through query planning
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
canEnforceColumnConstraintInSpecs, consider using a type helper (e.g.,isVarbinaryType(columnHandle.getType())) or comparing against a more general binary category instead of a directcolumnHandle.getType() == VARBINARYcheck, to make the guard resilient to future changes in type instances/mappings. - It would be helpful to add a brief code comment near the VARBINARY guard in
IcebergPlanOptimizerreferencing the upstream Iceberg issue and the behavior being worked around, so future changes know when and why this special case can be removed.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `canEnforceColumnConstraintInSpecs`, consider using a type helper (e.g., `isVarbinaryType(columnHandle.getType())`) or comparing against a more general binary category instead of a direct `columnHandle.getType() == VARBINARY` check, to make the guard resilient to future changes in type instances/mappings.
- It would be helpful to add a brief code comment near the VARBINARY guard in `IcebergPlanOptimizer` referencing the upstream Iceberg issue and the behavior being worked around, so future changes know when and why this special case can be removed.
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java:872-873` </location>
<code_context>
assertEquals(getQueryRunner().execute("select b FROM test_partition_columns_varbinary where b = X'e3bcd1'").getOnlyValue(),
new byte[] {(byte) 0xe3, (byte) 0xbc, (byte) 0xd1});
- assertEquals(getQueryRunner().execute("select count(*) from \"test_partition_columns_varbinary$partitions\"").getOnlyValue(), 1L);
+ assertEquals(getQueryRunner().execute("select count(*) from \"test_partition_columns_varbinary$partitions\"").getOnlyValue(), 2L);
assertEquals(getQueryRunner().execute("select row_count from \"test_partition_columns_varbinary$partitions\" where b = X'e3bcd1'").getOnlyValue(), 1L);
assertQuerySucceeds("drop table test_partition_columns_varbinary");
</code_context>
<issue_to_address>
**suggestion (testing):** Assert partition metadata more fully (both partitions and their row counts)
The test now only validates `row_count` for the `X'e3bcd1'` partition. To better verify partition metadata, also assert that the `X'bcd1'` partition exists and that both partitions have the expected `row_count` (e.g., 1 row each). For example, query all partitions and assert the `{b -> row_count}` mapping or that `sum(row_count)` equals the table’s total row count.
</issue_to_address>
### Comment 2
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java:840-849` </location>
<code_context>
+ {"(2, X'e3bcd1'), (1, X'bcd1')"}};
+ }
+
+ @Test(dataProvider = "insertValues")
+ public void testPartitionedByVarbinaryType(String insertValues)
{
</code_context>
<issue_to_address>
**issue (testing):** Add a test that exercises deletes/metadata deletion with varbinary partitions
This test currently only covers inserts/selects, but the bug and fix are about incorrect metadata and partition bounds when deleting from VARBINARY-partitioned tables. Please add or extend a test to:
- Execute a DELETE (e.g., `DELETE FROM test_partition_columns_varbinary WHERE b = X'...'`) that would previously hit the bad constraint enforcement.
- Optionally query `$partitions` afterward to confirm partition metadata and row counts match the table contents.
That will directly exercise the regression scenario for deletes on VARBINARY partitions.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Show resolved
Hide resolved
tdcmeehan
reviewed
Jan 29, 2026
Contributor
tdcmeehan
left a comment
There was a problem hiding this comment.
Looks good, just one comment.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/optimizer/IcebergPlanOptimizer.java
Show resolved
Hide resolved
Due to Iceberg issue apache/iceberg#15128, using a binary type as a partition column may cause incorrect calculation of partition bounds in the generated manifest files when deleting data files. This can lead to incorrect results in subsequent queries. Therefore, we temporarily disables metadata deletion and filter thoroughly pushdown for varbinary columns. This restrict can be lifted once the Iceberg issue is resolved.
66858d9 to
6096a5d
Compare
tdcmeehan
approved these changes
Jan 30, 2026
This was referenced Mar 31, 2026
15 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Due to Iceberg issue apache/iceberg#15128, using a binary type as a partition column may cause incorrect calculation of partition bounds in the generated manifest files when deleting data files. This can lead to incorrect results in subsequent queries.
Therefore, we temporarily disables metadata deletion and filter thoroughly pushdown for varbinary columns. This restrict can be lifted once the Iceberg issue is resolved.
Motivation and Context
Fix the bug when use varbinary columns as partition columns in Iceberg
Impact
This change is not visible to users.
Test Plan
IcebergDistributedTestBase.testPartitionedByVarbinaryTypethrough@DataProvider, which would explicitly fail without this fix.Contributor checklist
Release Notes
Summary by Sourcery
Guard Iceberg plan optimization from enforcing metadata constraints on VARBINARY-partitioned columns and strengthen test coverage for varbinary partitioning behavior.
Bug Fixes:
Tests: