fix(plugin-iceberg): Drop data table if the MV fails validation#26994
fix(plugin-iceberg): Drop data table if the MV fails validation#26994tdcmeehan merged 1 commit intoprestodb:masterfrom
Conversation
Reviewer's GuideThis PR updates Iceberg materialized view creation so that if validation of the view definition fails after the storage table is created, the storage table is dropped to avoid leaving orphaned data tables, and adds a regression test around this behavior under legacy MV settings. Sequence diagram for Iceberg materialized view creation and cleanup on validation failuresequenceDiagram
actor User
participant Coordinator as CoordinatorSession
participant IcebergMetadata as IcebergAbstractMetadata
participant Storage as IcebergStorageTable
User->>Coordinator: create materialized view
Coordinator->>IcebergMetadata: createMaterializedView(session, viewName, viewDefinition, materializedViewProperties)
activate IcebergMetadata
IcebergMetadata->>Storage: createTable(session, storageTableMetadata, false)
Storage-->>IcebergMetadata: storageTable created
rect rgb(255, 240, 240)
IcebergMetadata->>IcebergMetadata: buildMaterializedViewProperties(viewDefinition, materializedViewProperties)
note over IcebergMetadata: owner and securityMode must be present
IcebergMetadata->>IcebergMetadata: createIcebergView(session, viewName, columns, originalSql, properties)
alt validation or view creation fails
IcebergMetadata-->>IcebergMetadata: Exception e
IcebergMetadata->>IcebergMetadata: dropStorageTable(session, storageTableName)
IcebergMetadata->>Storage: dropTable(session, storageTableHandle)
Storage-->>IcebergMetadata: storageTable dropped
IcebergMetadata-->>Coordinator: rethrow e (with suppressed cleanup exception if any)
else success
IcebergMetadata-->>Coordinator: return
end
end
deactivate IcebergMetadata
Coordinator-->>User: success or INVALID_VIEW error
Class diagram for updated IcebergAbstractMetadata materialized view handlingclassDiagram
class IcebergAbstractMetadata {
+createMaterializedView(ConnectorSession session, SchemaTableName viewName, ConnectorMaterializedViewDefinition viewDefinition, Map materializedViewProperties) void
-dropStorageTable(ConnectorSession session, SchemaTableName storageTableName) void
}
class ConnectorSession
class SchemaTableName {
+getSchemaName() String
+getTableName() String
}
class ConnectorMaterializedViewDefinition {
+getOriginalSql() String
+getBaseTables() List
+getColumnMappings() Map
+getOwner() Optional
+getSecurityMode() Optional
}
class MaterializedViewRefreshType
class PrestoException
IcebergAbstractMetadata --> ConnectorSession
IcebergAbstractMetadata --> SchemaTableName
IcebergAbstractMetadata --> ConnectorMaterializedViewDefinition
IcebergAbstractMetadata --> MaterializedViewRefreshType
IcebergAbstractMetadata --> PrestoException
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The new
try { ... } catch (Exception e)around MV creation is very broad; consider narrowing it (e.g., toPrestoExceptionor runtime exceptions you actually expect) so you don't inadvertently mask unrelated fatal errors while still ensuring cleanup. - In
dropStorageTable, a concurrent drop could still causedropTableto throw; it may be safer to catch and ignore a specific 'not found'/TABLE_NOT_FOUNDerror code so cleanup remains idempotent and doesn't mask the original failure. - The
INVALID_VIEWmessage "(set legacy_materialized_views=false)" bakes a specific config name into the error; consider removing or externalizing this hint so the exception message stays accurate if configuration options change.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `try { ... } catch (Exception e)` around MV creation is very broad; consider narrowing it (e.g., to `PrestoException` or runtime exceptions you actually expect) so you don't inadvertently mask unrelated fatal errors while still ensuring cleanup.
- In `dropStorageTable`, a concurrent drop could still cause `dropTable` to throw; it may be safer to catch and ignore a specific 'not found'/`TABLE_NOT_FOUND` error code so cleanup remains idempotent and doesn't mask the original failure.
- The `INVALID_VIEW` message "(set legacy_materialized_views=false)" bakes a specific config name into the error; consider removing or externalizing this hint so the exception message stays accurate if configuration options change.
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewMetadata.java:763-772` </location>
<code_context>
+ @Test
</code_context>
<issue_to_address>
**suggestion (testing):** Also assert that the materialized view itself is not registered after the failed creation
Your test already checks that the storage table is removed at both the SQL layer and via the REST catalog. To strengthen coverage from a user perspective, add an assertion that no materialized view entry exists in metadata after the failure (e.g., via `SHOW TABLES`, `SHOW MATERIALIZED VIEWS`, or system metadata tables) so we catch cases where the storage table is dropped but a stale MV record remains.
Suggested implementation:
```java
@Test
public void testNoOrphanStorageTableOnValidationFailure()
throws Exception
{
try (RESTCatalog catalog = new RESTCatalog()) {
assertUpdate("CREATE TABLE test_orphan_base (id BIGINT, value BIGINT)");
assertUpdate("INSERT INTO test_orphan_base VALUES (1, 100)", 1);
Session legacySession = Session.builder(getSession())
.setSystemProperty("legacy_materialized_views", "true")
.build();
// Ensure no stale materialized view metadata entry is registered after the failed creation.
// This verifies that both the storage table and the MV record are fully cleaned up.
assertQueryReturnsEmptyResult("SHOW MATERIALIZED VIEWS LIKE 'test_orphan_%'");
```
This patch assumes:
1. The materialized view created (and expected to fail) in this test uses a name that matches the pattern `test_orphan_%` (e.g., `test_orphan_mv`). If a different name is used, adjust the `LIKE` pattern in `SHOW MATERIALIZED VIEWS` accordingly to specifically target the MV under test (for example: `SHOW MATERIALIZED VIEWS LIKE 'test_orphan_mv'`).
2. The method `assertQueryReturnsEmptyResult(String sql)` is available in the test base class (it is common in Presto tests). If it is not available in this particular test base, replace the assertion with an equivalent that checks for an empty result set, such as:
- `assertQuery("SHOW MATERIALIZED VIEWS LIKE 'test_orphan_%'", "SELECT * FROM (VALUES 1) WHERE 1 = 0")`, or
- another project-specific helper for asserting an empty result.
3. If the failure and cleanup logic occurs later in the method (after the snippet you provided), move the `assertQueryReturnsEmptyResult("SHOW MATERIALIZED VIEWS LIKE 'test_orphan_%'");` line to just after the point where you currently assert that the Iceberg storage table has been removed (both in SQL and via the REST catalog). This ensures the assertion is executed *after* the failed materialized view creation and cleanup, which is the intended behavior for the test.
</issue_to_address>
### Comment 2
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewMetadata.java:775-776` </location>
<code_context>
+ .setSystemProperty("legacy_materialized_views", "true")
+ .build();
+
+ String mvName = "test_orphan_mv";
+ String storageTableName = "__mv_storage__" + mvName;
+
+ assertQueryFails(
</code_context>
<issue_to_address>
**nitpick (testing):** Clarify intent of hard-coded storage table naming to avoid future brittleness
This test intentionally depends on the current storage table naming convention (`"__mv_storage__" + mvName`) to verify that the specific storage table is removed. Please add a short comment to make this explicit, so future refactors of the naming logic either update this test or introduce a helper for computing the storage table name instead of silently breaking the assertion.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @Test | ||
| public void testNoOrphanStorageTableOnValidationFailure() | ||
| throws Exception | ||
| { | ||
| try (RESTCatalog catalog = new RESTCatalog()) { | ||
| assertUpdate("CREATE TABLE test_orphan_base (id BIGINT, value BIGINT)"); | ||
| assertUpdate("INSERT INTO test_orphan_base VALUES (1, 100)", 1); | ||
|
|
||
| Session legacySession = Session.builder(getSession()) | ||
| .setSystemProperty("legacy_materialized_views", "true") |
There was a problem hiding this comment.
suggestion (testing): Also assert that the materialized view itself is not registered after the failed creation
Your test already checks that the storage table is removed at both the SQL layer and via the REST catalog. To strengthen coverage from a user perspective, add an assertion that no materialized view entry exists in metadata after the failure (e.g., via SHOW TABLES, SHOW MATERIALIZED VIEWS, or system metadata tables) so we catch cases where the storage table is dropped but a stale MV record remains.
Suggested implementation:
@Test
public void testNoOrphanStorageTableOnValidationFailure()
throws Exception
{
try (RESTCatalog catalog = new RESTCatalog()) {
assertUpdate("CREATE TABLE test_orphan_base (id BIGINT, value BIGINT)");
assertUpdate("INSERT INTO test_orphan_base VALUES (1, 100)", 1);
Session legacySession = Session.builder(getSession())
.setSystemProperty("legacy_materialized_views", "true")
.build();
// Ensure no stale materialized view metadata entry is registered after the failed creation.
// This verifies that both the storage table and the MV record are fully cleaned up.
assertQueryReturnsEmptyResult("SHOW MATERIALIZED VIEWS LIKE 'test_orphan_%'");This patch assumes:
- The materialized view created (and expected to fail) in this test uses a name that matches the pattern
test_orphan_%(e.g.,test_orphan_mv). If a different name is used, adjust theLIKEpattern inSHOW MATERIALIZED VIEWSaccordingly to specifically target the MV under test (for example:SHOW MATERIALIZED VIEWS LIKE 'test_orphan_mv'). - The method
assertQueryReturnsEmptyResult(String sql)is available in the test base class (it is common in Presto tests). If it is not available in this particular test base, replace the assertion with an equivalent that checks for an empty result set, such as:assertQuery("SHOW MATERIALIZED VIEWS LIKE 'test_orphan_%'", "SELECT * FROM (VALUES 1) WHERE 1 = 0"), or- another project-specific helper for asserting an empty result.
- If the failure and cleanup logic occurs later in the method (after the snippet you provided), move the
assertQueryReturnsEmptyResult("SHOW MATERIALIZED VIEWS LIKE 'test_orphan_%'");line to just after the point where you currently assert that the Iceberg storage table has been removed (both in SQL and via the REST catalog). This ensures the assertion is executed after the failed materialized view creation and cleanup, which is the intended behavior for the test.
| String mvName = "test_orphan_mv"; | ||
| String storageTableName = "__mv_storage__" + mvName; |
There was a problem hiding this comment.
nitpick (testing): Clarify intent of hard-coded storage table naming to avoid future brittleness
This test intentionally depends on the current storage table naming convention ("__mv_storage__" + mvName) to verify that the specific storage table is removed. Please add a short comment to make this explicit, so future refactors of the naming logic either update this test or introduce a helper for computing the storage table name instead of silently breaking the assertion.
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks for this fix. Overall looks good to me, just one quick question.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Show resolved
Hide resolved
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks @tdcmeehan, lgtm!
Description
Clean up orphaned storage tables when materialized view creation fails in Iceberg connector.
Motivation and Context
When legacy_materialized_views=true, the security mode is not provided, causing MV creation to fail after the storage table is created. This left orphaned storage tables in Iceberg.
Impact
Bug fix
Test Plan
Included a unit test
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: