fix(plugin-iceberg): Drop data table if the MV fails validation#26994

Merged
tdcmeehan merged 1 commit into prestodb:master from tdcmeehan:fix-orphan
Jan 21, 2026

Conversation

@tdcmeehan
Contributor

Description

Clean up orphaned storage tables when materialized view creation fails in the Iceberg connector.

Motivation and Context

When legacy_materialized_views=true, the security mode is not provided, causing MV creation to fail after the storage table is created. This left orphaned storage tables in Iceberg.

Impact

Bug fix

Test Plan

Included a unit test

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... 
* ... 

Hive Connector Changes
* ... 
* ... 

If release note is NOT required, use:

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 20, 2026
@sourcery-ai
Contributor

sourcery-ai bot commented Jan 20, 2026

Reviewer's Guide

This PR updates Iceberg materialized view creation so that if validation of the view definition fails after the storage table is created, the storage table is dropped to avoid leaving orphaned data tables, and adds a regression test around this behavior under legacy MV settings.
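
The cleanup-on-failure flow summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `CreateWithCleanupSketch`, its in-memory sets, and the `String` security mode are hypothetical stand-ins for the connector metadata and catalog. It only shows the ordering (create the storage table, validate, drop on failure) and how a cleanup failure is attached as a suppressed exception.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for a catalog; not the Presto or Iceberg API.
class CreateWithCleanupSketch
{
    private final Set<String> tables = new HashSet<>();
    private final Set<String> views = new HashSet<>();

    void createMaterializedView(String viewName, String securityMode)
    {
        String storageTable = "__mv_storage__" + viewName;
        tables.add(storageTable); // step 1: the storage table is created first

        try {
            // step 2: validation and view registration can still fail
            if (securityMode == null) {
                throw new IllegalArgumentException("Materialized view security mode is required");
            }
            views.add(viewName);
        }
        catch (Exception e) {
            try {
                dropStorageTable(storageTable); // undo step 1
            }
            catch (Exception cleanupFailure) {
                // keep the original failure primary; attach the cleanup failure to it
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
    }

    void dropStorageTable(String name)
    {
        tables.remove(name);
    }

    boolean tableExists(String name)
    {
        return tables.contains(name);
    }

    boolean viewExists(String name)
    {
        return views.contains(name);
    }
}
```

Rethrowing the original exception, rather than the cleanup exception, keeps the validation error as the one the user sees.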

Sequence diagram for Iceberg materialized view creation and cleanup on validation failure

sequenceDiagram
    actor User
    participant Coordinator as CoordinatorSession
    participant IcebergMetadata as IcebergAbstractMetadata
    participant Storage as IcebergStorageTable

    User->>Coordinator: create materialized view
    Coordinator->>IcebergMetadata: createMaterializedView(session, viewName, viewDefinition, materializedViewProperties)
    activate IcebergMetadata

    IcebergMetadata->>Storage: createTable(session, storageTableMetadata, false)
    Storage-->>IcebergMetadata: storageTable created

    rect rgb(255, 240, 240)
        IcebergMetadata->>IcebergMetadata: buildMaterializedViewProperties(viewDefinition, materializedViewProperties)
        note over IcebergMetadata: owner and securityMode must be present
        IcebergMetadata->>IcebergMetadata: createIcebergView(session, viewName, columns, originalSql, properties)
        alt validation or view creation fails
            IcebergMetadata-->>IcebergMetadata: Exception e
            IcebergMetadata->>IcebergMetadata: dropStorageTable(session, storageTableName)
            IcebergMetadata->>Storage: dropTable(session, storageTableHandle)
            Storage-->>IcebergMetadata: storageTable dropped
            IcebergMetadata-->>Coordinator: rethrow e (with suppressed cleanup exception if any)
        else success
            IcebergMetadata-->>Coordinator: return
        end
    end

    deactivate IcebergMetadata
    Coordinator-->>User: success or INVALID_VIEW error

Class diagram for updated IcebergAbstractMetadata materialized view handling

classDiagram
    class IcebergAbstractMetadata {
        +createMaterializedView(ConnectorSession session, SchemaTableName viewName, ConnectorMaterializedViewDefinition viewDefinition, Map materializedViewProperties) void
        -dropStorageTable(ConnectorSession session, SchemaTableName storageTableName) void
    }

    class ConnectorSession
    class SchemaTableName {
        +getSchemaName() String
        +getTableName() String
    }

    class ConnectorMaterializedViewDefinition {
        +getOriginalSql() String
        +getBaseTables() List
        +getColumnMappings() Map
        +getOwner() Optional
        +getSecurityMode() Optional
    }

    class MaterializedViewRefreshType

    class PrestoException

    IcebergAbstractMetadata --> ConnectorSession
    IcebergAbstractMetadata --> SchemaTableName
    IcebergAbstractMetadata --> ConnectorMaterializedViewDefinition
    IcebergAbstractMetadata --> MaterializedViewRefreshType
    IcebergAbstractMetadata --> PrestoException

File-Level Changes

Change: Wrap materialized view property construction and view creation in error handling that cleans up the storage table on failure.
Details:
  • Enclose materialized view property population and createIcebergView call in a try/catch that catches any Exception.
  • On failure, attempt to drop the previously created storage table and suppress cleanup exceptions onto the original exception before rethrowing.
  • Replace checkState validations for owner and security mode with PrestoException(INVALID_VIEW, ...) to produce user-facing validation errors, including guidance about legacy_materialized_views for missing security mode.
Files: presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

Change: Add a helper to drop the storage table by SchemaTableName and use it during MV creation cleanup.
Details:
  • Introduce dropStorageTable(ConnectorSession, SchemaTableName) that resolves a table handle and calls dropTable if the table exists.
  • Invoke dropStorageTable in the createMaterializedView error handling path when validation or view creation fails after the storage table is created.
Files: presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

Change: Extend Iceberg materialized view tests to cover legacy MV toggling and to assert no orphan storage tables on validation failure.
Details:
  • Adjust test QueryRunner configuration to enable the legacy materialized views toggle via experimental.allow-legacy-materialized-views-toggle.
  • Add testNoOrphanStorageTableOnValidationFailure which creates a base table, attempts MV creation with legacy_materialized_views=true expecting a security-mode validation failure, and verifies the MV storage table does not exist via both Presto queries and Iceberg REST catalog APIs.
  • Ensure base table cleanup in a finally block to keep tests isolated.
Files: presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewMetadata.java
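
The switch from `checkState` to a user-facing validation error, listed in the first change above, might look something like the sketch below. It is an assumption-laden illustration: `InvalidViewException` is a hypothetical stand-in for `PrestoException(INVALID_VIEW, ...)`, the `Optional<String>` parameters stand in for the owner and security mode getters, and the message wording is illustrative apart from the `legacy_materialized_views` hint quoted in the review.

```java
import java.util.Optional;

class ViewValidationSketch
{
    // Hypothetical stand-in for an engine exception that carries a user-facing error code.
    static class InvalidViewException extends RuntimeException
    {
        final String errorCode = "INVALID_VIEW";

        InvalidViewException(String message)
        {
            super(message);
        }
    }

    // Returns the owner, or fails with a validation error instead of an internal state error.
    static String requireOwner(Optional<String> owner)
    {
        return owner.orElseThrow(() ->
                new InvalidViewException("Materialized view owner is required"));
    }

    // Returns the security mode, or fails with a hint about the legacy session property.
    static String requireSecurityMode(Optional<String> securityMode)
    {
        return securityMode.orElseThrow(() ->
                new InvalidViewException(
                        "Materialized view security mode is required (set legacy_materialized_views=false)"));
    }
}
```

An `IllegalStateException` from `checkState` would typically surface as an internal error, while a dedicated validation exception tells the user what to change.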

@tdcmeehan tdcmeehan marked this pull request as ready for review January 21, 2026 13:44
@tdcmeehan tdcmeehan requested review from a team, ZacBlanco and hantangwangd as code owners January 21, 2026 13:44
@prestodb-ci prestodb-ci requested review from a team, ShahimSharafudeen and infvg and removed request for a team January 21, 2026 13:44
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • The new try { ... } catch (Exception e) around MV creation is very broad; consider narrowing it (e.g., to PrestoException or runtime exceptions you actually expect) so you don't inadvertently mask unrelated fatal errors while still ensuring cleanup.
  • In dropStorageTable, a concurrent drop could still cause dropTable to throw; it may be safer to catch and ignore a specific 'not found'/TABLE_NOT_FOUND error code so cleanup remains idempotent and doesn't mask the original failure.
  • The INVALID_VIEW message "(set legacy_materialized_views=false)" bakes a specific config name into the error; consider removing or externalizing this hint so the exception message stays accurate if configuration options change.
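
The second point, an idempotent cleanup drop, can be sketched as follows. This is a hedged illustration only: `IdempotentDropSketch`, `TableNotFoundException`, and `dropIfExists` are hypothetical names, and the in-memory set stands in for the catalog. The idea is that "already gone" is treated as a successful cleanup while any other failure still propagates.

```java
import java.util.HashSet;
import java.util.Set;

class IdempotentDropSketch
{
    // Hypothetical stand-in for a TABLE_NOT_FOUND style failure.
    static class TableNotFoundException extends RuntimeException
    {
        TableNotFoundException(String table)
        {
            super("Table not found: " + table);
        }
    }

    private final Set<String> tables = new HashSet<>();

    void createTable(String name)
    {
        tables.add(name);
    }

    // Strict drop: fails if the table is already gone, for example after a concurrent drop.
    void dropTable(String name)
    {
        if (!tables.remove(name)) {
            throw new TableNotFoundException(name);
        }
    }

    // Cleanup drop: returns true if this call removed the table, false if it was already gone.
    // Only the not-found case is swallowed; any other exception propagates to the caller.
    boolean dropIfExists(String name)
    {
        try {
            dropTable(name);
            return true;
        }
        catch (TableNotFoundException ignored) {
            return false;
        }
    }
}
```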

Comment on lines +763 to +772
    @Test
    public void testNoOrphanStorageTableOnValidationFailure()
            throws Exception
    {
        try (RESTCatalog catalog = new RESTCatalog()) {
            assertUpdate("CREATE TABLE test_orphan_base (id BIGINT, value BIGINT)");
            assertUpdate("INSERT INTO test_orphan_base VALUES (1, 100)", 1);

            Session legacySession = Session.builder(getSession())
                    .setSystemProperty("legacy_materialized_views", "true")

suggestion (testing): Also assert that the materialized view itself is not registered after the failed creation

Your test already checks that the storage table is removed at both the SQL layer and via the REST catalog. To strengthen coverage from a user perspective, add an assertion that no materialized view entry exists in metadata after the failure (e.g., via SHOW TABLES, SHOW MATERIALIZED VIEWS, or system metadata tables) so we catch cases where the storage table is dropped but a stale MV record remains.

Suggested implementation:

    @Test
    public void testNoOrphanStorageTableOnValidationFailure()
            throws Exception
    {
        try (RESTCatalog catalog = new RESTCatalog()) {
            assertUpdate("CREATE TABLE test_orphan_base (id BIGINT, value BIGINT)");
            assertUpdate("INSERT INTO test_orphan_base VALUES (1, 100)", 1);

            Session legacySession = Session.builder(getSession())
                    .setSystemProperty("legacy_materialized_views", "true")
                    .build();

            // Ensure no stale materialized view metadata entry is registered after the failed creation.
            // This verifies that both the storage table and the MV record are fully cleaned up.
            assertQueryReturnsEmptyResult("SHOW MATERIALIZED VIEWS LIKE 'test_orphan_%'");

This patch assumes:

  1. The materialized view created (and expected to fail) in this test uses a name that matches the pattern test_orphan_% (e.g., test_orphan_mv). If a different name is used, adjust the LIKE pattern in SHOW MATERIALIZED VIEWS accordingly to specifically target the MV under test (for example: SHOW MATERIALIZED VIEWS LIKE 'test_orphan_mv').
  2. The method assertQueryReturnsEmptyResult(String sql) is available in the test base class (it is common in Presto tests). If it is not available in this particular test base, replace the assertion with an equivalent that checks for an empty result set, such as:
    • assertQuery("SHOW MATERIALIZED VIEWS LIKE 'test_orphan_%'", "SELECT * FROM (VALUES 1) WHERE 1 = 0"), or
    • another project-specific helper for asserting an empty result.
  3. If the failure and cleanup logic occurs later in the method (after the snippet you provided), move the assertQueryReturnsEmptyResult("SHOW MATERIALIZED VIEWS LIKE 'test_orphan_%'"); line to just after the point where you currently assert that the Iceberg storage table has been removed (both in SQL and via the REST catalog). This ensures the assertion is executed after the failed materialized view creation and cleanup, which is the intended behavior for the test.

Comment on lines +775 to +776
    String mvName = "test_orphan_mv";
    String storageTableName = "__mv_storage__" + mvName;

nitpick (testing): Clarify intent of hard-coded storage table naming to avoid future brittleness

This test intentionally depends on the current storage table naming convention ("__mv_storage__" + mvName) to verify that the specific storage table is removed. Please add a short comment to make this explicit, so future refactors of the naming logic either update this test or introduce a helper for computing the storage table name instead of silently breaking the assertion.
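
One way to read the "helper for computing the storage table name" option is sketched below. `StorageTableNames` and `storageTableNameFor` are hypothetical names introduced only for illustration; the prefix is the one hard-coded in the test snippet, and whether the connector exposes or allows configuring it is not something this thread confirms.

```java
class StorageTableNames
{
    // Prefix taken from the test snippet above; assumed here to be a fixed convention.
    static final String STORAGE_TABLE_PREFIX = "__mv_storage__";

    private StorageTableNames() {}

    // Single place that derives the storage table name from the materialized view name,
    // so tests and production code cannot drift apart silently.
    static String storageTableNameFor(String materializedViewName)
    {
        return STORAGE_TABLE_PREFIX + materializedViewName;
    }
}
```

With such a helper, the test would call `storageTableNameFor(mvName)` instead of concatenating the prefix inline.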

Member

@hantangwangd hantangwangd left a comment

Thanks for this fix. Overall looks good to me, just one quick question.

Contributor

@PingLiuPing PingLiuPing left a comment

Thanks

Member

@hantangwangd hantangwangd left a comment

Thanks @tdcmeehan, lgtm!

@tdcmeehan tdcmeehan merged commit 3b328e7 into prestodb:master Jan 21, 2026
81 checks passed
Labels

from:IBM PR from IBM

5 participants