feat(analyzer): Allow CTAS and INSERT from materialized views #27227

Merged
ceekay47 merged 1 commit into prestodb:master from ceekay47:export-D94597203 on Mar 7, 2026
Contributor

@ceekay47 ceekay47 commented Feb 27, 2026

Summary:
Previously, CREATE TABLE AS SELECT and INSERT ... SELECT were blanket-blocked
when selecting from a materialized view. This was overly restrictive:

  • CTAS is always safe because the target table is new and cannot be a base
    table of the materialized view (no circular dependency possible).
  • INSERT is safe as long as the target table is not one of the MV's base
    tables.

This change removes the blanket restriction and replaces it with a targeted
circular dependency check: only INSERT into a base table of the materialized
view is blocked, with a clear error message explaining why.

Differential Revision: D94597203
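
The rule described in the summary can be sketched as a small decision function. This is a minimal, simplified model and not the analyzer code from this PR: the class `MvWriteGuard`, the enum `StatementKind`, and the use of plain `"schema.table"` strings are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the rule in the summary; not Presto's analyzer code.
final class MvWriteGuard {
    enum StatementKind { CREATE_TABLE_AS_SELECT, INSERT }

    // Returns true when a statement that reads from a materialized view may proceed.
    // Table names are plain "schema.table" strings here, purely for brevity.
    static boolean isAllowed(StatementKind kind, String targetTable, Set<String> baseTables) {
        if (kind == StatementKind.CREATE_TABLE_AS_SELECT) {
            // The target is a new table, so it cannot already be a base table of the view.
            return true;
        }
        // INSERT is rejected only when it would write into one of the view's base tables.
        return !baseTables.contains(targetTable);
    }

    public static void main(String[] args) {
        Set<String> baseTables = new HashSet<>(Arrays.asList("sales.orders", "sales.customers"));
        System.out.println(isAllowed(StatementKind.CREATE_TABLE_AS_SELECT, "sales.orders_copy", baseTables)); // true
        System.out.println(isAllowed(StatementKind.INSERT, "sales.archive", baseTables)); // true
        System.out.println(isAllowed(StatementKind.INSERT, "sales.orders", baseTables)); // false
    }
}
```

Only the last call is rejected, which corresponds to the one case the PR keeps blocked.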

Summary by Sourcery

Allow CTAS and most INSERT ... SELECT operations from materialized views while preventing circular dependencies into their base tables.

Bug Fixes:

  • Block INSERT ... SELECT from a materialized view only when the target table is one of the materialized view's base tables, avoiding circular dependency issues.

Enhancements:

  • Relax previous blanket restriction to permit CTAS and INSERT ... SELECT from materialized views when the target is not a base table, improving flexibility of materialized view usage.

Tests:

  • Extend Hive materialized view planner tests to cover successful CTAS and INSERT from materialized views, refreshed materialized views, and failure when inserting into a base table.
== RELEASE NOTES ==

General Changes
* Add support for CTAS and INSERT from materialized views.

@ceekay47 ceekay47 requested review from a team, feilong-liu and jaystarshot as code owners February 27, 2026 02:54
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Feb 27, 2026
Contributor

sourcery-ai bot commented Feb 27, 2026

Reviewer's Guide

Refines semantic checks for statements reading from materialized views so that CREATE TABLE AS SELECT is allowed and INSERT is allowed except when targeting one of the materialized view’s base tables, and updates Hive materialized view planner tests accordingly.

Sequence diagram for INSERT circular dependency guard on materialized views

sequenceDiagram
    actor Client
    participant StatementAnalyzer
    participant Analysis
    participant Session
    participant MetadataResolver
    participant MaterializedViewDefinition

    Client->>StatementAnalyzer: analyze Insert statement
    StatementAnalyzer->>Analysis: getStatement
    Analysis-->>StatementAnalyzer: Insert

    loop For each referenced Table
        StatementAnalyzer->>MetadataResolver: getMaterializedViewDefinition session metadataHandle name
        MetadataResolver-->>StatementAnalyzer: optionalMaterializedView

        alt Materialized view present and statement is Insert
            StatementAnalyzer->>Analysis: getStatement
            Analysis-->>StatementAnalyzer: Insert
            StatementAnalyzer->>StatementAnalyzer: createQualifiedObjectName session insert insertTarget metadata
            StatementAnalyzer->>StatementAnalyzer: new SchemaTableName schemaName tableName
            StatementAnalyzer->>MaterializedViewDefinition: getBaseTables
            MaterializedViewDefinition-->>StatementAnalyzer: baseTables

            alt target table is in baseTables
                StatementAnalyzer->>StatementAnalyzer: throw SemanticException NOT_SUPPORTED
                StatementAnalyzer-->>Client: semantic error response
            else target table not in baseTables
                StatementAnalyzer->>StatementAnalyzer: continue analysis of query
            end
        else Not an Insert or not a materialized view
            StatementAnalyzer->>StatementAnalyzer: continue analysis of query
        end
    end

Class diagram for StatementAnalyzer materialized view circular dependency check

classDiagram
    class StatementAnalyzer {
        - Analysis analysis
        - MetadataResolver metadataResolver
        - Session session
        - Metadata metadata
        + visitTable Table table Optional_scope scope Scope
    }

    class Analysis {
        + getStatement() Statement
        + getMetadataHandle() MetadataHandle
    }

    class Statement {
    }

    class Insert {
        + getTarget() Table
    }

    class CreateTableAsSelect {
    }

    class Table {
        + getName() QualifiedName
    }

    class MaterializedViewDefinition {
        + getTable() SchemaTableName
        + getBaseTables() List_SchemaTableName
    }

    class QualifiedObjectName {
        + getSchemaName() String
        + getObjectName() String
    }

    class SchemaTableName {
        + SchemaTableName(String schemaName, String tableName)
        + getSchemaName() String
        + getTableName() String
    }

    class MetadataResolver {
        + getMaterializedViewDefinition(Session session, MetadataResolver metadataResolver, MetadataHandle metadataHandle, QualifiedName name) Optional_MaterializedViewDefinition
    }

    class Session {
    }

    class Metadata {
    }

    class SemanticException {
        + SemanticException(SemanticErrorCode code, Node node, String message, Object arg1, Object arg2, Object arg3)
    }

    class SemanticErrorCode {
        <<enum>>
        NOT_SUPPORTED
    }

    class Node {
    }

    class QualifiedName {
    }

    class Scope {
    }

    class Optional_scope {
    }

    class Optional_MaterializedViewDefinition {
        + isPresent() boolean
        + get() MaterializedViewDefinition
    }

    class List_SchemaTableName {
        + contains(SchemaTableName table) boolean
    }

    StatementAnalyzer --> Analysis
    StatementAnalyzer --> MetadataResolver
    StatementAnalyzer --> Session
    StatementAnalyzer --> Metadata
    Analysis --> Statement
    Statement <|-- Insert
    Statement <|-- CreateTableAsSelect
    Insert --> Table
    Table --> QualifiedName
    MetadataResolver --> Optional_MaterializedViewDefinition
    Optional_MaterializedViewDefinition --> MaterializedViewDefinition
    MaterializedViewDefinition --> SchemaTableName
    MaterializedViewDefinition --> List_SchemaTableName
    QualifiedObjectName --> SchemaTableName
    SemanticException --> SemanticErrorCode
    SemanticException --> Node
    List_SchemaTableName --> SchemaTableName

File-Level Changes

Change: Replace blanket prohibition of INSERT/CTAS from materialized views with a targeted circular-dependency check only for INSERT into MV base tables.
Details:
  • Remove previous semantic check that rejected both INSERT and CREATE TABLE AS SELECT when the source table is a materialized view.
  • Introduce logic to resolve the INSERT target table name and build a SchemaTableName for comparison.
  • Block INSERT when the target table is found in the materialized view’s baseTables collection, throwing a NOT_SUPPORTED SemanticException with an explanatory error message.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java

Change: Extend Hive materialized view logical planner test to cover the new allowed and forbidden behaviors for CTAS and INSERT from materialized views.
Details:
  • Add an additional target table to validate CTAS from a refreshed materialized view.
  • Change expectations so CTAS from a materialized view succeeds and verifies table existence.
  • Add positive test for INSERT from a materialized view into a non-base table and negative test for INSERT into a base table with the new error message pattern.
  • Ensure cleanup of all newly created tables in the test finally block.
Files:
  • presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • In the new circular-dependency check, consider binding optionalMaterializedView.get() to a local MaterializedViewDefinition variable to avoid repeated get() calls and make the block easier to read and reason about.
  • The circular dependency detection uses SchemaTableName (schema + table) only; if materialized views and base tables can span multiple catalogs, consider including catalog in the comparison to avoid false positives/negatives across catalogs.
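
To make the catalog concern concrete, the following sketch contrasts a schema + table comparison with one that also includes the catalog. It is a hypothetical illustration only: names are modelled as three-part lists (catalog, schema, table), the class and method names are invented here, and whether a materialized view and its base tables can actually live in different catalogs is the open question the comment raises, not something this sketch establishes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names are lists of [catalog, schema, table], not Presto's own name classes.
final class CatalogAwareComparison {
    // Mirrors a schema + table comparison: the catalog part is ignored.
    static boolean matchesIgnoringCatalog(Set<List<String>> baseTables, List<String> target) {
        List<String> targetSchemaTable = target.subList(1, 3);
        for (List<String> base : baseTables) {
            if (base.subList(1, 3).equals(targetSchemaTable)) {
                return true;
            }
        }
        return false;
    }

    // Compares catalog, schema, and table together.
    static boolean matchesWithCatalog(Set<List<String>> baseTables, List<String> target) {
        return baseTables.contains(target);
    }

    public static void main(String[] args) {
        Set<List<String>> baseTables = Collections.singleton(Arrays.asList("hive", "sales", "orders"));
        List<String> sameNameOtherCatalog = Arrays.asList("iceberg", "sales", "orders");
        System.out.println(matchesIgnoringCatalog(baseTables, sameNameOtherCatalog)); // true
        System.out.println(matchesWithCatalog(baseTables, sameNameOtherCatalog)); // false
    }
}
```

Under these assumptions, the schema + table comparison would reject an INSERT into a same-named table in another catalog, while the catalog-aware comparison would not.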
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the new circular-dependency check, consider binding `optionalMaterializedView.get()` to a local `MaterializedViewDefinition` variable to avoid repeated `get()` calls and make the block easier to read and reason about.
- The circular dependency detection uses `SchemaTableName` (schema + table) only; if materialized views and base tables can span multiple catalogs, consider including catalog in the comparison to avoid false positives/negatives across catalogs.

## Individual Comments

### Comment 1
<location path="presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java" line_range="2939-2946" />
<code_context>

-            assertQueryFails(format("CREATE TABLE %s AS SELECT * FROM %s", table3, view),
-                    ".*CreateTableAsSelect by selecting from a materialized view \\w+ is not supported.*");
+            // CTAS from a materialized view should succeed
+            assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table3, view), 255);
+            assertTrue(getQueryRunner().tableExists(getSession(), table3));
+
+            // Refresh the MV so it has data, then CTAS should read from the refreshed MV
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen CTAS-from-MV tests by asserting on the actual row contents or counts, not just table existence

The current checks only confirm that `table3`/`table4` exist and that `assertUpdate` reports `255` rows. To validate the behavior change, please also assert on the produced data. For instance:
- For `table3`, compare `SELECT * FROM table3` with the expected subset of the MV/base table (e.g., `SELECT * FROM view WHERE ds = '2020-01-01'`).
- For `table4`, after refresh, verify that CTAS reads from the refreshed MV by comparing its contents against the MV/base-table query rather than relying only on row count.
This will better prove CTAS is using the correct MV contents, including after refresh.

```suggestion
            // CTAS from a materialized view should succeed
            assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table3, view), 255);
            assertTrue(getQueryRunner().tableExists(getSession(), table3));
            // Verify CTAS-from-MV produces the expected subset of data
            assertQuery(
                    format("SELECT * FROM %s", table3),
                    format("SELECT * FROM %s WHERE ds = '2020-01-01'", view));

            // Refresh the MV so it has data, then CTAS should read from the refreshed MV
            assertUpdate(format("REFRESH MATERIALIZED VIEW %s WHERE ds = '2020-01-01'", view), 255);
            assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table4, view), 255);
            assertTrue(getQueryRunner().tableExists(getSession(), table4));
            // Verify CTAS reads from the refreshed MV contents
            assertQuery(
                    format("SELECT * FROM %s", table4),
                    format("SELECT * FROM %s WHERE ds = '2020-01-01'", view));
```
</issue_to_address>

Comment on lines +2939 to +2946
// CTAS from a materialized view should succeed
assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table3, view), 255);
assertTrue(getQueryRunner().tableExists(getSession(), table3));

// Refresh the MV so it has data, then CTAS should read from the refreshed MV
assertUpdate(format("REFRESH MATERIALIZED VIEW %s WHERE ds = '2020-01-01'", view), 255);
assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table4, view), 255);
assertTrue(getQueryRunner().tableExists(getSession(), table4));

@ceekay47 ceekay47 changed the title [presto] Allow CTAS and INSERT from materialized views with circular dependency guard [presto] Allow CTAS and INSERT from materialized views Feb 27, 2026
@ceekay47 ceekay47 changed the title [presto] Allow CTAS and INSERT from materialized views feat(analyzer): Allow CTAS and INSERT from materialized views Feb 27, 2026
ceekay47 added a commit to ceekay47/presto that referenced this pull request Feb 27, 2026
Comment on lines +2206 to +2219
    // Prevent INSERT when selecting from a materialized view into one of its base tables (circular dependency).
    if (optionalMaterializedView.isPresent() && analysis.getStatement() instanceof Insert) {
        Insert insert = (Insert) analysis.getStatement();
        QualifiedObjectName targetTable = createQualifiedObjectName(session, insert, insert.getTarget(), metadata);
        SchemaTableName targetSchemaTable = new SchemaTableName(targetTable.getSchemaName(), targetTable.getObjectName());
        if (optionalMaterializedView.get().getBaseTables().contains(targetSchemaTable)) {
            throw new SemanticException(
                    NOT_SUPPORTED,
                    table,
                    "INSERT into table %s by selecting from materialized view %s is not supported because %s is a base table of the materialized view",
                    targetTable,
                    optionalMaterializedView.get().getTable(),
                    targetTable);
        }
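
For readers who want to experiment with the shape of this guard outside Presto, here is a minimal standalone sketch. It binds the `Optional` contents to a local variable once, as the Sourcery review suggested. Everything in it is a stand-in: `ViewDefinition` is a hypothetical substitute for `MaterializedViewDefinition`, table names are plain strings, and `IllegalStateException` replaces `SemanticException`; only the error message text is taken from the excerpt above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class LocalBindingSketch {
    // Hypothetical stand-in for MaterializedViewDefinition; only the two accessors used below.
    static final class ViewDefinition {
        private final String table;
        private final List<String> baseTables;

        ViewDefinition(String table, List<String> baseTables) {
            this.table = table;
            this.baseTables = baseTables;
        }

        String getTable() { return table; }
        List<String> getBaseTables() { return baseTables; }
    }

    // Throws when an INSERT targets a base table of the materialized view being read.
    static void checkInsertTarget(Optional<ViewDefinition> optionalView, String targetTable) {
        if (!optionalView.isPresent()) {
            return; // the source relation is not a materialized view
        }
        ViewDefinition view = optionalView.get(); // bound once instead of repeated get() calls
        if (view.getBaseTables().contains(targetTable)) {
            throw new IllegalStateException(String.format(
                    "INSERT into table %s by selecting from materialized view %s is not supported because %s is a base table of the materialized view",
                    targetTable, view.getTable(), targetTable));
        }
    }

    public static void main(String[] args) {
        Optional<ViewDefinition> view = Optional.of(
                new ViewDefinition("sales.orders_mv", Arrays.asList("sales.orders")));
        checkInsertTarget(view, "sales.archive"); // passes: not a base table
        try {
            checkInsertTarget(view, "sales.orders"); // rejected: base table of the view
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```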
Member

@ceekay47 thank you for this change. The overall approach makes sense. One thing to note: currently in Iceberg connector, materialized views can be created on top of views, and views themselves can be built on other materialized views or views. Therefore, when identifying base tables that cannot be inserted into, this transitive relationship should be taken into account.

Contributor Author

@hantangwangd Thanks for the review! That's an interesting point. I believe the base tables are resolved transitively during materialized view creation () and stored in MaterializedViewDefinition. So I think this condition should handle the transitive relationship for Iceberg as well?

Member

@ceekay47 thanks for the explanation. Yes, you are correct that the base tables of the MV have already been properly resolved transitively. Would you mind adding a test case in TestIcebergMaterializedViewBase to show this scenario?
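
The transitive relationship discussed in this thread can be illustrated with a small graph walk. This is only a sketch of the idea: the thread says base tables are resolved transitively at materialized view creation, but it does not show how, so the `dependencies` map, the relation names, and the traversal below are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of transitive base-table resolution; not Presto's implementation.
final class TransitiveBaseTables {
    // dependencies maps each view or materialized view to the relations it selects from.
    // A name with no entry in the map is treated as a plain table.
    static Set<String> resolveBaseTables(String relation, Map<String, List<String>> dependencies) {
        Set<String> baseTables = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(relation);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!visited.add(current)) {
                continue; // already expanded; also guards against cycles
            }
            List<String> sources = dependencies.get(current);
            if (sources == null) {
                baseTables.add(current); // no definition, so this is a plain table
            } else {
                for (String source : sources) {
                    pending.push(source);
                }
            }
        }
        return baseTables;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependencies = new HashMap<>();
        dependencies.put("mv_top", Arrays.asList("view_a"));
        dependencies.put("view_a", Arrays.asList("mv_inner", "t2"));
        dependencies.put("mv_inner", Arrays.asList("t1"));
        // mv_top reaches t1 only through view_a and mv_inner, yet t1 is still a base table.
        System.out.println(resolveBaseTables("mv_top", dependencies));
    }
}
```

If the stored base tables are flattened like this, a single `contains` lookup on the INSERT target covers tables that the materialized view reaches only through intermediate views.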

@ceekay47 ceekay47 requested a review from ZacBlanco as a code owner March 6, 2026 07:31
ceekay47 added a commit to ceekay47/presto that referenced this pull request Mar 6, 2026
Member

@hantangwangd hantangwangd left a comment

@ceekay47 thanks for adding the tests, lgtm!

@ceekay47 ceekay47 merged commit 9ecc7ac into prestodb:master Mar 7, 2026
81 of 82 checks passed
3 participants