feat(analyzer): Allow CTAS and INSERT from materialized views#27227
feat(analyzer): Allow CTAS and INSERT from materialized views#27227ceekay47 merged 1 commit intoprestodb:masterfrom
Conversation
Reviewer's GuideRefines semantic checks for statements reading from materialized views so that CREATE TABLE AS SELECT is allowed and INSERT is allowed except when targeting one of the materialized view’s base tables, and updates Hive materialized view planner tests accordingly. Sequence diagram for INSERT circular dependency guard on materialized viewssequenceDiagram
actor Client
participant StatementAnalyzer
participant Analysis
participant Session
participant MetadataResolver
participant MaterializedViewDefinition
Client->>StatementAnalyzer: analyze Insert statement
StatementAnalyzer->>Analysis: getStatement
Analysis-->>StatementAnalyzer: Insert
loop For each referenced Table
StatementAnalyzer->>MetadataResolver: getMaterializedViewDefinition session metadataHandle name
MetadataResolver-->>StatementAnalyzer: optionalMaterializedView
alt Materialized view present and statement is Insert
StatementAnalyzer->>Analysis: getStatement
Analysis-->>StatementAnalyzer: Insert
StatementAnalyzer->>StatementAnalyzer: createQualifiedObjectName session insert insertTarget metadata
StatementAnalyzer->>StatementAnalyzer: new SchemaTableName schemaName tableName
StatementAnalyzer->>MaterializedViewDefinition: getBaseTables
MaterializedViewDefinition-->>StatementAnalyzer: baseTables
alt target table is in baseTables
StatementAnalyzer->>StatementAnalyzer: throw SemanticException NOT_SUPPORTED
StatementAnalyzer-->>Client: semantic error response
else target table not in baseTables
StatementAnalyzer->>StatementAnalyzer: continue analysis of query
end
else Not an Insert or not a materialized view
StatementAnalyzer->>StatementAnalyzer: continue analysis of query
end
end
Class diagram for StatementAnalyzer materialized view circular dependency checkclassDiagram
class StatementAnalyzer {
- Analysis analysis
- MetadataResolver metadataResolver
- Session session
- Metadata metadata
+ visitTable Table table Optional_scope scope Scope
}
class Analysis {
+ getStatement() Statement
+ getMetadataHandle() MetadataHandle
}
class Statement {
}
class Insert {
+ getTarget() Table
}
class CreateTableAsSelect {
}
class Table {
+ getName() QualifiedName
}
class MaterializedViewDefinition {
+ getTable() SchemaTableName
+ getBaseTables() List_SchemaTableName
}
class QualifiedObjectName {
+ getSchemaName() String
+ getObjectName() String
}
class SchemaTableName {
+ SchemaTableName(String schemaName, String tableName)
+ getSchemaName() String
+ getTableName() String
}
class MetadataResolver {
+ getMaterializedViewDefinition(Session session, MetadataResolver metadataResolver, MetadataHandle metadataHandle, QualifiedName name) Optional_MaterializedViewDefinition
}
class Session {
}
class Metadata {
}
class SemanticException {
+ SemanticException(SemanticErrorCode code, Node node, String message, Object arg1, Object arg2, Object arg3)
}
class SemanticErrorCode {
<<enum>>
NOT_SUPPORTED
}
class Node {
}
class QualifiedName {
}
class Scope {
}
class Optional_scope {
}
class Optional_MaterializedViewDefinition {
+ isPresent() boolean
+ get() MaterializedViewDefinition
}
class List_SchemaTableName {
+ contains(SchemaTableName table) boolean
}
StatementAnalyzer --> Analysis
StatementAnalyzer --> MetadataResolver
StatementAnalyzer --> Session
StatementAnalyzer --> Metadata
Analysis --> Statement
Statement <|-- Insert
Statement <|-- CreateTableAsSelect
Insert --> Table
Table --> QualifiedName
MetadataResolver --> Optional_MaterializedViewDefinition
Optional_MaterializedViewDefinition --> MaterializedViewDefinition
MaterializedViewDefinition --> SchemaTableName
MaterializedViewDefinition --> List_SchemaTableName
QualifiedObjectName --> SchemaTableName
SemanticException --> SemanticErrorCode
SemanticException --> Node
List_SchemaTableName --> SchemaTableName
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In the new circular-dependency check, consider binding
optionalMaterializedView.get()to a localMaterializedViewDefinitionvariable to avoid repeatedget()calls and make the block easier to read and reason about. - The circular dependency detection uses
SchemaTableName(schema + table) only; if materialized views and base tables can span multiple catalogs, consider including catalog in the comparison to avoid false positives/negatives across catalogs.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In the new circular-dependency check, consider binding `optionalMaterializedView.get()` to a local `MaterializedViewDefinition` variable to avoid repeated `get()` calls and make the block easier to read and reason about.
- The circular dependency detection uses `SchemaTableName` (schema + table) only; if materialized views and base tables can span multiple catalogs, consider including catalog in the comparison to avoid false positives/negatives across catalogs.
## Individual Comments
### Comment 1
<location path="presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java" line_range="2939-2946" />
<code_context>
- assertQueryFails(format("CREATE TABLE %s AS SELECT * FROM %s", table3, view),
- ".*CreateTableAsSelect by selecting from a materialized view \\w+ is not supported.*");
+ // CTAS from a materialized view should succeed
+ assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table3, view), 255);
+ assertTrue(getQueryRunner().tableExists(getSession(), table3));
+
+ // Refresh the MV so it has data, then CTAS should read from the refreshed MV
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen CTAS-from-MV tests by asserting on the actual row contents or counts, not just table existence
The current checks only confirm that `table3`/`table4` exist and that `assertUpdate` reports `255` rows. To validate the behavior change, please also assert on the produced data. For instance:
- For `table3`, compare `SELECT * FROM table3` with the expected subset of the MV/base table (e.g., `SELECT * FROM view WHERE ds = '2020-01-01'`).
- For `table4`, after refresh, verify that CTAS reads from the refreshed MV by comparing its contents against the MV/base-table query rather than relying only on row count.
This will better prove CTAS is using the correct MV contents, including after refresh.
```suggestion
// CTAS from a materialized view should succeed
assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table3, view), 255);
assertTrue(getQueryRunner().tableExists(getSession(), table3));
// Verify CTAS-from-MV produces the expected subset of data
assertQuery(
format("SELECT * FROM %s", table3),
format("SELECT * FROM %s WHERE ds = '2020-01-01'", view));
// Refresh the MV so it has data, then CTAS should read from the refreshed MV
assertUpdate(format("REFRESH MATERIALIZED VIEW %s WHERE ds = '2020-01-01'", view), 255);
assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table4, view), 255);
assertTrue(getQueryRunner().tableExists(getSession(), table4));
// Verify CTAS reads from the refreshed MV contents
assertQuery(
format("SELECT * FROM %s", table4),
format("SELECT * FROM %s WHERE ds = '2020-01-01'", view));
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| // CTAS from a materialized view should succeed | ||
| assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table3, view), 255); | ||
| assertTrue(getQueryRunner().tableExists(getSession(), table3)); | ||
|
|
||
| // Refresh the MV so it has data, then CTAS should read from the refreshed MV | ||
| assertUpdate(format("REFRESH MATERIALIZED VIEW %s WHERE ds = '2020-01-01'", view), 255); | ||
| assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table4, view), 255); | ||
| assertTrue(getQueryRunner().tableExists(getSession(), table4)); |
There was a problem hiding this comment.
suggestion (testing): Strengthen CTAS-from-MV tests by asserting on the actual row contents or counts, not just table existence
The current checks only confirm that table3/table4 exist and that assertUpdate reports 255 rows. To validate the behavior change, please also assert on the produced data. For instance:
- For
table3, compareSELECT * FROM table3with the expected subset of the MV/base table (e.g.,SELECT * FROM view WHERE ds = '2020-01-01'). - For
table4, after refresh, verify that CTAS reads from the refreshed MV by comparing its contents against the MV/base-table query rather than relying only on row count.
This will better prove CTAS is using the correct MV contents, including after refresh.
| // CTAS from a materialized view should succeed | |
| assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table3, view), 255); | |
| assertTrue(getQueryRunner().tableExists(getSession(), table3)); | |
| // Refresh the MV so it has data, then CTAS should read from the refreshed MV | |
| assertUpdate(format("REFRESH MATERIALIZED VIEW %s WHERE ds = '2020-01-01'", view), 255); | |
| assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table4, view), 255); | |
| assertTrue(getQueryRunner().tableExists(getSession(), table4)); | |
| // CTAS from a materialized view should succeed | |
| assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table3, view), 255); | |
| assertTrue(getQueryRunner().tableExists(getSession(), table3)); | |
| // Verify CTAS-from-MV produces the expected subset of data | |
| assertQuery( | |
| format("SELECT * FROM %s", table3), | |
| format("SELECT * FROM %s WHERE ds = '2020-01-01'", view)); | |
| // Refresh the MV so it has data, then CTAS should read from the refreshed MV | |
| assertUpdate(format("REFRESH MATERIALIZED VIEW %s WHERE ds = '2020-01-01'", view), 255); | |
| assertUpdate(format("CREATE TABLE %s AS SELECT * FROM %s WHERE ds = '2020-01-01'", table4, view), 255); | |
| assertTrue(getQueryRunner().tableExists(getSession(), table4)); | |
| // Verify CTAS reads from the refreshed MV contents | |
| assertQuery( | |
| format("SELECT * FROM %s", table4), | |
| format("SELECT * FROM %s WHERE ds = '2020-01-01'", view)); |
bd01e1a to
38335f9
Compare
Summary: Previously, CREATE TABLE AS SELECT and INSERT ... SELECT were blanket-blocked when selecting from a materialized view. This was overly restrictive: - CTAS is always safe because the target table is new and cannot be a base table of the materialized view (no circular dependency possible). - INSERT is safe as long as the target table is not one of the MV's base tables. This change removes the blanket restriction and replaces it with a targeted circular dependency check: only INSERT into a base table of the materialized view is blocked, with a clear error message explaining why. Differential Revision: D94597203
| // Prevent INSERT when selecting from a materialized view into one of its base tables (circular dependency). | ||
| if (optionalMaterializedView.isPresent() && analysis.getStatement() instanceof Insert) { | ||
| Insert insert = (Insert) analysis.getStatement(); | ||
| QualifiedObjectName targetTable = createQualifiedObjectName(session, insert, insert.getTarget(), metadata); | ||
| SchemaTableName targetSchemaTable = new SchemaTableName(targetTable.getSchemaName(), targetTable.getObjectName()); | ||
| if (optionalMaterializedView.get().getBaseTables().contains(targetSchemaTable)) { | ||
| throw new SemanticException( | ||
| NOT_SUPPORTED, | ||
| table, | ||
| "INSERT into table %s by selecting from materialized view %s is not supported because %s is a base table of the materialized view", | ||
| targetTable, | ||
| optionalMaterializedView.get().getTable(), | ||
| targetTable); | ||
| } |
There was a problem hiding this comment.
@ceekay47 thank you for this change. The overall approach makes sense. One thing to note: currently in Iceberg connector, materialized views can be created on top of views, and views themselves can be built on other materialized views or views. Therefore, when identifying base tables that cannot be inserted into, this transitive relationship should be taken into account.
There was a problem hiding this comment.
@hantangwangd Thanks for the review! That's an interesting point. I believe the base tables are resolved transitively during materialized view creation () and stored in MaterializedViewDefinition. So I think this condition should handle the transitive relationship for Iceberg as well?
There was a problem hiding this comment.
@ceekay47 thanks for the explanation. Yes, you are correct that the base tables of the MV have already been properly resolved transitively. Would you mind adding a test case in TestIcebergMaterializedViewBase to show this scenario?
38335f9 to
b2b0f25
Compare
Summary: Previously, CREATE TABLE AS SELECT and INSERT ... SELECT were blanket-blocked when selecting from a materialized view. This was overly restrictive: - CTAS is always safe because the target table is new and cannot be a base table of the materialized view (no circular dependency possible). - INSERT is safe as long as the target table is not one of the MV's base tables. This change removes the blanket restriction and replaces it with a targeted circular dependency check: only INSERT into a base table of the materialized view is blocked, with a clear error message explaining why. Differential Revision: D94597203
Summary: Previously, CREATE TABLE AS SELECT and INSERT ... SELECT were blanket-blocked when selecting from a materialized view. This was overly restrictive: - CTAS is always safe because the target table is new and cannot be a base table of the materialized view (no circular dependency possible). - INSERT is safe as long as the target table is not one of the MV's base tables. This change removes the blanket restriction and replaces it with a targeted circular dependency check: only INSERT into a base table of the materialized view is blocked, with a clear error message explaining why. Differential Revision: D94597203
b2b0f25 to
c6bf465
Compare
hantangwangd
left a comment
There was a problem hiding this comment.
@ceekay47 thanks for adding the tests, lgtm!
Summary:
Previously, CREATE TABLE AS SELECT and INSERT ... SELECT were blanket-blocked
when selecting from a materialized view. This was overly restrictive:
table of the materialized view (no circular dependency possible).
tables.
This change removes the blanket restriction and replaces it with a targeted
circular dependency check: only INSERT into a base table of the materialized
view is blocked, with a clear error message explaining why.
Differential Revision: D94597203
Summary by Sourcery
Allow CTAS and most INSERT ... SELECT operations from materialized views while preventing circular dependencies into their base tables.
Bug Fixes:
Enhancements:
Tests: