feat(optimizer): Add support for configurable freshness thresholds for materialized views#26764
Conversation
Reviewer's GuideAdds configurable staleness handling for materialized views, including new SPI types, config and session properties, connector- and engine-level plumbing (notably for Iceberg and memory connectors), optimizer changes to respect staleness behavior and window, and tests/docs for the new behavior. Sequence diagram for materialized view rewrite with staleness handlingsequenceDiagram
actor User
participant Session
participant Planner as MaterializedViewRewrite
participant Metadata as MetadataResolver
participant Connector as ConnectorMetadata
User->>Session: submit query referencing MV
Session->>Planner: optimize plan
Planner->>Metadata: getMaterializedView(viewName)
Metadata->>Connector: getMaterializedView(viewName)
Connector-->>Metadata: MaterializedViewDefinition(definition)
Metadata-->>Planner: MaterializedViewDefinition(definition)
Planner->>Metadata: getMaterializedViewStatus(viewName,TupleDomain.all)
Metadata->>Connector: getMaterializedViewStatus(viewName,TupleDomain.all)
Connector-->>Metadata: MaterializedViewStatus(status,lastFreshTime)
Metadata-->>Planner: MaterializedViewStatus(status,lastFreshTime)
alt fully materialized
Planner->>Planner: canUseDataTableWithSecurityChecks(definition,status)
Planner-->>Session: use data table
else not fully materialized and stalenessConfig present
Planner->>Planner: isStalenessBeyondTolerance(stalenessConfig,status)
alt within stalenessWindow
Planner->>Planner: canUseDataTableWithSecurityChecks(definition,status)
Planner-->>Session: use data table
else beyond stalenessWindow
alt staleReadBehavior == FAIL
Planner-->>User: error MATERIALIZED_VIEW_STALE
else staleReadBehavior == USE_VIEW_QUERY
Planner-->>Session: do not use data table
end
end
else no stalenessConfig
Planner->>Session: getMaterializedViewStaleReadBehavior
alt sessionBehavior == USE_VIEW_QUERY
Planner-->>Session: do not use data table
else sessionBehavior == FAIL
Planner-->>User: error MATERIALIZED_VIEW_STALE
end
end
Updated class diagram for materialized view staleness modelclassDiagram
class MaterializedViewDefinition {
- String originalSql
- String schema
- String table
- List~SchemaTableName~ baseTables
- Optional~String~ owner
- Optional~ViewSecurity~ securityMode
- List~ColumnMapping~ columnMappings
- List~SchemaTableName~ baseTablesOnOuterJoinSide
- Optional~List~ validRefreshColumns
- Optional~MaterializedViewStalenessConfig~ stalenessConfig
- Optional~MaterializedViewRefreshType~ refreshType
+ MaterializedViewDefinition(originalSql,schema,table,baseTables,owner,securityMode,columnMappings,baseTablesOnOuterJoinSide,validRefreshColumns,stalenessConfig,refreshType)
+ Optional~MaterializedViewStalenessConfig~ getStalenessConfig()
+ Optional~MaterializedViewRefreshType~ getRefreshType()
}
class MaterializedViewStalenessConfig {
- MaterializedViewStaleReadBehavior staleReadBehavior
- Duration stalenessWindow
+ MaterializedViewStalenessConfig(staleReadBehavior,stalenessWindow)
+ MaterializedViewStaleReadBehavior getStaleReadBehavior()
+ Duration getStalenessWindow()
}
class MaterializedViewStatus {
- MaterializedViewState materializedViewState
- Map~SchemaTableName,MaterializedDataPredicates~ partitionsFromBaseTables
- Optional~Long~ lastFreshTime
+ MaterializedViewStatus(materializedViewState)
+ MaterializedViewStatus(materializedViewState,partitionsFromBaseTables)
+ MaterializedViewStatus(materializedViewState,partitionsFromBaseTables,lastFreshTime)
+ MaterializedViewState getMaterializedViewState()
+ Map~SchemaTableName,MaterializedDataPredicates~ getPartitionsFromBaseTables()
+ Optional~Long~ getLastFreshTime()
}
class MaterializedViewStaleReadBehavior {
<<enum>>
FAIL
USE_VIEW_QUERY
}
class MaterializedViewRefreshType {
<<enum>>
FULL
}
class FeaturesConfig {
- MaterializedViewStaleReadBehavior materializedViewStaleReadBehavior
+ MaterializedViewStaleReadBehavior getMaterializedViewStaleReadBehavior()
+ FeaturesConfig setMaterializedViewStaleReadBehavior(materializedViewStaleReadBehavior)
}
class SystemSessionProperties {
+ static MaterializedViewStaleReadBehavior getMaterializedViewStaleReadBehavior(Session)
}
class IcebergTableProperties {
+ static Optional~MaterializedViewStaleReadBehavior~ getMaterializedViewStaleReadBehavior(Map)
+ static Optional~Duration~ getMaterializedViewStalenessWindow(Map)
+ static MaterializedViewRefreshType getMaterializedViewRefreshType(Map)
+ List~PropertyMetadata~ getMaterializedViewProperties()
}
class MaterializedViewRewrite {
- boolean isUseDataTable(MaterializedViewScanNode,MetadataResolver,Session)
- boolean isStalenessBeyondTolerance(MaterializedViewStalenessConfig,MaterializedViewStatus)
- boolean applyStaleReadBehavior(MaterializedViewStalenessConfig,QualifiedObjectName)
- boolean canUseDataTableWithSecurityChecks(MaterializedViewScanNode,MetadataResolver,Session,MaterializedViewDefinition)
}
MaterializedViewDefinition --> "0..1" MaterializedViewStalenessConfig : stalenessConfig
MaterializedViewDefinition --> "0..1" MaterializedViewRefreshType : refreshType
MaterializedViewStalenessConfig --> MaterializedViewStaleReadBehavior : uses
MaterializedViewStatus --> "0..1" Long : lastFreshTime
FeaturesConfig --> MaterializedViewStaleReadBehavior : defaultBehavior
SystemSessionProperties --> MaterializedViewStaleReadBehavior : sessionBehavior
IcebergTableProperties --> MaterializedViewStaleReadBehavior : tableProperty
IcebergTableProperties --> MaterializedViewRefreshType : tableProperty
MaterializedViewRewrite --> MaterializedViewDefinition : reads
MaterializedViewRewrite --> MaterializedViewStatus : reads
MaterializedViewRewrite --> MaterializedViewStalenessConfig : evaluates
MaterializedViewRewrite --> MaterializedViewStaleReadBehavior : applies
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
d9f4b9a to
aefcddf
Compare
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- The staleness evaluation in MaterializedViewRewrite uses currentTimeMillis() directly, which makes behavior time-dependent and slightly non-deterministic in tests; consider injecting a clock or passing the evaluation time via MaterializedViewStatus to make the logic and tests fully deterministic.
- Several new tests in TestMaterializedViewRewrite appear to cover overlapping scenarios (e.g., the two 'within tolerance' and two 'beyond tolerance' cases); you could consolidate or parameterize these to reduce duplication while keeping the coverage.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The staleness evaluation in MaterializedViewRewrite uses currentTimeMillis() directly, which makes behavior time-dependent and slightly non-deterministic in tests; consider injecting a clock or passing the evaluation time via MaterializedViewStatus to make the logic and tests fully deterministic.
- Several new tests in TestMaterializedViewRewrite appear to cover overlapping scenarios (e.g., the two 'within tolerance' and two 'beyond tolerance' cases); you could consolidate or parameterize these to reduce duplication while keeping the coverage.
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java:1718-1723` </location>
<code_context>
return new MaterializedViewStatus(NOT_MATERIALIZED, ImmutableMap.of());
}
+ Optional<Long> lastFreshTime = definition.get().getBaseTables().stream()
+ .map(baseTable -> getIcebergTable(session, baseTable))
+ .map(Table::currentSnapshot)
+ .filter(Objects::nonNull)
+ .map(Snapshot::timestampMillis)
+ .max(Long::compareTo);
+
for (SchemaTableName baseTable : definition.get().getBaseTables()) {
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid recomputing Iceberg tables twice when determining status and last fresh time.
`lastFreshTime` is computed by calling `getIcebergTable(session, baseTable)` for every base table, and then the `for` loop calls `getIcebergTable` again for the same tables. This doubles catalog/metadata access. Either derive `lastFreshTime` from the `baseIcebergTable` already fetched in the loop, or build a single map of base table → snapshot and reuse it in both places to avoid repeated lookups.
Suggested implementation:
```java
Map<SchemaTableName, Table> baseIcebergTables = new HashMap<>();
for (SchemaTableName baseTable : definition.get().getBaseTables()) {
baseIcebergTables.put(baseTable, getIcebergTable(session, baseTable));
}
Optional<Long> lastFreshTime = baseIcebergTables.values().stream()
.map(Table::currentSnapshot)
.filter(Objects::nonNull)
.map(Snapshot::timestampMillis)
.max(Long::compareTo);
for (SchemaTableName baseTable : definition.get().getBaseTables()) {
Table baseIcebergTable = baseIcebergTables.get(baseTable);
```
1. Ensure the following imports exist at the top of `IcebergAbstractMetadata.java` (or equivalent, depending on your existing imports):
- `import java.util.HashMap;`
- `import java.util.Map;`
2. If the file already has these imports under a different grouping/order, integrate them according to your existing import style.
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java:225-234` </location>
<code_context>
+ assertThrows(PrestoException.class, () ->
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen failure tests by asserting on error code/message instead of only exception type
In the failure tests (`testFailWhenStaleAndSessionPropertyIsFail`, `testFailWhenStalenessBeyondToleranceWithFailBehavior`, `testFailWhenNeverRefreshedWithStalenessConfig`, `testFailWhenStalenessBeyondToleranceWindow`), consider using the `assertThrows` overload that returns the exception and then assert on `getErrorCode()` (and optionally the message) so the tests verify the specific `MATERIALIZED_VIEW_STALE` / `INVALID_MATERIALIZED_VIEW_PROPERTY` failures, not just the exception type.
Suggested implementation:
```java
Metadata metadata = new TestingMetadataWithMaterializedViewStatus(false);
PrestoException exception = assertThrows(PrestoException.class, () ->
testerWithFail.assertThat(new MaterializedViewRewrite(metadata, new AllowAllAccessControl()))
.on(planBuilder -> {
VariableReferenceExpression outputA = planBuilder.variable("a", BIGINT);
VariableReferenceExpression dataTableA = planBuilder.variable("data_table_a", BIGINT);
VariableReferenceExpression viewQueryA = planBuilder.variable("view_query_a", BIGINT);
return planBuilder.materializedViewScan(
materializedViewName,
planBuilder.values(dataTableA),
planBuilder.values(viewQueryA),
```
1. After this block (in the same test method), add:
`assertEquals(MATERIALIZED_VIEW_STALE.toErrorCode(), exception.getErrorCode());`
(or `INVALID_MATERIALIZED_VIEW_PROPERTY.toErrorCode()` where appropriate).
2. Ensure you have `import static org.testng.Assert.assertEquals;` at the top of the file.
3. Ensure you have static imports for the error codes you want to assert, e.g.:
`import static com.facebook.presto.spi.StandardErrorCode.MATERIALIZED_VIEW_STALE;`
and/or
`import static com.facebook.presto.spi.StandardErrorCode.INVALID_MATERIALIZED_VIEW_PROPERTY;`.
4. Apply the same pattern to:
- `testFailWhenStaleAndSessionPropertyIsFail`
- `testFailWhenStalenessBeyondToleranceWithFailBehavior`
- `testFailWhenNeverRefreshedWithStalenessConfig`
- `testFailWhenStalenessBeyondToleranceWindow`
by:
- Capturing the thrown `PrestoException` into a local variable.
- Asserting `getErrorCode()` matches the specific expected error code.
- Optionally asserting on `getMessage()` if you want stronger guarantees about the failure reason.
</issue_to_address>
### Comment 3
<location> `presto-main-base/src/test/java/com/facebook/presto/execution/TestCreateMaterializedViewTask.java:128-133` </location>
<code_context>
ColumnPropertyManager columnPropertyManager = new ColumnPropertyManager();
columnPropertyManager.addProperties(testCatalog.getConnectorId(), ImmutableList.of());
+ MaterializedViewPropertyManager materializedViewPropertyManager = new MaterializedViewPropertyManager();
+ materializedViewPropertyManager.addProperties(testCatalog.getConnectorId(), ImmutableList.of());
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test that exercises materialized view creation with the new MV properties
`MaterializedViewPropertyManager` is now wired into `CreateMaterializedViewTask`, but tests here still only validate table properties. Please add a test that creates a materialized view with MV-specific properties (e.g., `materialized_view_stale_read_behavior`, `materialized_view_staleness_window`, `materialized_view_refresh_type`) set in SQL, and asserts they are correctly propagated into `ConnectorTableMetadata` (or the stored definition).
</issue_to_address>
### Comment 4
<location> `presto-docs/src/main/sphinx/admin/materialized-views.rst:66` </location>
<code_context>
+
+When no per-view configuration is specified, the default behavior is ``FAIL`` (query fails with
+an error). This can be changed using the ``materialized_view_stale_read_behavior`` session property
+or the ``materialized-view-stale-read-behavior`` configuration property. Setting to ``USE_VIEW_QUERY``
+causes Presto to fall back to executing the underlying view query against the base tables.
+
Required Permissions
</code_context>
<issue_to_address>
**suggestion (typo):** Add a missing word in "Setting to ``USE_VIEW_QUERY``` for correct grammar.
The phrase is missing an object. Please rephrase to something like “Setting it to ``USE_VIEW_QUERY```” or “Setting this to ``USE_VIEW_QUERY```” for correct grammar.
```suggestion
or the ``materialized-view-stale-read-behavior`` configuration property. Setting it to ``USE_VIEW_QUERY``
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
...rc/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java
Outdated
Show resolved
Hide resolved
...to-main-base/src/test/java/com/facebook/presto/execution/TestCreateMaterializedViewTask.java
Show resolved
Hide resolved
6028a0b to
20869b1
Compare
hantangwangd
left a comment
There was a problem hiding this comment.
@tdcmeehan thanks for this feature, overall lgtm. I've left a few comments—mostly small things.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
presto-main-base/src/main/java/com/facebook/presto/metadata/MetadataManager.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergTableProperties.java
Outdated
Show resolved
Hide resolved
...se/src/main/java/com/facebook/presto/sql/planner/iterative/rule/MaterializedViewRewrite.java
Show resolved
Hide resolved
ec48317 to
141b410
Compare
141b410 to
989623f
Compare
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks for the fix, looks great, just a couple of nits left.
Besides, is it possible to add some e2e test cases for the staleness window behavior in some place like TestIcebergMaterializedViews?
presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java
Outdated
Show resolved
Hide resolved
...o-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewMetadata.java
Outdated
Show resolved
Hide resolved
|
Thanks for the review @hantangwangd, I've added more tests. |
hantangwangd
left a comment
There was a problem hiding this comment.
@tdcmeehan thanks for the tests, LGTM! One last thing left about the default value in the docs.
|
Thanks @hantangwangd for the sharp eye, updated the docs as well! |
hantangwangd
left a comment
There was a problem hiding this comment.
Looks great. Thanks for the update.
Description
Add configurable freshness thresholds for materialized views. Users can specify a staleness window and behavior (
FAILorUSE_VIEW_QUERY) when the view is stale beyond the threshold.For Iceberg, materialized view properties are added which allow toggling both the stale threshold and the behavior:
CREATE MATERIALIZED VIEW hourly_sales WITH ( stale_read_behavior = 'FAIL', staleness_window = '1h' ) AS SELECT date_trunc('hour', sale_time) as hour, SUM(amount) as total FROM sales GROUP BY 1;Motivation and Context
Materialized views can become stale when base tables change. This feature allows users to configure how Presto handles stale reads - either fail with an error or fall back to querying base tables directly.
Impact
Support added for configurable freshness thresholds
Test Plan
FAIL/USE_VIEW_QUERYbehaviorsContributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.