
Detect materialized views with non-deterministic functions as stale #28682

Merged: hashhar merged 3 commits into master from user/hashhar/fix-mv-nondeterminism on Mar 30, 2026

@hashhar (Member) commented Mar 17, 2026

Description

Materialized views that use non-deterministic scalar functions such as current_timestamp, current_date, or random() were not detected as stale, because the refresh only tracks table function dependencies.

This PR adds a hasNonDeterministicFunctions flag that propagates through the refresh. When the flag is set, the MV freshness is reported as UNKNOWN, which causes the view query to be re-executed once the grace period expires. This matches the existing behavior for table functions.
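A minimal Java sketch of the decision described above, under a simplified model: the Freshness enum, the FreshnessSketch class, and both method names are hypothetical and are not Trino's SPI types. It shows how an UNKNOWN freshness still lets stored data be served until the grace period since the last refresh runs out, after which the view query is re-executed.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical, simplified model; NOT Trino's MaterializedViewFreshness SPI.
enum Freshness { FRESH, STALE, UNKNOWN }

final class FreshnessSketch {
    private FreshnessSketch() {}

    // A definition with non-deterministic functions (or table functions) can
    // never be proven fresh from source-table state alone, so report UNKNOWN.
    static Freshness freshness(boolean hasNonDeterministicFunctions,
                               boolean hasTableFunctions,
                               boolean sourceTablesChanged) {
        if (hasNonDeterministicFunctions || hasTableFunctions) {
            return Freshness.UNKNOWN;
        }
        return sourceTablesChanged ? Freshness.STALE : Freshness.FRESH;
    }

    // Assumption: FRESH data is always served; otherwise stored data is served
    // only while the last refresh is within the grace period, and the view
    // query is re-executed once the grace period has expired.
    static boolean useMaterializedData(Freshness freshness, Instant lastRefresh,
                                       Duration gracePeriod, Instant now) {
        if (freshness == Freshness.FRESH) {
            return true;
        }
        return !now.isAfter(lastRefresh.plus(gracePeriod));
    }
}
```

With a 10-minute grace period, an UNKNOWN view is served from storage one minute after its last refresh but re-executed once more than ten minutes have passed.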

Detection covers both current-time AST nodes (CurrentDate, CurrentTime, CurrentTimestamp, LocalTime, LocalTimestamp) and non-deterministic functions, found via analysis.getResolvedFunctions().
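The two detection sources can be illustrated with a small hypothetical model. The record types below mirror only a subset of the node names listed above (CurrentTimestamp, CurrentDate), and ResolvedFn stands in for whatever analysis.getResolvedFunctions() returns; none of these are Trino's actual classes. The sketch finds current-time syntax with a tree walk and treats everything else as resolved functions carrying a determinism flag.

```java
import java.util.List;

// Hypothetical stand-ins; NOT Trino's io.trino.sql.tree nodes or ResolvedFunction.
interface Expr {}
record CurrentTimestamp() implements Expr {}
record CurrentDate() implements Expr {}   // CurrentTime, LocalTime, LocalTimestamp omitted for brevity
record Column(String name) implements Expr {}
record Call(String function, List<Expr> args) implements Expr {}
record ResolvedFn(String name, boolean deterministic) {}

final class NonDeterminismDetector {
    private NonDeterminismDetector() {}

    // Check 1: walk the expression tree looking for current-time syntax nodes.
    static boolean containsCurrentTimeNode(Expr expr) {
        if (expr instanceof CurrentTimestamp || expr instanceof CurrentDate) {
            return true;
        }
        if (expr instanceof Call call) {
            for (Expr arg : call.args()) {
                if (containsCurrentTimeNode(arg)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Check 2: any function resolved during analysis that is not deterministic (e.g. random()).
    static boolean hasNonDeterministicFunctions(Expr query, List<ResolvedFn> resolvedFunctions) {
        return containsCurrentTimeNode(query)
                || resolvedFunctions.stream().anyMatch(f -> !f.deterministic());
    }
}
```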

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

Submitted from repo to ensure all tests are run.

@cla-bot (bot) added the cla-signed label Mar 17, 2026
@github-actions (bot) added the iceberg (Iceberg connector) and lakehouse labels Mar 17, 2026
Review comment thread on core/trino-main/src/main/java/io/trino/sql/planner/LogicalPlanner.java (outdated)
@hashhar force-pushed the user/hashhar/fix-mv-nondeterminism branch from d7249e7 to fd3e300 on March 17, 2026 07:05
@findepi (Member) commented Mar 17, 2026

> Materialized views using non-deterministic scalar functions like current_timestamp, current_date, or random() were not detected as stale because the refresh only tracks table function dependencies.

This is not necessarily a bug.

There is some use in having, e.g., a current_timestamp AS refresh_time column in an MV, or in using random() to sample the source.

If we consider every MV with such functions to be stale, there is no point in producing a materialization: it should never be used [1]. So we should rather forbid creating such MVs.

cc @martint

Footnotes

  1. Except during the grace period, but that seems rather odd to me.

@hashhar (Member, Author) commented Mar 17, 2026

that might make sense, I think at a minimum the 2nd commit needs to be there to avoid incorrectly computing an incremental refresh.

@findepi (Member) commented Mar 17, 2026

> that might make sense, I think at a minimum the 2nd commit needs to be there to avoid incorrectly computing an incremental refresh.

That matches my intuition too, but I'd rather clarify desired semantics first, before making the final call.

@hashhar (Member, Author) commented Mar 17, 2026

Sure, I'll discuss with @martint

@findepi (Member) commented Mar 17, 2026

> Fixes #22533, #28696

  1. This will close only the first of the two.
  2. Use - to make it a list; GitHub renders link previews within list items.

@piotrrzysko (Member) commented:

> Fixes #22533, #28696
>
>   1. this will close only the first of two

I think this PR addresses both. We should create a third issue to track optimization for incremental refreshes -- detecting whether materialized data is valid despite non-deterministic expressions.

@hashhar (Member, Author) commented Mar 18, 2026

> > Fixes #22533, #28696
> >
> >   1. this will close only the first of two
>
> I think this PR addresses both. We should create a third issue to track optimization for incremental refreshes -- detecting whether materialized data is valid despite non-deterministic expressions.

See #28731.

@hashhar (Member, Author) commented Mar 18, 2026

@martint This looks like another pre-existing bug, related to session-scoped expressions.

I see that current_user (and other session-scoped expressions) are stored as literal SQL text in the MV definition, not replaced with constants at creation time:

    String sql = getFormattedSql(statement.getQuery(), sqlParser);

This makes what they resolve to depend on the code path: the stale inline path resolves them with the MV owner's identity, while the refresh path resolves them from the refreshing user's session.

The refresh path is the wrong one. It probably should also create a view session with the owner's identity (like the inline path does).
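To make the inconsistency concrete, here is a hypothetical Java model of how a current_user expression kept as SQL text resolves on each path, and what the suggested owner-identity fix would change. Session, SessionScopedResolution, and its methods are illustrative names, not Trino APIs.

```java
// Hypothetical model; names are illustrative, NOT Trino APIs.
record Session(String user) {}

final class SessionScopedResolution {
    private SessionScopedResolution() {}

    // current_user is stored as SQL text, so it evaluates against whichever session runs it.
    static String currentUser(Session session) {
        return session.user();
    }

    // Stale inline path: the view is analyzed in a session carrying the MV owner's identity.
    static String inlinePath(String owner, Session invoker) {
        return currentUser(new Session(owner));
    }

    // Current refresh path: evaluated in the refreshing user's own session.
    static String refreshPath(String owner, Session refresher) {
        return currentUser(refresher);
    }

    // Suggested direction: build a view session with the owner's identity, like the inline path.
    static String ownerIdentityRefreshPath(String owner, Session refresher) {
        return currentUser(new Session(owner));
    }
}
```

With owner "user" and a refresh run by "other_user", the current refresh path yields "other_user", mirroring the ('user'), ('other_user') result recorded in the test TODO later in this thread.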

@findepi (Member) commented Mar 18, 2026

> I think this PR addresses both.

But it still doesn't close both, per #28682 (comment).

@hashhar (Member, Author) commented Mar 18, 2026

I extracted the REFRESH and session-scoped issues into #28738 and will not address them as part of this PR. This PR is focused on the correctness issue and the staleness logic.

@hashhar force-pushed the user/hashhar/fix-mv-nondeterminism branch 2 times, most recently from d5ca28c to 5404a6e on March 18, 2026 09:27
@hashhar requested a review from @martint March 18, 2026 19:02
@martint (Member) commented Mar 26, 2026

The planner changes look good. I haven't reviewed the tests.

hashhar added 3 commits March 30, 2026 19:13
Incremental refresh only scans newly appended rows from the source
table. For MVs that use non-deterministic functions such as
current_timestamp in time-based predicates, this means old rows that
fall outside a shifted time window are never removed.

We now force a full refresh when non-deterministic functions are detected.
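The failure mode can be reproduced with a small hypothetical simulation, assuming an MV defined roughly as SELECT id FROM events WHERE ts > now - window. Event and RefreshSketch are illustrative names, not Trino code, and now is passed explicitly in place of current_timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, NOT Trino code.
// Illustrative MV definition: SELECT id FROM events WHERE ts > now - window
record Event(int id, long ts) {}

final class RefreshSketch {
    private RefreshSketch() {}

    static boolean inWindow(Event e, long now, long window) {
        return e.ts() > now - window;
    }

    // Full refresh: re-evaluate every source row against the current "now".
    static List<Integer> fullRefresh(List<Event> source, long now, long window) {
        List<Integer> result = new ArrayList<>();
        for (Event e : source) {
            if (inWindow(e, now, window)) {
                result.add(e.id());
            }
        }
        return result;
    }

    // Incremental refresh: only newly appended rows are evaluated and appended;
    // rows already stored in the MV are never re-checked against the new "now".
    static List<Integer> incrementalRefresh(List<Integer> stored, List<Event> appended,
                                            long now, long window) {
        List<Integer> result = new ArrayList<>(stored);
        for (Event e : appended) {
            if (inWindow(e, now, window)) {
                result.add(e.id());
            }
        }
        return result;
    }
}
```

With a window of 10, an event at ts = 5 qualifies at now = 10 but not at now = 20. After appending an event at ts = 18 and refreshing at now = 20, the incremental path still holds the expired row (ids 1 and 2) while a full refresh returns only id 2, which is why forcing a full refresh is the safe choice here.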
@hashhar force-pushed the user/hashhar/fix-mv-nondeterminism branch from 5404a6e to 081e637 on March 30, 2026 13:44
@hashhar (Member, Author) commented Mar 30, 2026

Rebased to resolve conflicts. @findepi / @martint, can you PTAL at the tests so that we can merge this and move on to the DEFINER semantics bug?

@findepi (Member) left a comment

Skimmed the tests, lgtm.
Solid test coverage, good job! I have not verified every single assertion though.

Comment on lines +1039 to +1043
// TODO https://github.com/trinodb/trino/issues/28738 session-scoped expressions should resolve using
// the MV owner's identity during refresh (like the stale inline path does via analyzeView), but currently
// the refresh path resolves them from the refreshing user's session.
// When fixed, this should be: VALUES ('user'), ('user')
assertQuery("SELECT created_by FROM " + mvName, "VALUES ('user'), ('other_user')");

@hashhar (Member, Author) commented Mar 30, 2026

hit #23537 before retries

@hashhar merged commit 04e1836 into master Mar 30, 2026
212 of 215 checks passed
@hashhar deleted the user/hashhar/fix-mv-nondeterminism branch March 30, 2026 17:22
@github-actions (bot) added this to the 481 milestone Mar 30, 2026
@ebyhr mentioned this pull request Mar 31, 2026

4 participants