Allow reading a stale materialized view by findepi · Pull Request #13484 · trinodb/trino

findepi · 2022-08-03T15:58:49Z

findepi · 2022-08-03T15:59:19Z

cc @alexjo2144 @raunaqmorarka @sopel39 @anjalinorwood @duoluodexiaokeke

sopel39 · 2022-08-03T16:12:20Z

core/trino-main/src/main/java/io/trino/FeaturesConfig.java

    private boolean legacyCatalogRoles;
    private boolean incrementalHashArrayLoadFactorEnabled = true;
    private boolean allowSetViewAuthorization;
+    private Duration materializedViewRequiredFreshness = new Duration(0, SECONDS);


That most likely should be an MV property.

It should definitely be allowed as an MV property as it is much easier to think about how much staleness is acceptable for specific MVs rather than defining it globally.
In fact, I would lean against making it configurable globally at all as it seems very tricky for an admin to come up with an acceptable staleness value for all MVs. It also creates the possibility of end users unexpectedly receiving stale results without having opted into it explicitly through MV property or session property.

Data will likely have different retention policies, therefore it has to be defined per MV.
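The resolution order discussed in this thread could look roughly like the following. This is a minimal sketch, assuming a hypothetical per-MV staleness property and a hypothetical session-level override. Neither name exists in Trino, and the PR itself uses a global `FeaturesConfig` setting instead. With no opt-in at either level the tolerance stays at zero, so end users never receive stale results they did not ask for.

```java
import java.time.Duration;
import java.util.Optional;

class StalenessResolution
{
    // Hypothetical resolution of how much staleness a query may accept for one MV.
    // An explicit session-level override wins over the per-MV property;
    // with neither set, the tolerance is zero (today's fresh-or-inline behavior).
    static Duration allowedStaleness(Optional<Duration> sessionOverride, Optional<Duration> mvProperty)
    {
        return sessionOverride.or(() -> mvProperty).orElse(Duration.ZERO);
    }

    public static void main(String[] args)
    {
        // No opt-in anywhere: stale results are never served.
        System.out.println(allowedStaleness(Optional.empty(), Optional.empty()));
        // The MV owner declares that one day of staleness is acceptable for this MV.
        System.out.println(allowedStaleness(Optional.empty(), Optional.of(Duration.ofDays(1))));
        // A user tightens the tolerance for their own session.
        System.out.println(allowedStaleness(Optional.of(Duration.ofHours(1)), Optional.of(Duration.ofDays(1))));
    }
}
```

Letting the session value win in both directions is only one design choice; a stricter variant would take the smaller of the two durations so a session can never loosen what the MV owner allows.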

@sopel39 @raunaqmorarka @hashhar
I mistakenly replied in a different (related) thread here: #13484 (comment)

anjalinorwood

This is a great enhancement. Thank you for adding it. LGTM.

raunaqmorarka

Please add tests as well

raunaqmorarka · 2022-08-04T05:34:24Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java


        String dependencies = sourceTableHandles.stream()
                .map(handle -> (IcebergTableHandle) handle)
-                .filter(handle -> handle.getSnapshotId().isPresent())


Question about this commit: if tables created by Trino always have a snapshot ID, and CREATE MV cannot be pointed at pre-existing tables, can this scenario arise in practice?
Could we require that the Iceberg table always has a snapshot ID here instead?

Question about this commit: if tables created by Trino always have a snapshot ID, and CREATE MV cannot be pointed at pre-existing tables, can this scenario arise in practice?

see 5c1750e#r80166651

Got it, thanks.
Can we have a test that creates an MV in Trino on a source Iceberg table created by Spark?

We should (as part of #4832 ...)
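The removed `.filter(handle -> handle.getSnapshotId().isPresent())` line is what this thread is about: an Iceberg table created outside Trino (for example by Spark, and never written to) can have no snapshot yet. The following is a hypothetical sketch, not what this PR or Trino actually does. It assumes dependencies are recorded as `table=snapshotId` pairs and that a snapshot-less table is recorded with an explicit marker rather than dropped. In a scheme like this, filtering such a table out would leave nothing to compare against once the table gets its first snapshot.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class MvDependencies
{
    // Hypothetical marker meaning "source table had no snapshot at refresh time".
    static final String NO_SNAPSHOT = "none";

    // Renders each source table's snapshot, keeping snapshot-less tables instead of filtering them out.
    static Map<String, String> snapshotStrings(Map<String, Optional<Long>> snapshots)
    {
        Map<String, String> result = new LinkedHashMap<>();
        snapshots.forEach((table, snapshot) -> result.put(table, snapshot.map(id -> Long.toString(id)).orElse(NO_SNAPSHOT)));
        return result;
    }

    // Encodes the dependencies as "table=snapshot" pairs joined by commas (illustrative format).
    static String encode(Map<String, Optional<Long>> snapshots)
    {
        return snapshotStrings(snapshots).entrySet().stream()
                .map(entry -> entry.getKey() + "=" + entry.getValue())
                .collect(Collectors.joining(","));
    }

    // Parses the encoded form back into a table -> snapshot map.
    static Map<String, String> decode(String recorded)
    {
        Map<String, String> result = new LinkedHashMap<>();
        if (recorded.isEmpty()) {
            return result;
        }
        for (String entry : recorded.split(",")) {
            int separator = entry.lastIndexOf('=');
            result.put(entry.substring(0, separator), entry.substring(separator + 1));
        }
        return result;
    }

    // Fresh only if every source table still has the snapshot (or lack of one) recorded at refresh time.
    static boolean isFresh(String recorded, Map<String, Optional<Long>> current)
    {
        return decode(recorded).equals(snapshotStrings(current));
    }

    public static void main(String[] args)
    {
        Map<String, Optional<Long>> atRefresh = new LinkedHashMap<>();
        atRefresh.put("db.orders", Optional.of(42L));
        atRefresh.put("db.spark_empty", Optional.empty()); // created by Spark, never written
        String recorded = encode(atRefresh);
        System.out.println(recorded); // db.orders=42,db.spark_empty=none

        Map<String, Optional<Long>> later = new LinkedHashMap<>(atRefresh);
        later.put("db.spark_empty", Optional.of(7L)); // first write to the Spark table
        System.out.println(isFresh(recorded, atRefresh));
        System.out.println(isFresh(recorded, later));
    }
}
```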

raunaqmorarka · 2022-08-04T05:43:40Z


raunaqmorarka · 2022-08-04T05:48:34Z

core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java

            if (optionalMaterializedView.isPresent()) {
-                if (metadata.getMaterializedViewFreshness(session, name).isMaterializedViewFresh()) {
-                    // If materialized view is current, answer the query using the storage table
+                if (isMaterializedViewSufficientlyFresh(name)) {


I'm wondering if we should let this be an implementation detail of the connector's metadata.getMaterializedViewFreshness code. The Iceberg MV code could figure out the "sufficiently fresh" condition and indicate freshness to the engine through the existing boolean flag.
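To make the connector-side alternative concrete: the connector folds its own staleness tolerance into the single boolean the engine already checks, and the engine logic stays unchanged. This is a hypothetical sketch. `Freshness` stands in for the SPI's `MaterializedViewFreshness`, and measuring staleness from the last refresh time is an assumption, not something the PR specifies.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

class ConnectorSideFreshness
{
    // Hypothetical stand-in for the SPI's single-boolean freshness result.
    record Freshness(boolean materializedViewFresh) {}

    // The connector reports "fresh" either when the storage table is up to date,
    // or when it is stale but still within this MV's tolerance.
    // Staleness is measured from the last refresh time (an assumption).
    static Freshness getFreshness(boolean upToDate, Optional<Instant> lastRefresh, Duration tolerance, Instant now)
    {
        if (upToDate) {
            return new Freshness(true);
        }
        boolean withinTolerance = lastRefresh
                .map(refreshed -> !refreshed.plus(tolerance).isBefore(now))
                .orElse(false); // never refreshed: nothing usable to read
        return new Freshness(withinTolerance);
    }

    public static void main(String[] args)
    {
        Instant refreshed = Instant.parse("2022-08-03T00:00:00Z");
        Instant now = Instant.parse("2022-08-03T12:00:00Z");
        // Refreshed 12 hours ago and the MV tolerates a day: the engine reads the storage table.
        System.out.println(getFreshness(false, Optional.of(refreshed), Duration.ofDays(1), now).materializedViewFresh());
        // Same age with zero tolerance: the engine inlines the view definition.
        System.out.println(getFreshness(false, Optional.of(refreshed), Duration.ZERO, now).materializedViewFresh());
    }
}
```

The catch is that a single boolean hides *why* the MV counts as fresh, so the engine cannot apply a different tolerance depending on how the MV is being used.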

We need to differentiate two use cases:

staleness for explicit querying

I didn't yet grasp why we need this, but Trino MVs accept no staleness today, unlike, e.g., PostgreSQL's or Oracle's. In those systems, the user has control over when the view is refreshed (as in Trino), so it feels OK to query the MV directly if the user asks to, irrespective of the staleness. We chose a hybrid approach, but, for performance and cost control reasons, users want to query stale MVs directly.
I think it's fine to have this configurable on a per-view basis, but I expect we will later need to add session-level controls (overrides) as well.
The question is how we model this per MV. I can easily make this a connector property, but this is something we will want to define for all connectors, which calls for some unification (syntax?).
If we agree that we want to have both per-MV and per-session controls, does it matter which one we start with?

staleness for query rewrites

This one is trickier. I believe our MVs' "fresh or inline" philosophy was picked with query rewrites as the primary use case (thus departing from the apparent experience of other systems). Query rewrites will also want to accept a certain amount of staleness, but this use case is much more delicate, as the logic is implicit. I am sure we will want per-MV and per-session controls for this feature when we build it (but building it is not a goal for this PR).

raunaqmorarka · 2022-08-04T05:49:23Z

core/trino-spi/src/main/java/io/trino/spi/connector/MaterializedViewFreshness.java

+
    @Override
-    public boolean equals(Object obj)
+    public boolean equals(Object o)


Can we skip the variable rename, or make it a separate commit?

raunaqmorarka · 2022-08-04T05:49:46Z

core/trino-spi/src/main/java/io/trino/spi/connector/MaterializedViewFreshness.java

-        sb.append("materializedViewFresh=").append(materializedViewFresh);
-        sb.append('}');
-        return sb.toString();
+        return new StringJoiner(", ", MaterializedViewFreshness.class.getSimpleName() + "[", "]")


Can the StringBuilder -> StringJoiner change go in a separate commit?
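One practical reason to keep this change separate is that it alters the visible `toString()` output, not just the code. Below is a side-by-side sketch. The diff shows only the tail of the old `StringBuilder` version, so its `"MaterializedViewFreshness{"` prefix is an assumption based on the closing `'}'`. The `StringJoiner` version mirrors the diff's delimiter, prefix, and suffix, with a string literal in place of `MaterializedViewFreshness.class.getSimpleName()` so the snippet compiles on its own.

```java
import java.util.StringJoiner;

class ToStringStyles
{
    // Assumed shape of the previous StringBuilder-based toString(); only its last lines appear in the diff.
    static String withStringBuilder(boolean materializedViewFresh)
    {
        StringBuilder sb = new StringBuilder("MaterializedViewFreshness{");
        sb.append("materializedViewFresh=").append(materializedViewFresh);
        sb.append('}');
        return sb.toString();
    }

    // StringJoiner variant as in the diff: delimiter ", ", prefix "<simple name>[", suffix "]".
    static String withStringJoiner(boolean materializedViewFresh)
    {
        return new StringJoiner(", ", "MaterializedViewFreshness" + "[", "]")
                .add("materializedViewFresh=" + materializedViewFresh)
                .toString();
    }

    public static void main(String[] args)
    {
        System.out.println(withStringBuilder(true)); // MaterializedViewFreshness{materializedViewFresh=true}
        System.out.println(withStringJoiner(true)); // MaterializedViewFreshness[materializedViewFresh=true]
    }
}
```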

martint · 2022-08-04T06:05:23Z

I’m not sure a global setting for this is appropriate. There’s usually not a one-size-fits-all value that captures the requirements for different use cases.

Also, it’s a departure from the model we’ve been working towards, which is that materialized views have the semantics of a view. I.e., always fresh, either because it’s already up to date, refreshed on the fly, or inlined and computed like a normal view. We also discussed possible syntax extensions to allow a user to indicate how much staleness they are willing to tolerate on a case by case basis.

findepi · 2022-08-04T08:27:49Z

model we’ve been working towards, which is that materialized views have the semantics of a view. I.e., always fresh, either because it’s already up to date, refreshed on the fly, or inlined and computed like a normal view.

This is the ideal situation from query engine perspective, but it's too strict in practice for end-users.
It's OK for a user to have, e.g., a daily-refreshed view and accept a day of staleness to avoid expensive computations.

For example, PostgreSQL materialized view feature https://www.postgresql.org/docs/14/rules-materializedviews.html started off from a different angle: one has explicit refreshes and implicitly accepts any amount of staleness. While our fresh-or-inline approach looks nicer, I don't think PostgreSQL's approach is absurd either. Both have their strengths and weaknesses. BTW PostgreSQL's approach seems to be directionally aligned with Oracle's.

sopel39 · 2022-08-04T12:28:35Z

core/trino-main/src/main/java/io/trino/FeaturesConfig.java

    }

+    @NotNull
+    public Duration getMaterializedViewRequiredFreshness()


This is very much a property of the data, not a system-wide property. I also think the MV refresh scheduler should live outside the core engine, and the engine should only be responsible for executing REFRESH or SELECT queries.

I don't think the engine needs to know the freshness duration if it cannot do much about it without a scheduler.

High level question is:

Do we need resolution as MV as view at all?

If no, then the whole concept of freshness doesn't make sense.

If yes, then IMO it's the connector that should be responsible for the business (time-based) logic deciding whether an MV is fresh or not. For example, MV freshness might be tightly coupled with the refresh interval, and an MV can be refreshed based on regex (e.g. every day at midnight).
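Freshness tied to a refresh schedule that only the connector knows could look like the following. This is a hypothetical sketch, assuming a daily refresh at midnight UTC. The MV counts as fresh until the first scheduled refresh after its last actual refresh comes due. The schedule and method names are illustrative and do not correspond to any Trino API.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

class ScheduleBasedFreshness
{
    // Hypothetical schedule known only to the connector: refresh daily at midnight UTC.
    static ZonedDateTime nextScheduledRefreshAfter(ZonedDateTime time)
    {
        return time.withZoneSameInstant(ZoneOffset.UTC).truncatedTo(ChronoUnit.DAYS).plusDays(1);
    }

    // Fresh until the next scheduled refresh after the last actual refresh is due;
    // past that point the data is older than the schedule promises.
    static boolean isFresh(ZonedDateTime lastRefresh, ZonedDateTime now)
    {
        return now.isBefore(nextScheduledRefreshAfter(lastRefresh));
    }

    public static void main(String[] args)
    {
        ZonedDateTime lastRefresh = ZonedDateTime.of(2022, 8, 3, 0, 5, 0, 0, ZoneOffset.UTC);
        // Same day, before the next midnight: still fresh.
        System.out.println(isFresh(lastRefresh, ZonedDateTime.of(2022, 8, 3, 18, 0, 0, 0, ZoneOffset.UTC)));
        // The next midnight passed without a new refresh: stale.
        System.out.println(isFresh(lastRefresh, ZonedDateTime.of(2022, 8, 4, 1, 0, 0, 0, ZoneOffset.UTC)));
    }
}
```

The engine in this model never sees a duration. It only gets the connector's yes/no answer, which is in line with the view that scheduling belongs outside the core engine.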

findepi · 2022-08-24T12:35:04Z

(just rebased)

findepi · 2022-09-17T16:02:01Z

@romanvainb will follow up sooner or later

cla-bot bot added the cla-signed label Aug 3, 2022

findepi mentioned this pull request Aug 3, 2022

Allow reading from a materialized view (storage table) when view is stale (not fresh) #10606

Open

findepi added the enhancement New feature or request label Aug 3, 2022

sopel39 reviewed Aug 3, 2022


anjalinorwood approved these changes Aug 3, 2022


raunaqmorarka reviewed Aug 4, 2022


sopel39 reviewed Aug 4, 2022


findepi mentioned this pull request Aug 9, 2022

Small MV improvements #13574

Merged

findepi self-assigned this Aug 12, 2022

Allow reading a stale materialized view

f8d0177

findepi force-pushed the findepi/read-stale-mv branch from ef0055a to f8d0177 August 24, 2022 12:34

findepi closed this Sep 17, 2022

findepi deleted the findepi/read-stale-mv branch September 17, 2022 16:02
