Conversation
| private boolean legacyCatalogRoles; | ||
| private boolean incrementalHashArrayLoadFactorEnabled = true; | ||
| private boolean allowSetViewAuthorization; | ||
| private Duration materializedViewRequiredFreshness = new Duration(0, SECONDS); |
That most likely should be an MV property
It should definitely be allowed as an MV property as it is much easier to think about how much staleness is acceptable for specific MVs rather than defining it globally.
In fact, I would lean against making it configurable globally at all as it seems very tricky for an admin to come up with an acceptable staleness value for all MVs. It also creates the possibility of end users unexpectedly receiving stale results without having opted into it explicitly through MV property or session property.
Data will likely have different retention policies, so it has to be defined per MV.
@sopel39 @raunaqmorarka @hashhar
mistakenly replied in a different (related) thread here: #13484 (comment)
anjalinorwood left a comment
This is a great enhancement. Thank you for adding it. LGTM.
raunaqmorarka left a comment
Please add tests as well
| String dependencies = sourceTableHandles.stream() | ||
| .map(handle -> (IcebergTableHandle) handle) | ||
| .filter(handle -> handle.getSnapshotId().isPresent()) |
Question about this commit: if tables created by Trino always have a snapshot ID and CREATE MV cannot be pointed at pre-existing tables, can this scenario arise in practice?
Could we instead require that the Iceberg table always has a snapshot ID here?
got it, thanks
Can we have a test that creates an MV in Trino on a source Iceberg table created by Spark?
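For reference, a minimal sketch of what the `filter` in the diff above guards against: a source table with no current snapshot is skipped when the dependency string is built. `SourceTable` is a hypothetical stand-in for `IcebergTableHandle`, and the `name=snapshotId` encoding with a `,` separator is an assumption made for illustration, not necessarily the connector's actual format.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.stream.Collectors;

// Hypothetical stand-in for IcebergTableHandle; only the two fields used here.
final class SourceTable
{
    final String name;
    final OptionalLong snapshotId;

    SourceTable(String name, OptionalLong snapshotId)
    {
        this.name = name;
        this.snapshotId = snapshotId;
    }
}

final class DependencySketch
{
    private DependencySketch() {}

    // Encodes each source table as "name=snapshotId", skipping tables that
    // have no snapshot. The encoding is an assumption made for this sketch.
    static String encodeDependencies(List<SourceTable> sources)
    {
        return sources.stream()
                .filter(table -> table.snapshotId.isPresent())
                .map(table -> table.name + "=" + table.snapshotId.getAsLong())
                .collect(Collectors.joining(","));
    }
}
```

With one table at snapshot 42 and one table without a snapshot, only the first contributes to the result, so a test against a snapshot-less source table would exercise exactly this branch.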
| if (optionalMaterializedView.isPresent()) { | ||
| if (metadata.getMaterializedViewFreshness(session, name).isMaterializedViewFresh()) { | ||
| // If materialized view is current, answer the query using the storage table | ||
| if (isMaterializedViewSufficientlyFresh(name)) { |
I'm wondering if we should let this be an implementation detail of the connector's metadata.getMaterializedViewFreshness code. The Iceberg MV code could figure out the "sufficiently fresh" condition and indicate freshness to the engine through the existing boolean flag.
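A rough sketch of that split, under stated assumptions: the connector folds the staleness tolerance into the boolean it already reports, and the engine only chooses between the storage table and the inlined definition. All names here (`FreshnessSketch`, `toleratedStaleness`, the returned labels) are hypothetical and not part of the Trino SPI; `java.time.Duration` is used instead of the airlift `Duration` to keep the example self-contained.

```java
import java.time.Duration;
import java.time.Instant;

final class FreshnessSketch
{
    private FreshnessSketch() {}

    // Connector side: the MV counts as fresh if its sources are unchanged since
    // the last refresh, or if the last refresh is strictly younger than the
    // tolerated staleness. A zero tolerance therefore never accepts stale data.
    static boolean isMaterializedViewFresh(
            boolean sourcesUnchanged,
            Instant lastRefresh,
            Instant now,
            Duration toleratedStaleness)
    {
        if (sourcesUnchanged) {
            return true;
        }
        Duration age = Duration.between(lastRefresh, now);
        // A negative age (refresh timestamp in the future) is treated as not fresh.
        return !age.isNegative() && age.compareTo(toleratedStaleness) < 0;
    }

    // Engine side: consumes only the boolean, as it does today.
    static String chooseSource(boolean fresh)
    {
        return fresh ? "STORAGE_TABLE" : "INLINED_VIEW_DEFINITION";
    }
}
```

The engine-side method stays unaware of durations, which is the point of the suggestion; whether the tolerance comes from an MV property or elsewhere is left open here.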
We need to differentiate two use cases:
- staleness for explicit querying
- I don't yet fully grasp why we need this, but Trino MVs accept no staleness today, unlike e.g. PostgreSQL's or Oracle's. In those systems, the user controls when the view is refreshed (as in Trino), so it feels OK to query the MV directly if the user asks to, irrespective of the staleness. We chose a hybrid approach, but for performance and cost-control reasons users want to query stale MVs directly.
I think it's fine to have this configurable on a per-view basis, but I expect we will later need to add session-level controls (overrides) as well.
The question is how we model this per MV. I can easily make this a connector property, but this is something we will want to define for all connectors, which calls for some unification (syntax?).
If we agree that we want to have both per-MV and per-session controls, does it matter which one we start with?
- staleness for query rewrites
- This one is trickier. I believe our MVs' "fresh or inline" philosophy was picked with query rewrites as the primary use case (so departing from the apparent experience of other systems). Query rewrites will also want to accept certain staleness, but this use case is much more delicate, as the logic is implicit. I am sure we will want per-MV and per-session controls for this feature when we build it (but building it is not a goal for this PR).
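To make the per-MV plus per-session idea concrete, here is a small sketch of how the two controls could combine. The precedence (session override first, then the MV property, then zero tolerance) is an assumption for illustration; the thread does not settle it, and `StalenessToleranceSketch` is a hypothetical name.

```java
import java.time.Duration;
import java.util.Optional;

final class StalenessToleranceSketch
{
    private StalenessToleranceSketch() {}

    // Resolves the staleness a query is willing to accept for one MV.
    // Assumed precedence: session override, then MV property, then zero,
    // which matches today's behavior of accepting no staleness.
    static Duration effectiveTolerance(
            Optional<Duration> sessionOverride,
            Optional<Duration> viewProperty)
    {
        if (sessionOverride.isPresent()) {
            return sessionOverride.get();
        }
        return viewProperty.orElse(Duration.ZERO);
    }
}
```

With this shape, starting with only one of the two controls is a matter of always passing `Optional.empty()` for the other, so the order in which they are introduced would not change the resolution logic.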
| @Override | ||
| public boolean equals(Object obj) | ||
| public boolean equals(Object o) |
Can we skip the variable rename or make it a separate commit?
| sb.append("materializedViewFresh=").append(materializedViewFresh); | ||
| sb.append('}'); | ||
| return sb.toString(); | ||
| return new StringJoiner(", ", MaterializedViewFreshness.class.getSimpleName() + "[", "]") |
StringBuilder -> StringJoiner change in a separate commit?
I’m not sure a global setting for this is appropriate. There’s usually not a one-size-fits-all value that captures the requirements for different use cases. Also, it’s a departure from the model we’ve been working towards, which is that materialized views have the semantics of a view, i.e., always fresh, either because it’s already up to date, refreshed on the fly, or inlined and computed like a normal view. We also discussed possible syntax extensions to allow a user to indicate how much staleness they are willing to tolerate on a case-by-case basis.
This is the ideal situation from the query engine's perspective, but it's too strict in practice for end users. For example, the PostgreSQL materialized view feature (https://www.postgresql.org/docs/14/rules-materializedviews.html) started off from a different angle: one has explicit refreshes and implicitly accepts any amount of staleness. While our fresh-or-inline approach looks nicer, I don't think PostgreSQL's approach is absurd either. Both have their strengths and weaknesses. BTW, PostgreSQL's approach seems to be directionally aligned with Oracle's.
| } | ||
| @NotNull | ||
| public Duration getMaterializedViewRequiredFreshness() |
This is very much a property of the data, not a system-wide property. I also think the MV refresh scheduler should be outside of the core engine, and the engine should only be responsible for executing REFRESH or SELECT queries.
I don't think the engine needs to know the freshness duration if it cannot do much about it without a scheduler.
The high-level question is:
- Do we need to resolve an MV as a view at all?
- If no, then the whole concept of freshness doesn't make sense.
- If yes, then IMO it's the connector that should be responsible for the business (time-based) logic that decides whether an MV is fresh. For example, MV freshness might be tightly coupled with the refresh interval, and an MV can be refreshed based on regex (e.g. every day at midnight).
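As an illustration of connector-owned, schedule-coupled freshness, here is a minimal sketch for the "every day at midnight" example: the MV counts as fresh if its last refresh happened at or after the most recent local midnight. The class and method names are hypothetical, and tying freshness to a daily schedule in this way is an assumption made for the example, not something the PR implements.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class ScheduleFreshnessSketch
{
    private ScheduleFreshnessSketch() {}

    // Fresh if the last refresh is not older than the latest scheduled run,
    // where runs are assumed to happen daily at the start of day in `zone`.
    static boolean isFreshForDailySchedule(Instant lastRefresh, Instant now, ZoneId zone)
    {
        ZonedDateTime latestMidnight = now.atZone(zone).toLocalDate().atStartOfDay(zone);
        return !lastRefresh.isBefore(latestMidnight.toInstant());
    }
}
```

The engine would see only the resulting boolean, so a different connector could swap in an interval-based or event-based rule without any engine change.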
ef0055a to f8d0177
(just rebased)
@romanvainb will follow up sooner or later
Fixes #10606