567a739 to d7ea68d
core/trino-main/src/main/java/io/trino/sql/analyzer/ExpressionAnalyzer.java
core/trino-main/src/main/java/io/trino/sql/planner/TypeAnalyzer.java
core/trino-main/src/main/java/io/trino/sql/planner/plan/SimplePlanRewriter.java
testing/trino-tests/src/test/java/io/trino/tests/TestLocalQueries.java
kasiafi
left a comment
I have some thoughts on the ExpressionAnalyzer part. Please give me a chance to comment, I'll hopefully get back to it tomorrow.
d7ea68d to 10fee6e
Together with #11902, it yields the following performance improvement:
8c739ca to b67f405
b67f405 to 0452821
I don't see an explicit copy here.
"expressionRefs"?
(the name could be OK for a list, but that's a Map)
ExpressionAnalyzer is a ton of mutable state; it's totally unsuitable for caching & sharing.
That was proposed by @sopel39. My previous version was actually caching only Map<NodeRef<Expression>, Type>
Type is immutable, so it was definitely a better idea
(not judging whether this particular cache in this place is required)
ExpressionAnalyzer is a ton of mutable state; it's totally unsuitable for caching & sharing.
I don't think it's a problem since it only adds new information to analysis. It already caches via io.trino.sql.analyzer.ExpressionAnalyzer.Visitor#process.
That was proposed by @sopel39. My previous version was actually caching only Map<NodeRef<Expression>, Type>
IIRC that map was not passed to ExpressionAnalyzer, hence it wouldn't work with sub-expressions.
Anyway, for me planning is a process so keeping ExpressionAnalyzer is not really an issue. Actually, it's probably desirable since I want to cache ExpressionInterpreter too and it needs to be able to fetch type information for new expressions.
I strongly opt for caching Map<NodeRef<Expression>, Type>, as that's what you actually need here.
The ExpressionAnalyzer was not designed to be created in multiple instances throughout the Planner. It is intended for the Analysis phase, where it provides different kinds of information to the Planner via the Analysis object. If you need to analyze expressions later for some reason, there are the analyzeExpressions and createConstantAnalyzer methods, depending on your use case. Those methods give you access to the "analyzing" capability of the ExpressionAnalyzer while hiding the complexity that is not relevant at that point.
The ExpressionAnalyzer was not designed to be created in multiple instances throughout the Planner.
@kasiafi This is exactly what is happening right now. Every io.trino.sql.planner.TypeAnalyzer#getTypes call creates a new ExpressionAnalyzer instance (via analyzeExpressions) and does a full analysis. Overall, it's expensive if done repeatedly.
We want to reduce that cost by keeping an ExpressionAnalyzer instance during planning, which should be fine since planning is a process rather than a move from one immutable state to another.
I strongly opt for caching Map<NodeRef<Expression>, Type>, as that's what you actually need here.
I would like to keep ExpressionInterpreter too. For that I need a utility that returns the type of a (sub)expression rather than analyzing the entire expression to get the full Map<NodeRef<Expression>, Type>. In that context, keeping ExpressionAnalyzer is more natural.
findepi
left a comment
"Cache expensive TypeAnalyzer instance creation and reduce type analysis cost"
Can you describe the meaning of the class's name?
List.of(expression) -> ImmutableList.of(expression)
IMO TypeAnalyzer should be created per query and possibly extracted as an interface. It should then be created and used in io.trino.sql.planner.LogicalPlanner#plan. No NonEvictableCache<CacheIdentityKey, ExpressionAnalyzer> analyzerCache is needed then.
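A rough sketch of what the per-query suggestion above could look like, under heavy assumptions: Expr, QueryTypeAnalyzer, PerQueryTypeAnalyzer and PlannerSketch are hypothetical names invented for illustration, types are plain strings, and the "analysis" is a placeholder. It only shows the lifecycle (one analyzer per planning run, no shared cache), not Trino's actual API.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical expression node; stands in for io.trino.sql.tree.Expression.
final class Expr {
    final Object literal;

    Expr(Object literal) {
        this.literal = literal;
    }
}

// Hypothetical, simplified interface; the real TypeAnalyzer API differs.
interface QueryTypeAnalyzer {
    String getType(Expr expression);
}

// One instance lives for a single planning run and is discarded afterwards,
// so no global, evictable cache is involved.
final class PerQueryTypeAnalyzer implements QueryTypeAnalyzer {
    private final Map<Expr, String> types = new IdentityHashMap<>();
    private int analyses;

    @Override
    public String getType(Expr expression) {
        String type = types.get(expression);
        if (type == null) {
            analyses++;
            // Placeholder for the expensive analysis step.
            type = expression.literal instanceof Long ? "bigint" : "varchar";
            types.put(expression, type);
        }
        return type;
    }

    int analyses() {
        return analyses;
    }
}

final class PlannerSketch {
    // Stands in for LogicalPlanner#plan: several rules ask for the same type.
    // Returns how many times the expression was actually analyzed.
    static int plan(Expr expression) {
        PerQueryTypeAnalyzer analyzer = new PerQueryTypeAnalyzer();
        for (int rule = 0; rule < 3; rule++) {
            analyzer.getType(expression);
        }
        return analyzer.analyses();
    }
}
```

In this sketch, PlannerSketch.plan analyzes the expression once even though three "rules" ask for its type, and a second query would start from an empty analyzer.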
We also need to add a case where the IN list is in a subexpression, e.g. x IS NULL OR y IN (...)
Just call ExpressionAnalyzer#analyze directly, no need to go through ExpressionAnalysis
Wouldn't caching
Type for a (sub)expression would be there in the cache.
Why? Is that to avoid re-evaluating constant expressions? We could cache
Why would it be in cache? It can be a new instance of expression (e.g. symbol changes). Bringing new instance of
Yes. We do that repeatedly in planning (e.g. with
In my opinion, keeping the ExpressionAnalyzer is much more acceptable than creating and caching many ExpressionAnalyzers. However, I still consider it a kind of "abstraction leak". The ExpressionAnalyzer is here to perform correctness checks, determine coercions, etc. in the pre-planning phase. Computing types is a kind of "implementation detail". We learned to reuse this capability throughout the Planner because we found that we need to know the types over and over. The new IR should definitely be enhanced with types (and also with pre-computed constants). While we currently reuse the ExpressionAnalyzer for getting the expression types, we should try to isolate this capability from all the other work that ExpressionAnalyzer does. For that reason, we use the public methods rather than the constructor, which also involves irrelevant parts, like the Analysis.
I'm OK with having a persistent (per query) ExpressionInterpreter with caching.
Creating a new instance of InPredicate would cause a miss in the expression type cache, which uses the node reference as the cache key.
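To make the cache-miss concern concrete, here is a minimal sketch, assuming a cache keyed by node identity (similar in spirit to keying by NodeRef). PredicateNode and IdentityTypeCache are hypothetical stand-ins written for this comment, not Trino classes, and the "analysis" is a placeholder.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for an AST node such as InPredicate; not Trino's class.
final class PredicateNode {
    final String sql;

    PredicateNode(String sql) {
        this.sql = sql;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof PredicateNode && ((PredicateNode) other).sql.equals(sql);
    }

    @Override
    public int hashCode() {
        return sql.hashCode();
    }
}

// Type cache keyed by node identity (==) rather than by equals().
final class IdentityTypeCache {
    private final Map<PredicateNode, String> types = new IdentityHashMap<>();
    private int misses;

    String typeOf(PredicateNode node) {
        String type = types.get(node);
        if (type == null) {
            misses++;
            type = "boolean"; // placeholder for a real (expensive) analysis
            types.put(node, type);
        }
        return type;
    }

    int misses() {
        return misses;
    }
}

class IdentityTypeCacheDemo {
    public static void main(String[] args) {
        IdentityTypeCache cache = new IdentityTypeCache();
        PredicateNode original = new PredicateNode("y IN (1, 2)");
        PredicateNode copy = new PredicateNode("y IN (1, 2)"); // equal, but a new instance

        cache.typeOf(original);
        cache.typeOf(original); // hit: same instance
        cache.typeOf(copy);     // miss: different instance despite equals()

        System.out.println("equal=" + original.equals(copy) + ", misses=" + cache.misses());
    }
}
```

The structurally identical copy is analyzed again because the lookup compares references, which is why re-creating nodes during planning defeats an identity-keyed cache.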
@kasiafi What does it mean in practice? You mean you would like to have something like
Yes, that was my thinking. However, I still think that caching Expression -> Type should be sufficient, and occasional creation of new ExpressionAnalyzer is better than "pulling" it throughout the Optimizer. And to make it the most "occasional", we should consider preserving the NodeRefs whenever possible (mostly, in ExpressionInterpreter) instead of creating identical copies of Expressions.
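One way to read "preserving the NodeRefs whenever possible" is a rewriter that returns the input instance when nothing underneath it changed. The sketch below assumes a tiny hypothetical AST (AstNode, SymbolRef, NotExpr) and a hypothetical rename operation; it is not how ExpressionInterpreter is implemented.

```java
// Hypothetical mini AST, for illustration only.
abstract class AstNode {
}

final class SymbolRef extends AstNode {
    final String name;

    SymbolRef(String name) {
        this.name = name;
    }
}

final class NotExpr extends AstNode {
    final AstNode value;

    NotExpr(AstNode value) {
        this.value = value;
    }
}

final class IdentityPreservingRewriter {
    // Renames symbol `from` to `to`. Returns the very same instance when the
    // subtree is unchanged, so identity-keyed caches keep hitting.
    static AstNode rename(AstNode node, String from, String to) {
        if (node instanceof SymbolRef) {
            SymbolRef symbol = (SymbolRef) node;
            return symbol.name.equals(from) ? new SymbolRef(to) : node;
        }
        // The sketch has only two node kinds, so anything else is a NotExpr.
        NotExpr not = (NotExpr) node;
        AstNode rewritten = rename(not.value, from, to);
        return rewritten == not.value ? node : new NotExpr(rewritten);
    }
}

class IdentityPreservingRewriterDemo {
    public static void main(String[] args) {
        AstNode expression = new NotExpr(new SymbolRef("x"));
        AstNode untouched = IdentityPreservingRewriter.rename(expression, "y", "z");
        AstNode changed = IdentityPreservingRewriter.rename(expression, "x", "z");
        System.out.println("untouched is same instance: " + (untouched == expression));
        System.out.println("changed is same instance: " + (changed == expression));
    }
}
```

Only a rewrite that actually changes a symbol allocates new nodes; every other pass leaves the references intact, and with them any cached types.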
Why would it be better? You end up performing all these correctness checks anyway, so why pretend we don't use
I don't like that approach mostly because
I dismissed my review. @martint ptal if you happen to have time.
0452821 to d94e061
👋 @wendigo - this PR has become inactive. If you're still interested in working on it, please let us know. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.
