Cache ExpressionInterpreter optimizations by gaurav8297 · Pull Request #12016 · trinodb/trino

gaurav8297 · 2022-04-19T12:41:02Z

Description

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine

How would you describe this change to a non-technical end user or system administrator?

Problem: ExpressionInterpreter is called multiple times during planning, so its optimization results should be cached (especially for long IN lists).

This PR introduces the following:

1. The first two commits introduce a way to use the same instance of ExpressionInterpreter for a whole query. This makes query-level expression caching possible without maintaining a global cache. This is based on Karol's comment: Cache ExpressionInterpreter optimizations #12016 (comment)
2. Caching of optimization results in ExpressionInterpreter, whose instance is unique at the query level.

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

findepi · 2022-04-19T14:44:41Z

@martint @kasiafi @findepi PTAL

gaurav8297 · 2022-04-19T15:47:38Z

Just fyi, this pr isn't ready yet

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

gaurav8297 · 2022-04-20T08:23:36Z

@sopel39 PTAL

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

sopel39 · 2022-04-20T09:50:03Z

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

nit: I'm not sure this cache is even used in practice TBH

Yes, that's true. This cache isn't used in the planLargeInQuery benchmark.

Also, this cache only gets used when the InPredicate value isn't an expression, which seems like a very rare use case. Please correct me if I'm wrong.

Seems so and seems to have always been so since 5c8b118.
Maybe we should remove the io.trino.sql.planner.ExpressionInterpreter#inListCache? cc @martint

> Maybe we should remove the io.trino.sql.planner.ExpressionInterpreter#inListCache?

Yes, I think we should do that

gaurav8297 · 2022-04-21T08:46:37Z

@sopel39 @raunaqmorarka PTAL

gaurav8297 · 2022-04-22T06:14:04Z

Looking at the failing tests

wendigo · 2022-04-22T12:42:31Z

@gaurav8297 do you have any benchmarks for this change?

gaurav8297 · 2022-04-22T14:17:14Z

> @gaurav8297 do you have any benchmarks for this change?

It's there in the last commit. Pasting it here too. cc @findepi @wendigo

Before:
Benchmark                            (stage)  Mode  Cnt      Score      Error  Units
BenchmarkPlanner.planLargeInQuery  optimized  avgt   20  21353.002 ± 1299.409  ms/op

After:
Benchmark                            (stage)  Mode  Cnt      Score     Error  Units
BenchmarkPlanner.planLargeInQuery  optimized  avgt   20  16495.453 ± 260.591  ms/op

findepi

"Introduce PlanOptimizer context"

core/trino-main/src/main/java/io/trino/sql/planner/LogicalPlanner.java

core/trino-main/src/main/java/io/trino/sql/planner/optimizations/HashGenerationOptimizer.java

core/trino-main/src/main/java/io/trino/sql/planner/optimizations/IndexJoinOptimizer.java

core/trino-main/src/main/java/io/trino/sql/planner/optimizations/LimitPushDown.java

core/trino-main/src/test/java/io/trino/sql/planner/optimizations/TestBeginTableWrite.java

core/trino-main/src/test/java/io/trino/sql/planner/assertions/BasePlanTest.java

findepi

"Use same ExpressionInterpreter instance per query"

core/trino-main/src/main/java/io/trino/sql/planner/EffectivePredicateExtractor.java

findepi · 2022-04-22T14:30:09Z

core/trino-main/src/main/java/io/trino/cost/FilterStatsCalculator.java

should an instance get reused here?

core/trino-main/src/main/java/io/trino/sql/planner/LogicalPlanner.java

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

sopel39 · 2022-04-22T14:29:53Z

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

it shouldn't matter if we optimize or not, should it?

I think it matters. In some places we return different results when optimizing. For instance, for function calls we don't optimize if the function isn't deterministic.

https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java#L1057

Agreed, the class has two 'modes' and we cannot uniformly cache for both of them at the same time, unless 'the mode' becomes part of the cache key too.
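To make the "mode as part of the cache key" idea concrete, here is a minimal sketch. It is not code from this PR: ModeAwareCache and its Key class are hypothetical names, the expression is a plain Object stand-in, and the map holds strong references, so the weak-key concerns raised elsewhere in this thread are deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in Trino.
final class ModeAwareCache
{
    // The key pairs the expression (compared by identity) with the interpreter mode,
    // so results computed with optimize == true never answer an optimize == false lookup.
    private static final class Key
    {
        private final Object expression;
        private final boolean optimize;

        Key(Object expression, boolean optimize)
        {
            this.expression = Objects.requireNonNull(expression, "expression is null");
            this.optimize = optimize;
        }

        @Override
        public boolean equals(Object other)
        {
            if (!(other instanceof Key)) {
                return false;
            }
            Key that = (Key) other;
            return expression == that.expression && optimize == that.optimize;
        }

        @Override
        public int hashCode()
        {
            return 31 * System.identityHashCode(expression) + Boolean.hashCode(optimize);
        }
    }

    private final Map<Key, Object> cache = new HashMap<>();

    // Returns the cached result for (expression, mode), computing it on the first request only.
    Object get(Object expression, boolean optimize, Supplier<Object> compute)
    {
        return cache.computeIfAbsent(new Key(expression, optimize), key -> compute.get());
    }
}
```

With this shape, the same expression instance can have two independent entries, one per mode, at the cost of allocating a key object per lookup.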

sopel39 · 2022-04-22T14:30:40Z

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

Can the same expression (by identity) be evaluated to something else with a different SymbolResolver? That would seem unlikely, no? What would be the case for it?

> Can same expression (by identity) be evaluated to something else with different SymbolResolver?

Yes, it is possible. Some tests were failing before, which is why I made this change. For example: io.trino.sql.planner.AbstractPredicatePushdownTest#testNormalizeOuterJoinToInner

Specifically, in this test the SymbolReference is optimized twice: once without a context and once with a context where the symbol turns out to be null.

Another thing I want to point out is that WeakHashMap isn't based on identity: GC happens on identity, but the lookup by key works on equals and hashCode.

> Another thing that I want to point out is that WeakHashmap isn't based on identity, although GC happens on Identity but the get by key works on equals and hashing.

Good point. I was blindsided by com.google.common.cache.CacheBuilder#weakKeys implying the keys are compared with ==. Indeed, WeakHashMap is not applicable here because Expression.equals does not provide the right semantics for the expression cache.
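The difference between the two lookup semantics can be seen with a small standalone sketch. It uses plain String keys as a stand-in for Expression nodes (an assumption for illustration only), and WeakHashMapLookupDemo is a hypothetical name.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class; Strings stand in for Expression nodes.
final class WeakHashMapLookupDemo
{
    // Returns {weakHashMapHit, identityHashMapHit} for a lookup with an equal but distinct key.
    static boolean[] run()
    {
        // Two distinct instances that are equal by value
        String first = new String("x IN (1, 2, 3)");
        String second = new String("x IN (1, 2, 3)");

        Map<String, String> weak = new WeakHashMap<>();
        weak.put(first, "cached");

        Map<String, String> identity = new IdentityHashMap<>();
        identity.put(first, "cached");

        return new boolean[] {
                weak.containsKey(second),     // lookup uses equals()/hashCode()
                identity.containsKey(second), // lookup uses ==
        };
    }
}
```

WeakHashMap finds the entry through the equal-but-distinct key, while IdentityHashMap does not, which is the behavior the comment above describes.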

In that case, I'm thinking of implementing it using IdentityHashMap, because ExpressionInterpreter is created per query planning anyway. So it will get GCed, and hence there shouldn't be any memory leak, right?

Planning can create new Expressions, and old, unused expressions should be eligible for GC.

So there are two options IMO:

1. Create a new map -> WeakIdentityHashMap, which doesn't exist in the JDK.

2. Use com.google.common.cache.CacheBuilder#weakKeys, but it has extra cost due to being concurrent.

> 2. Use com.google.common.cache.CacheBuilder#weakKeys but it has extra cost due to being concurrent.

or com.google.common.collect.MapMaker#weakKeys
(also concurrent map)

> 1. Create a new map -> WeakIdentityHashMap which doesn't exist in JDK.

Sounds complicated.
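To give a sense of what option 1 would involve, here is a minimal single-threaded sketch built only on JDK classes. WeakIdentityCache and IdentityWeakKey are hypothetical names; this is not code from the PR and it skips most of the Map interface.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a weak, identity-keyed cache; not part of the JDK or Trino. Not thread-safe.
final class WeakIdentityCache<K, V>
{
    // Weakly references the key and compares referents with ==
    private static final class IdentityWeakKey<T> extends WeakReference<T>
    {
        private final int hash;

        IdentityWeakKey(T referent, ReferenceQueue<? super T> queue)
        {
            super(referent, queue);
            // Captured eagerly so the hash stays stable after the referent is collected
            this.hash = System.identityHashCode(referent);
        }

        @Override
        public int hashCode()
        {
            return hash;
        }

        @Override
        public boolean equals(Object other)
        {
            if (this == other) {
                return true;
            }
            if (!(other instanceof IdentityWeakKey)) {
                return false;
            }
            Object mine = get();
            // A cleared key only equals itself, which is enough for expungeStaleEntries()
            return mine != null && mine == ((IdentityWeakKey<?>) other).get();
        }
    }

    private final ReferenceQueue<K> queue = new ReferenceQueue<>();
    private final Map<IdentityWeakKey<K>, V> map = new HashMap<>();

    V get(K key)
    {
        expungeStaleEntries();
        // The lookup key is not registered with the queue; it is only used for comparison
        return map.get(new IdentityWeakKey<>(key, null));
    }

    void put(K key, V value)
    {
        expungeStaleEntries();
        map.put(new IdentityWeakKey<>(key, queue), value);
    }

    int size()
    {
        expungeStaleEntries();
        return map.size();
    }

    // Drops entries whose keys have been garbage collected
    private void expungeStaleEntries()
    {
        Reference<? extends K> stale;
        while ((stale = queue.poll()) != null) {
            map.remove(stale);
        }
    }
}
```

Even in this reduced form the bookkeeping is noticeable, which is one reason a library map with weak keys may be the more practical choice.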

@gaurav8297 and I discussed this offline. The conclusion was that we may be able to use equality-based semantics for Expressions. Identity-based comparisons are a must for analyzer and planner, but they may be unnecessary during query optimization.

@martint please chime in here.

@findepi I ran the benchmarks again with com.google.common.collect.MapMaker#weakKeys. The overhead is pretty negligible. So maybe it's better to simply use this?

sopel39 · 2022-04-22T14:34:52Z

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

> This is because intermediate object could get GCed even when expression still exists which will cause cache miss

Which intermediate object? You return intermediate object from processWithExceptionHandling so it shouldn't be GCed.

I mean the case where we create an intermediate object that contains both the context and the expression and use it as the WeakHashMap key, in order to avoid the check context instanceof NoOpSymbolResolver. In that case, the intermediate object can get GCed, right?

For example: #12016 (comment)

findepi

"Use same ExpressionInterpreter instance per query"

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

findepi

"Cache optimized expressions in ExpressionInterpreter"

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

gaurav8297 · 2022-04-27T13:36:59Z

@sopel39 @raunaqmorarka @findepi PTAL again

Currently ExpressionInterpreter.optimize is called multiple times for the same expression from different rules and optimizers. This change enables expression-level caching in ExpressionInterpreter per single query. For instance, in the case of a long IN list, caching will help save a lot of time.

Do not create a new instance of InPredicate expression if it would be the same as the original expression. Creating a new instance of InPredicate would cause an inListCache miss, because the cache uses the node reference as its key.

To avoid the memory leak that could happen because only one instance of ExpressionInterpreter is now used per query, we need to make likePatternCache and inListCache weakly referenced.

Benchmarks for 100000 InList values:

Before:
Benchmark                            (stage)  Mode  Cnt      Score     Error  Units
BenchmarkPlanner.planLargeInQuery  optimized  avgt   20  17917.381 ± 479.014  ms/op

After:
Benchmark                            (stage)  Mode  Cnt      Score     Error  Units
BenchmarkPlanner.planLargeInQuery  optimized  avgt   20  14549.675 ± 265.569  ms/op

dain · 2022-05-31T23:25:56Z

This needs a PR description that explains the full scope of the changes. For example, it appears that the first commit is introducing a new planner context class, which is not obvious in name. Please, fill out the template and explain the change.

gaurav8297 · 2022-06-01T09:25:13Z

> This needs a PR description that explains the full scope of the changes. For example, it appears that the first commit is introducing a new planner context class, which is not obvious in name. Please, fill out the template and explain the change.

Thanks for pointing that out, @dain. I've updated the description.

sopel39 · 2022-06-01T12:41:10Z

core/trino-main/src/main/java/io/trino/sql/planner/optimizations/PredicatePushDown.java

-    public PlanNode optimize(PlanNode plan, Session session, TypeProvider types, SymbolAllocator symbolAllocator, PlanNodeIdAllocator idAllocator, WarningCollector warningCollector)
+    public PlanNode optimize(PlanNode plan, Context context)
    {
        requireNonNull(plan, "plan is null");


Check that context is not null: requireNonNull(context, "context is null").
Here and in other similar rules.
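As a rough illustration of the requested check, the sketch below shows a context object that bundles optimizer inputs and an optimize method that validates both arguments. OptimizerContextSketch, its Context class, and the field names are hypothetical and much simpler than the PlanOptimizer.Context in this PR; the String and Object types are stand-ins.

```java
import java.util.Objects;

// Hypothetical sketch; names and types are illustrative, not the PR's actual classes.
final class OptimizerContextSketch
{
    // Bundles per-query state so optimizers take one argument instead of many
    static final class Context
    {
        private final String session;
        private final Object expressionInterpreter;

        Context(String session, Object expressionInterpreter)
        {
            this.session = Objects.requireNonNull(session, "session is null");
            this.expressionInterpreter = Objects.requireNonNull(expressionInterpreter, "expressionInterpreter is null");
        }

        String session()
        {
            return session;
        }

        Object expressionInterpreter()
        {
            return expressionInterpreter;
        }
    }

    // Validates both arguments up front, as suggested in the review comment
    static String optimize(String plan, Context context)
    {
        Objects.requireNonNull(plan, "plan is null");
        Objects.requireNonNull(context, "context is null");
        return plan;
    }
}
```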

sopel39 · 2022-06-01T12:42:03Z

core/trino-main/src/test/java/io/trino/sql/planner/assertions/BasePlanTest.java

        }
    }
+
+    protected PlanOptimizer.Context createOptimizerContext(


sopel39 · 2022-06-01T12:46:12Z

core/trino-main/src/main/java/io/trino/cost/FilterStatsCalculator.java

        Map<NodeRef<Expression>, Type> expressionTypes = getExpressionTypes(plannerContext, session, predicate, types);
-        ExpressionInterpreter interpreter = new ExpressionInterpreter(predicate, plannerContext, session, expressionTypes);
-        Object value = interpreter.optimize(NoOpSymbolResolver.INSTANCE);
+        // TODO - Use the same instance of ExpressionInterpreter create per planning once StatsRule has context


Create a GitHub issue for this and reference it here and in the other TODOs.

sopel39 · 2022-06-01T13:08:14Z

core/trino-main/src/main/java/io/trino/sql/planner/optimizations/WindowFilterPushDown.java

+                    session,
+                    node.getPredicate(),
+                    types,
+                    expressionInterpreter).getTupleDomain();


nit: put getTupleDomain on a new line

sopel39 · 2022-06-01T13:09:19Z

core/trino-main/src/test/java/io/trino/sql/planner/TestEffectivePredicateExtractor.java

                Optional.empty());

-        Expression effectivePredicate = effectivePredicateExtractor.extract(SESSION, node, TypeProvider.empty(), typeAnalyzer);
+        Expression effectivePredicate = effectivePredicateExtractor.extract(


maybe it's possible to extract some small utility method

sopel39 · 2022-06-01T13:09:34Z

core/trino-main/src/test/java/io/trino/sql/planner/TestLiteralEncoder.java

    {
-        return new ExpressionInterpreter(expression, PLANNER_CONTEXT, TEST_SESSION, getExpressionTypes(expression)).evaluate();
+        return new ExpressionInterpreter(PLANNER_CONTEXT, TEST_SESSION)
+                .evaluate(expression, getExpressionTypes(expression));


inline with previous line

sopel39 · 2022-06-01T14:02:36Z

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

+        {
+            if (optimize && resolver instanceof NoOpSymbolResolver) {
+                // We are using weak reference map as cache, that's why we can't depend on an intermediate object
+                // that consists of expression as well as symbolResolver as a key. This is because intermediate object could


I would just mention that a SymbolResolver might resolve to a different value, so we cannot cache a value computed with some arbitrary SymbolResolver.

sopel39 · 2022-06-01T14:04:17Z

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java


+        private Object processWithCaching(Expression expression, SymbolResolver resolver)
+        {
+            if (optimize && resolver instanceof NoOpSymbolResolver) {


Could we make it work for optimize == false? Maybe have a separate cache that just doesn't cache anything in case of an exception. Or maybe simply a single cache that doesn't cache anything if there is an exception.

// Certain operations like 0 / 0 or likeExpression may throw exceptions.
// When optimizing, do not throw the exception, but delay it until the expression is actually executed.
// This is to take advantage of the possibility that some other optimization removes the erroneous
// expression from the plan.

@martint is it even legal to remove 0/0 if the whole expression gets simplified?

When no exception happens, value should be cached for both optimized = false and true

It's not legal to remove during expression simplification unless it can be proven that that subexpression will never be evaluated (e.g., a never reached branch of a CASE).

Currently, we keep those expressions in the tree to be evaluated at runtime. In some cases, they may be no-ops due to actual data encountered when processing the query, but that's not something that can usually be determined ahead of time during analysis and planning.
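The "single cache that doesn't cache anything if there is an exception" idea could look roughly like the sketch below. SuccessOnlyCache is a hypothetical name, the expression is an Object stand-in keyed by identity, and the evaluator is passed in as a Supplier; none of this is taken from the PR.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: caches only results of evaluations that completed normally.
final class SuccessOnlyCache
{
    private final Map<Object, Object> cache = new IdentityHashMap<>();

    Object evaluate(Object expression, Supplier<Object> evaluator)
    {
        // containsKey is used so that a cached null result also counts as a hit
        if (cache.containsKey(expression)) {
            return cache.get(expression);
        }
        // If the evaluator throws (for example on 0 / 0), nothing is stored
        // and the exception propagates to the caller unchanged
        Object value = evaluator.get();
        cache.put(expression, value);
        return value;
    }
}
```

A failing expression is therefore re-evaluated (and fails again) on every call, while successful results are reused.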

sopel39 · 2022-06-01T14:19:32Z

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

+            }
+            return process(expression, resolver);
+        }
+


nit:

This is still missing an improvement in io.trino.sql.planner.ExpressionInterpreter.Visitor#visitInPredicate

Within the method there is a for (Expression expression : valueList.getValues()) { loop which iterates over all elements. One of its purposes is to optimize IN list terms.

You could cache the optimization result for valueList so these elements are not optimized again. Otherwise, if the symbol is changed for an expression like:

x IN (CAST('a' AS VARCHAR(42)), ...);

to

y IN (CAST('a' AS VARCHAR(42)), ...);

then with the current code the IN list will get reevaluated.
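One way to read this suggestion is a cache keyed by the value list itself rather than by the whole IN predicate, sketched below. InListValueCache is a hypothetical name, terms are plain Strings, and the sketch assumes the rewritten predicate (x replaced by y) still references the same value list instance; whether that holds in the planner is not confirmed here.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: caches optimized IN-list terms per value list instance.
final class InListValueCache
{
    private final Map<List<String>, List<String>> optimizedLists = new IdentityHashMap<>();
    private final UnaryOperator<String> termOptimizer;
    private int termsOptimized;

    InListValueCache(UnaryOperator<String> termOptimizer)
    {
        this.termOptimizer = termOptimizer;
    }

    // Optimizes each term once per value list, independent of the symbol on the left of IN
    List<String> optimizeValues(List<String> valueList)
    {
        List<String> cached = optimizedLists.get(valueList);
        if (cached != null) {
            return cached;
        }
        List<String> optimized = new ArrayList<>(valueList.size());
        for (String term : valueList) {
            optimized.add(termOptimizer.apply(term));
            termsOptimized++;
        }
        optimizedLists.put(valueList, optimized);
        return optimized;
    }

    // Number of individual term optimizations performed so far
    int termsOptimized()
    {
        return termsOptimized;
    }
}
```

With a shared value list, both x IN (...) and y IN (...) would pay for term optimization only once.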

martint · 2022-06-01T18:02:15Z

core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java

    }

-    public Type getType()
+    public Object evaluate(Expression expression, Map<NodeRef<Expression>, Type> expressionTypes)


The current design of constructing the interpreter with an expression + types is intentional. The expression is a sort of "static" program and making it constant for the interpreter allows for optimizations that span across multiple evaluations. By making it a dynamic input to the interpreter, we're giving up on that, so this is a significant change.

> The expression is a sort of "static" program and making it constant for the interpreter allows for optimizations that span across multiple evaluations

Why would you evaluate (optimize) the same expression multiple times? It doesn't happen in practice in the codebase. However, we can optimize evaluation by caching the results of evaluating different expressions.

You can optimize the expression once (resolve operators and functions and cache them, constant fold, etc.), and apply them to multiple values, for example for partition pruning.

> resolve operators and functions and cache them, constant fold, etc

This is possible (it's not implemented yet) with evaluate(Expression expression, Map<NodeRef<Expression>, Type> expressionTypes) too. Probably even more so if ExpressionEvaluator is used between rules.

My point is that ExpressionEvaluator is used much more often to optimize the same expression over and over again (with the same context) from multiple rules than to evaluate the value of an expression with different inputs (e.g. partition pruning).

So IMO we have a few options:

1. Keep the new interface, as the optimizations you mentioned are also possible with evaluate(Expression expression, Map<NodeRef<Expression>, Type> expressionTypes). ExpressionEvaluator perf between rules is also greatly improved.

2. Move the caching context outside of ExpressionEvaluator somehow so that it can be reused between rules.

3. Have another interface, e.g. CompiledExpression, when there is a need for it.

My preference is 1)
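For readers weighing option 3, the sketch below shows the "prepare once, apply to many inputs" shape that the partition pruning example refers to. CompiledExpression here is a hypothetical interface named after the suggestion above, compileGreaterThan is an invented helper, and parsing the literal stands in for operator resolution and constant folding.

```java
import java.util.Map;

// Hypothetical sketch of the "compile once, evaluate many times" shape; not a Trino API.
final class CompiledExpressionSketch
{
    interface CompiledExpression
    {
        Object evaluate(Map<String, ?> symbolValues);
    }

    // Builds a predicate "symbol > literal"; the preparation work runs once, here
    static CompiledExpression compileGreaterThan(String symbol, String literal)
    {
        // Stand-in for resolving operators and folding constants ahead of time
        long threshold = Long.parseLong(literal);
        // Only the cheap comparison runs per input
        return values -> ((Long) values.get(symbol)) > threshold;
    }
}
```

A caller could build the predicate once and then apply it to each partition value, which is the usage pattern the constructor-bound design was meant to support.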

ping @martint

cla-bot bot added the cla-signed label Apr 19, 2022

sopel39 reviewed Apr 19, 2022

findepi requested review from findepi, kasiafi and martint April 19, 2022 14:44

sopel39 reviewed Apr 19, 2022

gaurav8297 requested a review from sopel39 April 20, 2022 08:24

findepi reviewed Apr 20, 2022

sopel39 reviewed Apr 20, 2022

gaurav8297 requested a review from sopel39 April 21, 2022 08:41

gaurav8297 requested a review from raunaqmorarka April 21, 2022 08:46

gaurav8297 marked this pull request as ready for review April 21, 2022 08:56

findepi added the performance label Apr 22, 2022

findepi reviewed Apr 22, 2022

sopel39 reviewed Apr 22, 2022

findepi reviewed Apr 22, 2022

findepi reviewed Apr 22, 2022

Introduce PlanOptimizer context

5ed34e0

gaurav8297 and others added 3 commits April 27, 2022 16:44

Use SymbolResolver as context in ExpressionInterpreter.Visitor

f5dbb30

Do not create a new instance of InPredicate if it's not simplified

72f0825

Do not create a new instance of InPredicate expression if it would be the same as the original expression. Creating a new instance of InPredicate would cause an inListCache miss, because the cache uses the node reference as its key.

gaurav8297 added 2 commits April 27, 2022 16:44

Use weak reference map for like pattern and in list cache

b1975bc

To avoid the memory leak that could happen because only one instance of ExpressionInterpreter is now used per query, we need to make likePatternCache and inListCache weakly referenced.

gaurav8297 requested a review from findepi May 6, 2022 09:38

sopel39 reviewed Jun 1, 2022

martint reviewed Jun 1, 2022

gaurav8297 mentioned this pull request Jul 19, 2022

Add table stats cache #13047

Merged

gaurav8297 mentioned this pull request Sep 28, 2022

Local scale writers for partitioned data #14140

Closed

gaurav8297 closed this by deleting the head repository Mar 5, 2023
