Skip to content

Cache ExpressionInterpreter optimizations#12016

Closed
gaurav8297 wants to merge 6 commits intotrinodb:masterfrom
gaurav8297:guarav8297/expression_optimization_cache
Closed

Cache ExpressionInterpreter optimizations#12016
gaurav8297 wants to merge 6 commits intotrinodb:masterfrom
gaurav8297:guarav8297/expression_optimization_cache

Conversation

@gaurav8297
Copy link
Copy Markdown
Member

@gaurav8297 gaurav8297 commented Apr 19, 2022

Description

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine

How would you describe this change to a non-technical end user or system administrator?

Problem: ExpressionInterpreter is called multiple times during planning. Hence its optimization results should be cached (especially for long IN lists)

This PR introduces the following things

  • The first two commits introduce a way to utilize the same instance of ExpressionInterpreter for a query. This way query level expression caching is possible without maintaining a global cache. This is based on Karol's comment: Cache ExpressionInterpreter optimizations #12016 (comment)
  • Caching optimization results in ExpressionInterpreter whose instance is unique at query level.

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Apr 19, 2022
@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 19, 2022

@martint @kasiafi @findepi PTAL

@findepi findepi requested review from findepi, kasiafi and martint April 19, 2022 14:44
@gaurav8297
Copy link
Copy Markdown
Member Author

Just fyi, this pr isn't ready yet

@gaurav8297
Copy link
Copy Markdown
Member Author

@sopel39 PTAL

@gaurav8297 gaurav8297 requested a review from sopel39 April 20, 2022 08:24
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm not sure this cache is even used in practice TBH

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true. This cache isn't used in planLargeInQuery benchmark.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this cache only gets used when the InPredicate value isn't an expression which seems like a very rare use case. Please correct me if I'm wrong

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems so and seems to have always been so since 5c8b118.
Maybe we should remove the io.trino.sql.planner.ExpressionInterpreter#inListCache? cc @martint

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should remove the io.trino.sql.planner.ExpressionInterpreter#inListCache?

Yes, I think we should do that

@gaurav8297 gaurav8297 requested a review from sopel39 April 21, 2022 08:41
@gaurav8297
Copy link
Copy Markdown
Member Author

@sopel39 @raunaqmorarka PTAL

@gaurav8297 gaurav8297 marked this pull request as ready for review April 21, 2022 08:56
@gaurav8297
Copy link
Copy Markdown
Member Author

Looking at the failing tests

@wendigo
Copy link
Copy Markdown
Contributor

wendigo commented Apr 22, 2022

@gaurav8297 do you have any benchmarks for this change?

@gaurav8297
Copy link
Copy Markdown
Member Author

gaurav8297 commented Apr 22, 2022

@gaurav8297 do you have any benchmarks for this change?

It's there in the last commit. Pasting it here too. cc @findepi @wendigo

Before:
Benchmark                            (stage)  Mode  Cnt      Score      Error  Units
BenchmarkPlanner.planLargeInQuery  optimized  avgt   20  21353.002 ± 1299.409  ms/op

After:
Benchmark                            (stage)  Mode  Cnt      Score     Error  Units
BenchmarkPlanner.planLargeInQuery  optimized  avgt   20  16495.453 ± 260.591  ms/op

Copy link
Copy Markdown
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Introduce PlanOptimizer context"

Copy link
Copy Markdown
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Use same ExpressionInterpreter instance per query"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should an instance get reused here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it shouldn't matter if we optimize or not, should it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it matters. In some places, we are returning different results in case of optimize. For instance, in case of function calls, we don't optimize if the function isn't deterministic.

https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/sql/planner/ExpressionInterpreter.java#L1057

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the class has two 'modes' and we cannot uniformly cache for both of them at the same time, unless 'the mode' becomes a cache key too

Copy link
Copy Markdown
Member

@sopel39 sopel39 Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can same expression (by identity) be evaluated to something else with different SymbolResolver? That would seem unlikely, no? What would be the case for it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can same expression (by identity) be evaluated to something else with different SymbolResolver?

Yes, it is possible. There are some tests which were failing before that's why I did this change. For ex: io.trino.sql.planner.AbstractPredicatePushdownTest#testNormalizeOuterJoinToInner

So. specifically for this test, the SymbolReference is called twice for optimization one without context and another one with the context where the symbol tends to be null.

Another thing that I want to point out is that WeakHashmap isn't based on identity, although GC happens on Identity but the get by key works on equals and hashing.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that I want to point out is that WeakHashmap isn't based on identity, although GC happens on Identity but the get by key works on equals and hashing.

Good point. I was blindsided by com.google.common.cache.CacheBuilder#weakKeys implying the keys are compared with ==. Indeed, WeakHashMap is not applicable here because Expression.equals is not a good semantics for the expression cache.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I'm thinking to implement it using IdentityHashMap because anyways ExpressionInterpreter is created per query planning. So, it will get GCed, hence there shouldn't be any memory leak right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

planning can create new Expressions and old, unused expressions should be eligible for GC

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are two option IMO

  1. Create a new map -> WeakIdentityHashMap which doesn't exist in JDK.
  2. Use com.google.common.cache.CacheBuilder#weakKeys but it has extra cost due to being concurrent.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. Use com.google.common.cache.CacheBuilder#weakKeys but it has extra cost due to being concurrent.

or com.google.common.collect.MapMaker#weakKeys
(also concurrent map)

1. Create a new map -> WeakIdentityHashMap which doesn't exist in JDK.

Sounds complicated.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaurav8297 and I discussed this offline. The conclusion was that we may be able to use equality-based semantics for Expressions. Identity-based comparisons are a must for analyzer and planner, but they may be unnecessary during query optimization.

@martint please chime in here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi I ran benchmarks with this com.google.common.collect.MapMaker#weakKeys again. The overhead is pretty negligible. So maybe it's better to simply use this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because intermediate object could get GCed even when expression still exists which will cause cache miss
Which intermediate object? You return intermediate object from processWithExceptionHandling so it shouldn't be GCed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if we create an intermediate object which contains context and expression as a key for WeakHashMap to avoid this check context instanceof NoOpSymbolResolver. In that case, it can get GCed right?

For ex: #12016 (comment)

Copy link
Copy Markdown
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Use same ExpressionInterpreter instance per query"

Copy link
Copy Markdown
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Cache optimized expressions in ExpressionInterpreter"

@gaurav8297
Copy link
Copy Markdown
Member Author

@sopel39 @raunaqmorarka @findepi PTAL again

gaurav8297 and others added 3 commits April 27, 2022 16:44
Currently ExpressionInterpreter.optimize is called
multiple time for the same expression from different
rules and optimizers. This change will enable us to
do expression level caching in ExpressionInterpreter
per single query. For instance, in case of long In
List, caching will help save a lot of time.
Do not create a new instance of InPredicate expression
if it would be same as original expression. Creating
a new instance of InPredicate would cause inListCache
cache miss, which is using node reference as a cache key.
The avoid the memory leak that could happen because now
only one instance of ExpressionInterpreter is used per
query. We need to make likePatternCache and inListCache
weak referenced.
Benchmarks for 100000 InList values:

Before:
Benchmark                            (stage)  Mode  Cnt      Score     Error  Units
BenchmarkPlanner.planLargeInQuery  optimized  avgt   20  17917.381 ± 479.014  ms/op

After:
Benchmark                            (stage)  Mode  Cnt      Score     Error  Units
BenchmarkPlanner.planLargeInQuery  optimized  avgt   20  14549.675 ± 265.569  ms/op
@gaurav8297 gaurav8297 requested a review from findepi May 6, 2022 09:38
@dain
Copy link
Copy Markdown
Member

dain commented May 31, 2022

This needs a PR description that explains the full scope of the changes. For example, it appears that the first commit is introducing a new planner context class, which is not obvious in name. Please, fill out the template and explain the change.

@gaurav8297
Copy link
Copy Markdown
Member Author

This needs a PR description that explains the full scope of the changes. For example, it appears that the first commit is introducing a new planner context class, which is not obvious in name. Please, fill out the template and explain the change.

Thanks for pointing out @dain. I've updated the description.

public PlanNode optimize(PlanNode plan, Session session, TypeProvider types, SymbolAllocator symbolAllocator, PlanNodeIdAllocator idAllocator, WarningCollector warningCollector)
public PlanNode optimize(PlanNode plan, Context context)
{
requireNonNull(plan, "plan is null");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check that context is not null, requireNotNull(context, "context is null")
Here and in other similar rules

}
}

protected PlanOptimizer.Context createOptimizerContext(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static?

Map<NodeRef<Expression>, Type> expressionTypes = getExpressionTypes(plannerContext, session, predicate, types);
ExpressionInterpreter interpreter = new ExpressionInterpreter(predicate, plannerContext, session, expressionTypes);
Object value = interpreter.optimize(NoOpSymbolResolver.INSTANCE);
// TODO - Use the same instance of ExpressionInterpreter create per planning once StatsRule has context
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a github issue for this and reference it here and other TODOs

session,
node.getPredicate(),
types,
expressionInterpreter).getTupleDomain();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: put getTupleDomain in newline

Optional.empty());

Expression effectivePredicate = effectivePredicateExtractor.extract(SESSION, node, TypeProvider.empty(), typeAnalyzer);
Expression effectivePredicate = effectivePredicateExtractor.extract(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's possible to extract some small utility method

{
return new ExpressionInterpreter(expression, PLANNER_CONTEXT, TEST_SESSION, getExpressionTypes(expression)).evaluate();
return new ExpressionInterpreter(PLANNER_CONTEXT, TEST_SESSION)
.evaluate(expression, getExpressionTypes(expression));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline with previous line

{
if (optimize && resolver instanceof NoOpSymbolResolver) {
// We are using weak reference map as cache, that's why we can't depend on an intermediate object
// that consists of expression as well as symbolResolver as a key. This is because intermediate object could
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just mention that SymbolResolver might resolve different value, so we cannot cache value with some arbitrary SymbolResolver.


private Object processWithCaching(Expression expression, SymbolResolver resolver)
{
if (optimize && resolver instanceof NoOpSymbolResolver) {
Copy link
Copy Markdown
Member

@sopel39 sopel39 Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make it work for optimize == false? Maybe have a separate cache that just doesn't cache anything in case of exception. Or maybe simply a single cache that doesn't cache anything if there is exception

                    // Certain operations like 0 / 0 or likeExpression may throw exceptions.
                    // When optimizing, do not throw the exception, but delay it until the expression is actually executed.
                    // This is to take advantage of the possibility that some other optimization removes the erroneous
                    // expression from the plan.

@martint is it even legal to remove 0/0 if the whole expression gets simplified?

Copy link
Copy Markdown
Member

@sopel39 sopel39 Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When no exception happens, value should be cached for both optimized = false and true

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not legal to remove during expression simplification unless it can be proven that that subexpression will never be evaluated (e.g., a never reached branch of a CASE).

Currently, we keep those expressions in the tree to be evaluated at runtime. In some cases, they may be no-ops due to actual data encountered when processing the query, but that's not something that can be usually be determined ahead of time during analysis and planning.

}
return process(expression, resolver);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

This is sill missing improvement in io.trino.sql.planner.ExpressionInterpreter.Visitor#visitInPredicate

Within method there is for (Expression expression : valueList.getValues()) { loop which iterates over all elements. One of it's purposes is to optimize IN list terms.

You could cache optimization result for valueList so these elements are not optimized again. Otherwise if symbol is changed for expression like:

x IN (CAST('a' as VARCHAR(42), ...);

to

y IN (CAST('a' as VARCHAR(42), ...);

then with current code IN list will get reevaluated.

}

public Type getType()
public Object evaluate(Expression expression, Map<NodeRef<Expression>, Type> expressionTypes)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current design of constructing the interpreter with an expression + types is intentional. The expression is a sort of "static" program and making it constant for the interpreter allows for optimizations that span across multiple evaluations. By making it a dynamic input to the interpreter, we're giving up on that, so this is a significant change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expression is a sort of "static" program and making it constant for the interpreter allows for optimizations that span across multiple evaluations

Why would you evaluate (optimize) same expression multiple times? It doesn't happen in practice in codebase. However, we can optimize evaluation by caching results of evaluating different expressions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can optimize the expression once (resolve operators and functions and cache them, constant fold, etc), and apply them to multiple values, for example for partition pruning.

Copy link
Copy Markdown
Member

@sopel39 sopel39 Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve operators and functions and cache them, constant fold, etc

This is possible (it's not implemented yet) with evaluate(Expression expression, Map<NodeRef<Expression>, Type> expressionTypes) too. Probably even more so if ExpressionEvaluator is used between rules.

My point is that ExpressionEvaluator is much more used to optimize same expression over and over again (with same context) by multiple rules rather than used to evaluate value of an expression with different inputs (e.g. partition pruning).

So IMO we have few options:

  1. keep new interface as optimizations mentioned by you are also possible with evaluate(Expression expression, Map<NodeRef<Expression>, Type> expressionTypes). ExpressionEvaluator perf between rules is also greatly improved
  2. move caching context outside of ExpressionEvaluator somehow so that it can be reused between rules
  3. Have another interface, e.g. CompiledExpression when there would be a need for it.

My preference is 1)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @martint

@gaurav8297 gaurav8297 closed this by deleting the head repository Mar 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants