Skip to content

Optimize effectively literal values#10663

Merged
findepi merged 10 commits intotrinodb:masterfrom
findepi:findepi/generalized-literal
Jan 27, 2022
Merged

Optimize effectively literal values#10663
findepi merged 10 commits intotrinodb:masterfrom
findepi:findepi/generalized-literal

Conversation

@findepi
Copy link
Copy Markdown
Member

@findepi findepi commented Jan 18, 2022

In many code places, we have special paths for expressions that are
literals. Detecting literals with instanceof Literal is not enough, as
some types, or some (value, type) combinations, do not have direct
literal form. The LiteralEncoder encodes them as constant, terse
expressions. Throughout the optimizer, the "is literal" detections
should treat them same way.

Follows discussion under #10499 (comment)

Copy link
Copy Markdown
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@findepi findepi force-pushed the findepi/generalized-literal branch from 514dc39 to 4bbc72b Compare January 18, 2022 16:23
@findepi
Copy link
Copy Markdown
Member Author

findepi commented Jan 18, 2022

Thanks @losipiuk for a quick review.

Added more testing (https://github.com/trinodb/trino/compare/514dc390b4e1c1afbe6158b406b838e596c32568..4bbc72b9c2dea585e6c2dee7bc92b6c9a0324171) and this uncovered a small glitch, also fixed.

@findepi findepi force-pushed the findepi/generalized-literal branch from 4bbc72b to eacde51 Compare January 18, 2022 17:22
@findepi
Copy link
Copy Markdown
Member Author

findepi commented Jan 18, 2022

@findepi findepi force-pushed the findepi/generalized-literal branch from eacde51 to 7fc353e Compare January 19, 2022 15:40
@findepi findepi force-pushed the findepi/generalized-literal branch from 7fc353e to dc2ed0c Compare January 19, 2022 15:41
@findepi findepi requested a review from martint January 19, 2022 15:41
@findepi
Copy link
Copy Markdown
Member Author

findepi commented Jan 19, 2022

AC

@findepi findepi force-pushed the findepi/generalized-literal branch from dc2ed0c to 0bc4936 Compare January 20, 2022 10:28
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here & below. Previously we were projecting CAST(DECIMAL '2.1' AS decimal(2, 1)) column to then construct ST_Point(x,x) from it. Now this got recognized as effectively literal and inlined. Thus ST_Point() function call is replaced with literal (point21x21Literal), and subsequent Project nodes are merged into one.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this case handled? Is it SimplifyExpressions?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in FilterStatsCalculator,

Expression simplifiedExpression = simplifyExpression(session, predicate, types);
return new FilterExpressionStatsCalculatingVisitor(statsEstimate, session, types)
.process(simplifiedExpression);
}
private Expression simplifyExpression(Session session, Expression predicate, TypeProvider types)
{
// TODO reuse io.trino.sql.planner.iterative.rule.SimplifyExpressions.rewrite
Map<NodeRef<Expression>, Type> expressionTypes = getExpressionTypes(session, predicate, types);
ExpressionInterpreter interpreter = new ExpressionInterpreter(predicate, plannerContext, session, expressionTypes);
Object value = interpreter.optimize(NoOpSymbolResolver.INSTANCE);
if (value == null) {
// Expression evaluates to SQL null, which in Filter is equivalent to false. This assumes the expression is a top-level expression (eg. not in NOT).
value = false;
}
return new LiteralEncoder(plannerContext).toExpression(session, value, BOOLEAN);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it important that the cast doesn't fail?

I guess, we don't want the query to fail in the stats calculator, or in SimplifyCountOverConstant, trying to evaluate the constant value.

However, in PredicatePushDown, we don't evaluate the expression, so we could consider the potentially unsafe expression as a constant. And in LocalExecutionPlanner we shouldn't care if it fails, but take the opportunity to use the shortcut for trivial projections.

Also, Literal is not safe. It is possible to create a GenericLiteral with mismatching value, and that also fails to evaluate.

I suggest a different approach on handling failures: we could skip the validation here, and if the "effectively literal" value is to be evaluated during the optimization phase, then we could use the safe ExpressionInterpreter.optimize method.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it important that the cast doesn't fail?

the isEffectivelyLiteral is a boolean-returning method intended to be used wherever we inspect whether an expression can be considered "a literal". It's not intended to throw for a valid input, as it would make the usage harder.

Note that we don't need to be 'smart' about failing cast case. In a typical case, ExpressionInterpreter will fold it to a fail call (which isn't recognized as a literal).

Also, Literal is not safe. It is possible to create a GenericLiteral with mismatching value, and that also fails to evaluate.

True. However, no optimizer / optimizer rule should do this. Literals coming from parser should be validated in ExpressionAnalyzer.

I suggest a different approach on handling failures: we could skip the validation here, and if the "effectively literal" value is to be evaluated during the optimization phase, then we could use the safe ExpressionInterpreter.optimize method.

That would work. Note however that my intention was to capture literals, and other things produced by LiteralEncoder. Hence the method name & javadoc and that's also why I choose not to fail.
Do you feel strongly about this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the isEffectivelyLiteral is a boolean-returning method intended to be used wherever we inspect whether an expression can be considered "a literal". It's not intended to throw for a valid input, as it would make the usage harder.

My question was: why do we try to filter-out failing casts instead of just reporting them as "effectively literals". I wasn't trying to suggest that isEffectivelyLiteral should throw. Sorry for not being clear.

Note that we don't need to be 'smart' about failing cast case. In a typical case, ExpressionInterpreter will fold it to a fail call (which isn't recognized as a literal).

Yes, but we shouldn't depend on the rule order, especially that isEffectivelyLiteral can be reused in the future.

that my intention was to capture literals, and other things produced by LiteralEncoder.

If we drop the failure check in isEffectivelyLiteral, which is my suggestion, that intention is still satisfied. It would then depend on the caller whether they care about errors (and if they do, to use ExpressionInterpreter.optimize).
Generally, I think that using ExpressionInterpreter.optimize instead of ExpressionInterpreter.evaluate should be the rule anywhere before the execution.

If we drop the failure check in isEffectivelyLiteral, we are also consistent wrt failing casts and failing GenericLiterals (until we fix them).

On the other hand, if isEffectivelyLiteral reports failing cast as a non-literal, we miss out in PPD and LocalExecutionPlanner, as I mentioned before.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering LocalExecutionPlanner, are there tests for the case with Cast(Literal)) being handled as trivial projection? It was not possible before.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, the intended contract of this method is to filter simple constant expressions which do not fail, so that the caller does not need to deal with potential failure.

I consider this a fair decision, as the majority of failing expressions should never reach this point (supposed that they go through the ExpressionInterpreter first). So, we don't lose much optimization opportunities by filtering out failing expressions, while the usage of the method is considerably simplified.

However, for the contract to hold, we need to validate Literals also, not only the casts. Maybe add the validation for Literals for now, and a TODO that user-provided Literals should ba validated in the ExpressionAnalyzer?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, for the contract to hold, we need to validate Literals also, not only the casts.

Done in #10720, so let me skip adding it here.

@findepi findepi force-pushed the findepi/generalized-literal branch from 0bc4936 to 1de9a35 Compare January 20, 2022 13:54
@findepi
Copy link
Copy Markdown
Member Author

findepi commented Jan 20, 2022

Pushed some clarification checks in io.trino.sql.planner.ExpressionInterpreter.Visitor#processOperands, as i spent some time thinking why the recursive coalesce processing is actually awesome.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, the intended contract of this method is to filter simple constant expressions which do not fail, so that the caller does not need to deal with potential failure.

I consider this a fair decision, as the majority of failing expressions should never reach this point (supposed that they go through the ExpressionInterpreter first). So, we don't lose much optimization opportunities by filtering out failing expressions, while the usage of the method is considerably simplified.

However, for the contract to hold, we need to validate Literals also, not only the casts. Maybe add the validation for Literals for now, and a TODO that user-provided Literals should ba validated in the ExpressionAnalyzer?

@findepi findepi force-pushed the findepi/generalized-literal branch from 1de9a35 to 3eec633 Compare January 21, 2022 13:01
@findepi findepi force-pushed the findepi/generalized-literal branch from 3eec633 to 43e2101 Compare January 21, 2022 13:06
@findepi findepi requested a review from kasiafi January 21, 2022 13:10
Copy link
Copy Markdown
Member

@kasiafi kasiafi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 435 to 436
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After validating expressions of the form Cast(Literal) in the isEffectivelyLiteral() method, and expressions being Literals as in #10720, we can skip the validation here, and just use interpreter.evaluate().

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched to evaluateConstantExpression, it calls ExpressionInterpreter.evaluate` behind the scenes

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With isEffectivelyLiteral, this is not quite true any more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this: #10663 (comment) ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this: #10663 (comment) ?

Added tests in TestLogicalPlanner and TestInlineProjections.

With isEffectivelyLiteral, this is not quite true any more.

it's still true. ExpressionInterpreter.optimize() is not cheap in general
while ExpressionInterpreter.optimize() is used internally in isEffectivelyLiteral, the use is guarded with a check ensuring the call would be cheap.

In many code places, we have special paths for expressions that are
literals. Detecting literals with `instanceof Literal` is not enough, as
some types, or some (value, type) combinations, do not have direct
literal form. The `LiteralEncoder` encodes them as constant, terse
expressions. Throughout the optimizer, the "is literal" detections
should treat them same way.
@findepi findepi force-pushed the findepi/generalized-literal branch from 43e2101 to 71045b5 Compare January 27, 2022 10:10
@findepi findepi merged commit ab6072d into trinodb:master Jan 27, 2022
@findepi findepi deleted the findepi/generalized-literal branch January 27, 2022 13:02
@github-actions github-actions bot added this to the 370 milestone Jan 27, 2022
This was referenced Jan 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

4 participants