Skip to content

Load lazy blocks before using them in page filter/projection#10322

Merged
martint merged 1 commit intotrinodb:masterfrom
raunaqmorarka:load-block
Feb 2, 2022
Merged

Load lazy blocks before using them in page filter/projection#10322
martint merged 1 commit intotrinodb:masterfrom
raunaqmorarka:load-block

Conversation

@raunaqmorarka
Copy link
Copy Markdown
Member

Helps to avoid calls to LazyData#getTopLevelBlock in
generated page filter and projection methods.
PageFieldsToInputParametersRewriter now also records which
channels are evaluated unconditionally so that LazyBlock can
be loaded for those channel before expression evaluation.

@cla-bot cla-bot bot added the cla-signed label Dec 16, 2021
@raunaqmorarka raunaqmorarka force-pushed the load-block branch 2 times, most recently from 8d4fd53 to 331e845 Compare December 16, 2021 13:31
Copy link
Copy Markdown
Member

@skrzypo987 skrzypo987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems way beyond my Trino capabilities.
At least I found a missing requireNonNull, which makes me a proper reviewer.
I'll try to have another look tomorrow but you should seek feedback from professionals.

@raunaqmorarka raunaqmorarka force-pushed the load-block branch 4 times, most recently from a293b31 to b9b690b Compare December 22, 2021 09:27
Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

@martint do you want to peek?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really depends on the function being applied. It's possible that not all of the references in the lambda expression is used.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I've updated visitCall to now assume that all the inputs to lambda expression can be conditionally evaluated (i.e. keep existing behaviour of all blocks being lazily loaded).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really depends on the function being applied. It's possible that not all of the references in the lambda expression is used.

@martint could you put an example here?

Should we just assume that lambda body is unconditionallyEvaluated=false and add a TODO?

@raunaqmorarka raunaqmorarka force-pushed the load-block branch 4 times, most recently from ba8e97a to a7d36fe Compare December 23, 2021 06:05
@raunaqmorarka raunaqmorarka force-pushed the load-block branch 2 times, most recently from 8f2c5d8 to 8c2990c Compare December 23, 2021 13:19
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand why call expression has special handling of lambdas? Shouldn't lambdas be fully covered via visitLambda?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the visitor is traversing lambda expression, the leaf nodes always seem to be ConstantExpression or VariableReferenceExpression. So setting any value for unconditionallyEvaluated in visitLambda seems to be a no-op.
I think lambda expression is supposed to be always part of a function like transform, reduce etc. The InputReferenceExpression for the fields potentially accessed by the lambda are found in the initial arguments of CallExpression.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will let this one for @martint to look at.
Could you search for a code that creates CallExpression for lambdas and see what are the arguments? According to antrl lambdas are primary expressions, so I'm not sure where CallExpression comes from

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the leaf nodes always seem to be ConstantExpression or VariableReferenceExpression. So setting any value for unconditionallyEvaluated in visitLambda seems to be a no-op.

You would need to link those references to the corresponding arguments of the call to the higher-order function.

For example, in a hypothetical apply2 function: apply2(a, b, (x, y) -> x AND y), b is conditionally loaded, but a isn't due to the conditional nature of x AND y.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint in the current code my intent was to fallback to existing behaviour for lambda expression. The motivating scenario for this PR was to improve efficiency for simple filters and projections (e.g. tpch/q01). Can we skip optimising this scenario or would you still consider this a blocker ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can skip it for now. But add a TODO so we know the implementation is not yet complete.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a TODO here now

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really depends on the function being applied. It's possible that not all of the references in the lambda expression is used.

@martint could you put an example here?

Should we just assume that lambda body is unconditionallyEvaluated=false and add a TODO?

Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments. Waiting for @martint input

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will let this one for @martint to look at.
Could you search for a code that creates CallExpression for lambdas and see what are the arguments? According to antrl lambdas are primary expressions, so I'm not sure where CallExpression comes from

@raunaqmorarka raunaqmorarka force-pushed the load-block branch 2 times, most recently from 00e0ce3 to de7d367 Compare January 5, 2022 12:45
Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % @martint review

@raunaqmorarka raunaqmorarka force-pushed the load-block branch 2 times, most recently from 837b10d to 94a2c01 Compare January 19, 2022 13:36
@raunaqmorarka raunaqmorarka requested a review from sopel39 January 19, 2022 13:36
Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint can you look?

@martint
Copy link
Copy Markdown
Member

martint commented Jan 20, 2022

I'll take a look in the next couple of days.

Helps to avoid calls to LazyData#getTopLevelBlock in
generated page filter and projection methods.
PageFieldsToInputParametersRewriter now also records which
channels are evaluated unconditionally so that LazyBlock can
be loaded for those channels before expression evaluation.
@raunaqmorarka
Copy link
Copy Markdown
Member Author

Load lazy blocks TPCH unpartitioned ORC sf1000.pdf

@martint I've attached results from TPCH unpartitioned ORC sf1000 run

@martint martint merged commit 97b33da into trinodb:master Feb 2, 2022
@raunaqmorarka raunaqmorarka deleted the load-block branch February 2, 2022 19:42
@github-actions github-actions bot added this to the 370 milestone Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants