Skip to content

Support any expression for dynamic filtering probe side#17169

Merged
zhenxiao merged 2 commits intoprestodb:masterfrom
zhenxiao:dynamic-expression
Jan 22, 2022
Merged

Support any expression for dynamic filtering probe side#17169
zhenxiao merged 2 commits intoprestodb:masterfrom
zhenxiao:dynamic-expression

Conversation

@zhenxiao
Copy link
Copy Markdown
Collaborator

@zhenxiao zhenxiao commented Jan 10, 2022

Test plan -
TestDynamicFilter
TestHiveDistributedJoinQueriesWithDynamicFilteringAndFilterPushdown.java

== RELEASE NOTES ==

General Changes
* dynamic filtering support for right join
* support any expression for dynamic filtering probe side

@zhenxiao zhenxiao requested a review from rongrong January 10, 2022 16:13
@zhenxiao zhenxiao changed the title Support right join for dynamic filtering and any expression for probe… Support any expression for dynamic filtering probe side Jan 11, 2022
@zhenxiao zhenxiao force-pushed the dynamic-expression branch 4 times, most recently from ec9b51d to 5a03fa5 Compare January 12, 2022 02:47
Copy link
Copy Markdown
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation of this PR? What problem does it solve? If it enables new use cases, please add new tests to reflect that.

Comment on lines 737 to 743
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is not about whether the operator is a comparison operator. What does it do? Please name the function to reflect the logic. Thanks!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, supported operator should be comparison, excluding not_equal and distinct, and the expression needs:

  1. left child contains left variables and right child contains right variables
  2. or left child contains right variables and right child contains left variables

get it updated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be && or ||?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replied above

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is repeating the logic in the function. Since the function is only used once, might as well just inline. It would be easier to read that way too.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. will inline

@zhenxiao
Copy link
Copy Markdown
Collaborator Author

thank you, @rongrong
comments addressed. testcases added

@zhenxiao zhenxiao requested a review from rongrong January 13, 2022 06:21
Copy link
Copy Markdown
Contributor

@kaikalur kaikalur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like more high level explanation in the PR description. It is using too many special cases and correctness is not obvious. Also it doesn't seem to check for non-deterministic expressions. And if you are doing this for any specific case, please elaborate.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not left join? Are you assuming right side is going to be the build?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, dynamic filtering applies to broadcast join, and the right side is the build.
since dynamic filtering is generating predicates and propagate to the probe side, right join is fine, but left join might not be correct.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you are assuming that RIGHT join will never be rewritten to LEFT due to some other optimizations? I'm just wondering if this is useful. Please add a real usecase and also more comprehensive testing (including hive end to end) and also a session param to control this feature (with defualt off) if we are convinced this is a good thing to do.

Copy link
Copy Markdown
Collaborator Author

@zhenxiao zhenxiao Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think RIGHT will be rewritten to LEFT when it comes here, since:

  1. dynamic filtering only applies to broadcast join, we already have session properties to turn it on/off
  2. the code here is a branch assuming dynamic filtering is on
  3. cost based optimizer is not used

SystemSessionProperties.isEnableDynamicFiltering is the session property to control dynamic filtering
The use case is:
SELECT o.orderkey FROM orders o RIGHT JOIN lineitem l ON l.orderkey + 1 = o.orderkey
The motivation to work on this feature it, when I was implementing comparison operator support for dynamic filtering, was trying to start from simple, so assuming left and right to be variable, and only for inner join. Actually, find support right join and expression for probe side is just a few lines changes, so submitted this PR.

@zhenxiao zhenxiao requested a review from kaikalur January 19, 2022 06:25
@zhenxiao
Copy link
Copy Markdown
Collaborator Author

kindly ping

@kaikalur
Copy link
Copy Markdown
Contributor

kindly ping

I would like to see hive end to end tests running with this feature and also specific tests with predicate being effective and not as well. I'm concerned we are adding something that's marginally useful but potentially troublesome in terms of correctness. Outer joins confuse me :)

@zhenxiao
Copy link
Copy Markdown
Collaborator Author

kindly ping

I would like to see hive end to end tests running with this feature and also specific tests with predicate being effective and not as well. I'm concerned we are adding something that's marginally useful but potentially troublesome in terms of correctness. Outer joins confuse me :)

hi @kaikalur end to end tests added in AbstractTestJoinQueries.java , which will be triggered in: TestHiveDistributedJoinQueriesWithDynamicFiltering.java and
TestHiveDistributedJoinQueriesWithDynamicFilteringAndFilterPushdown.java

@zhenxiao zhenxiao force-pushed the dynamic-expression branch 2 times, most recently from 99b42da to 00d3100 Compare January 20, 2022 17:10
@kaikalur
Copy link
Copy Markdown
Contributor

will take a look after tests pass

@zhenxiao zhenxiao force-pushed the dynamic-expression branch 2 times, most recently from fba865f to fde4b1b Compare January 21, 2022 06:22
Copy link
Copy Markdown
Contributor

@kaikalur kaikalur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just test without explictly setting broadcast join and it doesn't break anything.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is repeating the logic in the function. Since the function is only used once, might as well just inline. It would be easier to read that way too.

Comment on lines 744 to 747
Copy link
Copy Markdown
Contributor

@rongrong rongrong Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variables are not necessary, if you want to keep them rightContainsContainsRightVariables and rightContainsContainsLeftVariables should be rightChild... instead.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, nice catch. Will fix

Copy link
Copy Markdown
Contributor

@rongrong rongrong Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this test setup, how are these testing the correctness of dynamic filters?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractTestJoinQueries testcases will be triggered in
TestHiveDistributedJoinQueriesWithDynamicFiltering and
TestHiveDistributedJoinQueriesWithDynamicFilteringAndFilterPushdown
and many other tests, to guarantee the correctness

@rongrong
Copy link
Copy Markdown
Contributor

The release note should start with a verb, something like "Add support xxx", I think a single release note capturing both enhancement for dynamic filter would be fine.

Copy link
Copy Markdown
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix nits before merging. Thanks!

@zhenxiao zhenxiao merged commit 2d0e663 into prestodb:master Jan 22, 2022
@zhenxiao zhenxiao deleted the dynamic-expression branch January 22, 2022 15:33
@neeradsomanchi neeradsomanchi mentioned this pull request Feb 8, 2022
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants