Skip to content

Transform in values to in filter#23161

Open
jackychen718 wants to merge 1 commit intoprestodb:masterfrom
jackychen718:InValues-to-InFilter
Open

Transform in values to in filter#23161
jackychen718 wants to merge 1 commit intoprestodb:masterfrom
jackychen718:InValues-to-InFilter

Conversation

@jackychen718
Copy link
Contributor

@jackychen718 jackychen718 commented Jul 10, 2024

Description

IN (VALUES ...) should be translated into simple IN LIST

Motivation and Context

The issue comes from #22728 IN (VALUES ..) should be translated as simple IN LIST
If VALUES table only has one column, transformed it into IN LIST

fixes #22728

Impact

Add another optimization rule, TransformInValuesToInFilter:
If filteringSource of SemiJoinNode comes from a ValueNode with only one column, transform it into a ProjectNode with outputVariable of the SemiJoinNode assigned to the IN LIST predicate. The outputVariable will be filtered in a later stage.
This rule has to be used together with another rule, InlineProjectionsOnValues on #23245

Test Plan

Test the rule fires on in the scenario and returns correct query plan.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

General Changes
* Add session property ``transform_in_values_to_in_filter`` and configuration property ``optimizer.transform_in_values_to_in_filter`` to transform key IN (VALUES ...) form into IN LIST form when VALUES table only has one column :pr:'23161'.
* Add session property ``inline_projections_on_values`` and configuration property ``optimizer.inline-projections-on-values`` to evaluate project node on values node :pr:`23245`. 

@jackychen718 jackychen718 requested review from a team, feilong-liu and jaystarshot as code owners July 10, 2024 06:02
@jackychen718 jackychen718 requested a review from presto-oss July 10, 2024 06:02
Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this resolve #22728?

@jackychen718
Copy link
Contributor Author

Yes. It resolves #22728.

elharo
elharo previously approved these changes Jul 10, 2024
Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you say

fixes #22728

in the PR description then Github autocloses the bug when the PR is merged.

@jackychen718
Copy link
Contributor Author

Thanks for your advice.

@steveburnett
Copy link
Contributor

steveburnett commented Jul 10, 2024

(Edit to acknowledge a release note entry has been added - please ignore this comment.)

If this doesn't need a release note - and I don't believe it does - please add this so this PR isn't in the "Missing Release Notes" section of the release notes process (see #23079 for an example).

Release Notes

== NO RELEASE NOTE ==

implements Rule<SemiJoinNode>
{
private static final Pattern<SemiJoinNode> PATTERN = semiJoin().with(filteringSource()
.matching(project().with(source()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that there are multiple projects between values and semi join nodes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I see that, in previous version, it has more than one projects. However, in current version, it has only one project there.Do you have some ideas how can I match multiple levels to catch the ValuesNode?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the pattern but values->project* should still be constant? Maybe we have a separate inline constant values rule that'll be generally useful and this rule. So if you have something like:

VALUES('abcd')->Project(subtr(field_0, 1, 2))

you can rewrite that as:

VALUES('ab')

Using the RowExpressionInterpreter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public Result apply(SemiJoinNode semiJoinNode, Captures captures, Context context)
{
PlanNode source = semiJoinNode.getSource();
return context.getLookup().resolveGroup(semiJoinNode.getFilteringSource()).findFirst()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the whole logic within the single return statement may not be a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes it hard to read

Optional.empty());
}

public static Property<SemiJoinNode, PlanNode> filteringSource()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put it in a statics class "SemiJoin" like other node specific properties below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

@kaikalur kaikalur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add an e2e test

public Result apply(SemiJoinNode semiJoinNode, Captures captures, Context context)
{
PlanNode source = semiJoinNode.getSource();
return context.getLookup().resolveGroup(semiJoinNode.getFilteringSource()).findFirst()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes it hard to read

.flatMap(projectNode -> context.getLookup().resolveGroup(projectNode.getSources().get(0)).findFirst())
.map(ValuesNode.class::cast)
.map(ValuesNode::getRows)
// check that all values are only a single row expression (no struct/row types)
Copy link
Contributor

@kaikalur kaikalur Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm why not struct/row? How about array types?

Copy link
Contributor Author

@jackychen718 jackychen718 Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact the apply function works for struct/row and array. The only problem is how to match the pattern of variable projects between valuesNode and SemiJoinNode.

@steveburnett
Copy link
Contributor

Nit of formatting the release note entry to use ` and not ', and add a space between the session property and the PR number.

== RELEASE NOTES ==

General Changes
* Add an optimization rule that transform key IN (VALUES ...) form into IN LIST form if VALUES table only has one column. The optimization is controlled by session property `transform_in_values_to_in_filter` :pr:`23161`

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IN (VALUES ..) should be translated as simple IN LIST

5 participants