Skip to content

[Do not review][Native] Do not carry over partitioning in StreamPropertyDerivations for nested loop joins#23315

Closed
aditi-pandit wants to merge 1 commit intomasterfrom
skip-stream-agg
Closed

[Do not review][Native] Do not carry over partitioning in StreamPropertyDerivations for nested loop joins#23315
aditi-pandit wants to merge 1 commit intomasterfrom
skip-stream-agg

Conversation

@aditi-pandit
Copy link
Copy Markdown
Contributor

Description

Nested joins do not maintain order of rows in Native execution. So do not carry the properties forward in StreamPropertyDerivations class.

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... :pr:`12345`
* ... :pr:`12345`

Hive Connector Changes
* ... :pr:`12345`
* ... :pr:`12345`

If release note is NOT required, use:

== NO RELEASE NOTE ==

@aditi-pandit aditi-pandit requested a review from presto-oss July 26, 2024 22:39
@aditi-pandit aditi-pandit marked this pull request as draft July 26, 2024 22:39
@aditi-pandit aditi-pandit changed the title [Native] Do not carry over partitioning in StreamPropertyDerivations for nested joins [Do not review][Native] Do not carry over partitioning in StreamPropertyDerivations for nested joins Jul 26, 2024
@aditi-pandit aditi-pandit force-pushed the skip-stream-agg branch 3 times, most recently from c4f53c3 to a2d90e6 Compare July 26, 2024 23:11
@aditi-pandit
Copy link
Copy Markdown
Contributor Author

@karteekmurthys

.translate(column -> PropertyDerivations.filterIfMissing(outputs, column))
.unordered(unordered);
if (nativeExecution && node.getCriteria().isEmpty()) {
// This maps to a NestedLoopJoin in Native engine. The NestedLoopJoin output is not
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rschlussel @feilong-liu Rebecca, Feilong, would you help take a first pass for this change? At a high level this seems reasonable, but I haven't been working in the optimizer code for quite some time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does only left join with no criteria produce nested loop join? What happens for inner join? Also, would it be enough to just change unordered to false for these cases, but otherwise keep the other things the same (does the stream distribution or partitioning change)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rschlussel : That's a good point. I was trying to contain this fix, but this change would be applicable for inner too. This change has come by in correlated queries that seem to use LEFT join predominantly.

But just changing unordered to false didn't work. I had to change the partitioning as well for the change to take effect. The addLocalExchanges seems to leverage distribution as well. More at #22585 (comment)

@aditi-pandit aditi-pandit changed the title [Do not review][Native] Do not carry over partitioning in StreamPropertyDerivations for nested joins [Do not review][Native] Do not carry over partitioning in StreamPropertyDerivations for nested loop joins Jul 31, 2024
{
private final boolean nativeExecution;

public ValidateStreamingAggregations()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this zero argument constructor?

{
private final boolean nativeExecution;

public ValidateStreamingJoins()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the zero argument constructor?

// is very small or much smaller than the right, then flip the join.
if (rightSize > leftSize || (isSizeBasedJoinDistributionTypeEnabled(context.getSession()) && (Double.isNaN(leftSize) || Double.isNaN(rightSize)) && isLeftSideSmall(joinNode, context))) {
rewrittenNode = createRuntimeSwappedJoinNode(joinNode, metadata, sqlParser, context.getLookup(), context.getSession(), context.getVariableAllocator(), context.getIdAllocator());
// This is never used for Prestissimo.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't like that this assumption isn't enforced by the code. Could we instead pass the native execution config to this optimizer like we do with other optimizers (may need to inject FeaturesConfig into adaptivePlanOptimizers, but that's fine).

@aditi-pandit
Copy link
Copy Markdown
Contributor Author

@rschlussel : Thanks for your review.

I hear more recently that Velox team is planning to fix NestedLoopJoin. This PR will not be required if the Velox change happens.

This PR has wide changes in the optimizer. So I will resume work on it only if the Velox change will not happen.

@pedroerp
Copy link
Copy Markdown
Contributor

pedroerp commented Aug 2, 2024

This PR should address the NLJ issue in Velox, so this probably won't be needed:
facebookincubator/velox#10651

@aditi-pandit
Copy link
Copy Markdown
Contributor Author

Closing as the underlying NestedLoopJoin in Velox is fixed to maintain probe row ordering now. facebookincubator/velox#10651

@aditi-pandit aditi-pandit deleted the skip-stream-agg branch August 7, 2024 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants