Skip to content

Conversation

@feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Jun 2, 2025

Description

We found that adding a distinct aggregation before a semi join on the build side can improve latency. This is probably due to the restriction that the build side of semi join is executed with single thread

checkArgument(buildContext.getDriverInstanceCount().orElse(1) == 1, "Expected local execution to not be parallel");

This PR works by adding a distinct aggregation below the semi join on the build side.

Motivation and Context

To improve latency of semi join

Impact

To improve latency of semi join

Test Plan

Unit tests

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add an optimizer to add distinct aggregation on build side of semi join

@feilong-liu feilong-liu requested a review from ZacBlanco June 2, 2025 23:20
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Jun 2, 2025
@feilong-liu feilong-liu marked this pull request as draft June 2, 2025 23:20
@feilong-liu feilong-liu force-pushed the distinct_before_semijoin branch from 6d6ebc0 to 9ce3afa Compare June 3, 2025 04:09
@feilong-liu feilong-liu requested a review from kaikalur June 3, 2025 04:20
@feilong-liu feilong-liu force-pushed the distinct_before_semijoin branch from 9ce3afa to a6095f6 Compare June 3, 2025 04:58
@feilong-liu feilong-liu requested a review from kaikalur June 3, 2025 20:12
@feilong-liu feilong-liu force-pushed the distinct_before_semijoin branch from a6095f6 to 9f67498 Compare June 3, 2025 20:44
kaikalur
kaikalur previously approved these changes Jun 4, 2025
@feilong-liu feilong-liu force-pushed the distinct_before_semijoin branch from 9f67498 to 52819a6 Compare June 4, 2025 17:03
@feilong-liu feilong-liu requested a review from rschlussel June 4, 2025 17:04
@feilong-liu feilong-liu marked this pull request as ready for review June 4, 2025 17:04
private boolean nativeExecutionTypeRewriteEnabled;
private String expressionOptimizerName = DEFAULT_EXPRESSION_OPTIMIZER_NAME;
private boolean addExchangeBelowPartialAggregationOverGroupId;
private boolean addDistinctBelowSemiJoinBuild;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be on by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to run it for some time before turning it on as default.

@feilong-liu feilong-liu requested a review from jaystarshot June 4, 2025 21:11
import static com.facebook.presto.spi.plan.AggregationNode.isDistinct;
import static com.facebook.presto.spi.plan.AggregationNode.singleGroupingSet;
import static com.facebook.presto.sql.planner.plan.Patterns.semiJoin;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add small comment explaining the rule

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feilong-liu feilong-liu merged commit d5e76af into prestodb:master Jun 5, 2025
98 checks passed
@feilong-liu feilong-liu deleted the distinct_before_semijoin branch June 5, 2025 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:Meta PR from Meta

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants