
fix(planner): Fix failing isDistinct for equivalent variables for logical properties #27241

Merged
aaneja merged 1 commit into prestodb:master from aaneja:earlyOutFix
Mar 10, 2026

Conversation

@aaneja
Contributor

@aaneja aaneja commented Mar 2, 2026

Description, Motivation and Context

When using logical properties with the iterative optimizer, the KeyProperty for a Group may not always use the top-most output variables. It may instead use an equivalent variable, which is why we should normalize the test key using the equivalence classes before doing the comparison.

Impact

The TransformDistinctInnerJoinToRightEarlyOutJoin rule was stuck in a loop for TPC-DS Q95 when constraints are turned on.

The underlying Group node's keys do not directly refer to the newly added aggregation node's output variables, so the isDistinct check was failing, causing the rule to re-apply and get stuck.
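
The failure mode can be illustrated with a toy model. This is only a sketch, not Presto's optimizer: `RuleLoopSketch` and `passesUntilStable` are hypothetical names, and the plan is modeled as a plain string. The rule wraps its input in a distinct aggregation unless a guard reports that the input is already distinct; if the guard never returns true, only an external pass limit stops the rule from firing.

```java
import java.util.function.Predicate;

// Toy model only: class and method names are hypothetical, not Presto APIs.
final class RuleLoopSketch
{
    // Applies a "wrap the input in a distinct aggregation" rule until the guard
    // reports the input as distinct, or until maxPasses is exhausted.
    // Returns the number of times the rule fired.
    static int passesUntilStable(Predicate<String> isDistinct, int maxPasses)
    {
        String plan = "join";
        int passes = 0;
        while (passes < maxPasses && !isDistinct.test(plan)) {
            plan = "agg(" + plan + ")";
            passes++;
        }
        return passes;
    }

    public static void main(String[] args)
    {
        // Working guard: recognizes the aggregation the rule just added.
        System.out.println(passesUntilStable(plan -> plan.startsWith("agg("), 100));
        // Broken guard: never recognizes it, so the rule fires until the limit.
        System.out.println(passesUntilStable(plan -> false, 100));
    }
}
```

With the working guard the rule fires once and then stabilizes; with the broken guard it fires on every pass, which mirrors the behavior described above.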

Test Plan

New test added

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== NO RELEASE NOTE ==

Summary by Sourcery

Bug Fixes:

  • Correct key requirement satisfaction check to use the normalized key requirement instead of the original key.

Summary by Sourcery

Ensure logical key distinctness checks respect normalized equivalence classes and expand logical property testing to cover key normalization behavior.

Bug Fixes:

  • Correct key requirement satisfaction to use normalized keys derived from equivalence classes when evaluating distinctness.

Enhancements:

  • Refactor RuleAssert to allow custom logical property assertions via a consumer-based helper in addition to exact equality checks.

Tests:

  • Add a logical property propagation test that validates key normalization across equivalent join variables and aggregations.
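
The consumer-based helper mentioned under Enhancements might look roughly like the following. This is a sketch under assumptions: `AssertHelperSketch` and `Props` are hypothetical stand-ins for RuleAssert and LogicalProperties, whose real APIs are richer, and the explicit presence check is an addition for illustration rather than something the summary states.

```java
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical stand-ins; the real RuleAssert and LogicalProperties differ.
final class AssertHelperSketch
{
    // Minimal stand-in for a logical-properties object.
    static final class Props
    {
        final boolean distinct;

        Props(boolean distinct)
        {
            this.distinct = distinct;
        }

        @Override
        public boolean equals(Object o)
        {
            return o instanceof Props && ((Props) o).distinct == distinct;
        }

        @Override
        public int hashCode()
        {
            return Boolean.hashCode(distinct);
        }
    }

    private final Optional<Props> rootProps;

    AssertHelperSketch(Optional<Props> rootProps)
    {
        this.rootProps = rootProps;
    }

    // General entry point: the caller supplies any assertion over the root properties.
    void assertLogicalProperties(Consumer<Props> matcher)
    {
        if (!rootProps.isPresent()) {
            throw new AssertionError("Logical properties are missing for the root group");
        }
        matcher.accept(rootProps.get());
    }

    // Equality-based check kept for existing callers; it delegates to the general helper.
    void matches(Props expected)
    {
        assertLogicalProperties(actual -> {
            if (!expected.equals(actual)) {
                throw new AssertionError("Logical properties do not match");
            }
        });
    }
}
```

Keeping `matches` as a thin wrapper means existing equality-based tests continue to work, while new tests can assert on individual properties.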

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Mar 2, 2026
@sourcery-ai
Contributor

sourcery-ai bot commented Mar 2, 2026

Reviewer's Guide

Refines logical key requirement satisfaction to account for normalized equivalent variables and extends the logical properties test harness to support custom assertions, adding a regression test that validates key normalization across joined tables and aggregation.

Sequence diagram for key requirement satisfaction during join optimization

sequenceDiagram
    actor Optimizer
    participant TransformDistinctInnerJoinToRightEarlyOutJoin as TransformDistinctInnerJoinToRightEarlyOutJoin
    participant LogicalPropertiesImpl as LogicalPropertiesImpl
    participant Group as Group
    participant KeyProperty as KeyProperty
    participant EquivalenceClassProperty as EquivalenceClassProperty
    participant MaxCardProperty as MaxCardProperty

    Optimizer->>TransformDistinctInnerJoinToRightEarlyOutJoin: apply(ruleContext)
    TransformDistinctInnerJoinToRightEarlyOutJoin->>LogicalPropertiesImpl: isDistinct(group)
    LogicalPropertiesImpl->>Group: getMaxCardProperty()
    Group-->>LogicalPropertiesImpl: maxCardProperty
    LogicalPropertiesImpl->>MaxCardProperty: isAtMostOne()
    MaxCardProperty-->>LogicalPropertiesImpl: false

    LogicalPropertiesImpl->>Group: getKeyProperty()
    Group-->>LogicalPropertiesImpl: keyProperty
    LogicalPropertiesImpl->>Group: getEquivalenceClassProperty()
    Group-->>LogicalPropertiesImpl: equivalenceClassProperty

    LogicalPropertiesImpl->>LogicalPropertiesImpl: getNormalizedKey(keyRequirement, equivalenceClassProperty)
    LogicalPropertiesImpl->>EquivalenceClassProperty: normalizeKey(keyRequirement)
    EquivalenceClassProperty-->>LogicalPropertiesImpl: Optional_Key

    LogicalPropertiesImpl->>KeyProperty: satisfiesKeyRequirement(normalizedKey)
    KeyProperty-->>LogicalPropertiesImpl: true

    LogicalPropertiesImpl-->>TransformDistinctInnerJoinToRightEarlyOutJoin: isDistinct = true
    TransformDistinctInnerJoinToRightEarlyOutJoin-->>Optimizer: rule not reapplied, no infinite loop
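
The flow in the diagram above can be sketched in Java as follows. The types are simplified stand-ins, not the real LogicalPropertiesImpl: keys are plain sets of variable names, the equivalence classes are a map from variable to representative, and the rule that an empty key has no normalized form is an assumption made only for this sketch.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Sketch with simplified stand-in types; not the Presto implementation.
final class IsDistinctSketch
{
    private final boolean atMostOneRow;           // stand-in for MaxCardProperty
    private final Map<String, String> canonical;  // stand-in for EquivalenceClassProperty
    private final Set<Set<String>> knownKeys;     // stand-in for KeyProperty (normalized form)

    IsDistinctSketch(boolean atMostOneRow, Map<String, String> canonical, Set<Set<String>> knownKeys)
    {
        this.atMostOneRow = atMostOneRow;
        this.canonical = canonical;
        this.knownKeys = knownKeys;
    }

    // Rewrites each variable to its class representative.
    // Assumption for this sketch: an empty key has no normalized form.
    Optional<Set<String>> normalizeKey(Set<String> key)
    {
        if (key.isEmpty()) {
            return Optional.empty();
        }
        Set<String> normalized = new HashSet<>();
        for (String variable : key) {
            normalized.add(canonical.getOrDefault(variable, variable));
        }
        return Optional.of(normalized);
    }

    // A requirement is satisfied if some known key is a subset of the requested variables.
    boolean satisfiesKeyRequirement(Set<String> requirement)
    {
        return knownKeys.stream().anyMatch(requirement::containsAll);
    }

    boolean isDistinct(Set<String> keyRequirement)
    {
        if (atMostOneRow) {
            return true;
        }
        // The check always runs on the normalized key, never on the raw requirement.
        return normalizeKey(keyRequirement)
                .filter(this::satisfiesKeyRequirement)
                .isPresent();
    }
}
```

The `filter` plus `isPresent` chain returns true only when a normalized key exists and the key property satisfies it, so there is no branch in which the un-normalized key is compared by accident.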

Class diagram for updated logical key requirement satisfaction

classDiagram
    class LogicalPropertiesImpl {
        - keyProperty KeyProperty
        - equivalenceClassProperty EquivalenceClassProperty
        - maxCardProperty MaxCardProperty
        + keyRequirementSatisfied(keyRequirement Key) boolean
        + getNormalizedKey(keyRequirement Key, equivalenceClassProperty EquivalenceClassProperty) Optional_Key
    }

    class KeyProperty {
        + satisfiesKeyRequirement(keyRequirement Key) boolean
    }

    class EquivalenceClassProperty {
        + normalizeKey(keyRequirement Key) Optional_Key
    }

    class MaxCardProperty {
        + isAtMostOne() boolean
    }

    class Key
    class Optional_Key

    LogicalPropertiesImpl --> KeyProperty
    LogicalPropertiesImpl --> EquivalenceClassProperty
    LogicalPropertiesImpl --> MaxCardProperty
    LogicalPropertiesImpl --> Optional_Key
    KeyProperty --> Key
    EquivalenceClassProperty --> Key
    EquivalenceClassProperty --> Optional_Key
    MaxCardProperty --> LogicalPropertiesImpl

File-Level Changes

Change: Fix key requirement satisfaction to use normalized keys based on equivalence classes before checking distinctness.
Details:
  • Simplified keyRequirementSatisfied to return true only when a normalized key exists and is satisfied by the current key property.
  • Replaced the previous optional branching logic with a filter + isPresent pipeline to ensure the check is always performed on the normalized key.
Files: presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/properties/LogicalPropertiesImpl.java

Change: Extend RuleAssert to support arbitrary logical property assertions and keep backward compatibility.
Details:
  • Introduced assertLogicalProperties method that applies a LogicalProperties consumer to the root group properties after running the rule.
  • Refactored existing matches(LogicalProperties) to delegate to assertLogicalProperties while preserving the previous equality-based assertion behavior.
Files: presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/test/RuleAssert.java

Change: Add regression test ensuring key normalization correctly recognizes equivalent variables as distinct keys.
Details:
  • Imported TestNG assertTrue to support direct assertions on logical properties.
  • Built a plan with table scans on customer, orders, and lineitem joined on custkey/orderkey and a single-group AggregationNode.
  • Used assertLogicalProperties to verify isDistinct returns true for the grouping key and its equivalent variables derived via equivalence classes.
Files: presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestLogicalPropertyPropagation.java


@aditi-pandit
Contributor

@aaneja : Do you need any help with this ? Fixing TPC-DS Q95 is important.

The fix looks plausible, but it would be great if we have some tests to validate.

@aaneja aaneja changed the title from "Fix bug in LogicalPropertiesImpl" to "fix(planner): Fix failing isDistinct for equivalent variables for logical properties" Mar 5, 2026
@aaneja
Contributor Author

aaneja commented Mar 5, 2026

@aaneja : Do you need any help with this ? Fixing TPC-DS Q95 is important.

The fix looks plausible, but it would be great if we have some tests to validate.

Was waiting for an internal run to finish - https://github.ibm.com/lakehouse/presto/issues/4435#issuecomment-178835976

I've added a new test that emulates the internal sub-plan similar to Q95.
For reference here is the sub-plan where it was failing -

Before

[3:30] [812] single aggregation over (ca_state, expr_108, unique, web_company_name, ws_order_number, ws_ship_addr_sk, ws_ship_date_sk, ws_web_site_sk). Computing [{}]
    [3:30] [2474] join (INNER)(REPLICATED), Equi-join condition([wr_order_number = ws_order_number]), Filter (Optional.empty), Outputs([ws_ship_date_sk, ws_ship_addr_sk, ws_web_site_sk, ws_order_number, ca_state, web_company_name, expr_108, unique]):
        [3:30] [2471] join (INNER)(REPLICATED), Equi-join condition([wr_order_number = ws_order_number_169]), Filter (Optional[NOT_EQUAL(ws_warehouse_sk_129, ws_warehouse_sk_167)]), Outputs([wr_order_number]):
   

After

[3:30] [2561] single aggregation over (ca_state, expr_108, unique, web_company_name, ws_order_number, ws_ship_addr_sk, ws_ship_date_sk, ws_web_site_sk). Computing [{}]
    [3:30] [2560] join (INNER)(REPLICATED), Equi-join condition([wr_order_number = ws_order_number]), Filter (Optional.empty), Outputs([ws_ship_date_sk, ws_ship_addr_sk, ws_web_site_sk, ws_order_number, ca_state, web_company_name, expr_108, unique]):
        [3:30] [2559] single aggregation over (wr_order_number). Computing [{}]
            [3:30] [2471] join (INNER)(REPLICATED), Equi-join condition([wr_order_number = ws_order_number_169]), Filter (Optional[NOT_EQUAL(ws_warehouse_sk_129, ws_warehouse_sk_167)]), Outputs([wr_order_number]):

The 2559 agg node is added by the TransformDistinctInnerJoinToRightEarlyOutJoin rule.
After the rule runs, when the optimizer re-triggers a check for this rule, it calls isDistinct(wr_order_number), which was returning false.
This is because the JOINs (not shown above) add the equivalences wr_order_number = ws_order_number_169 and ws_order_number_131 = wr_order_number, and the Key we build refers to ws_order_number_131 directly instead of wr_order_number.
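
To make the mismatch concrete, here is a small illustration using the variable names from the sub-plan above. It is not Presto's EquivalenceClassProperty: `EquivalenceSketch` is a hypothetical, minimal union-find over variable names, and which member ends up as the class representative is an arbitrary detail of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a tiny union-find over variable names.
final class EquivalenceSketch
{
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of the class containing the variable.
    String find(String variable)
    {
        String p = parent.getOrDefault(variable, variable);
        if (p.equals(variable)) {
            return variable;
        }
        String root = find(p);
        parent.put(variable, root); // path compression
        return root;
    }

    // Records that two variables are equal, for example from an equi-join condition.
    void union(String left, String right)
    {
        String a = find(left);
        String b = find(right);
        if (!a.equals(b)) {
            parent.put(a, b);
        }
    }

    boolean sameClass(String x, String y)
    {
        return find(x).equals(find(y));
    }

    public static void main(String[] args)
    {
        EquivalenceSketch eq = new EquivalenceSketch();
        eq.union("wr_order_number", "ws_order_number_169");
        eq.union("ws_order_number_131", "wr_order_number");

        String storedKey = "ws_order_number_131"; // variable the group's Key refers to
        String requested = "wr_order_number";     // variable passed to isDistinct

        System.out.println(storedKey.equals(requested));        // raw comparison
        System.out.println(eq.sameClass(storedKey, requested)); // comparison after normalization
    }
}
```

The raw comparison prints false and the normalized comparison prints true, which is the difference between the behavior before and after the fix.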

@aaneja aaneja marked this pull request as ready for review March 5, 2026 11:26
@aaneja aaneja requested review from a team, feilong-liu and jaystarshot as code owners March 5, 2026 11:26
@prestodb-ci prestodb-ci requested review from a team, anandamideShakyan and bibith4 and removed request for a team March 5, 2026 11:26
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/test/RuleAssert.java" line_range="193-196" />
<code_context>
     }

-    public void matches(LogicalProperties expectedLogicalProperties)
+    public void assertLogicalProperties(Consumer<LogicalProperties> matcher)
     {
         RuleApplication ruleApplication = applyRule();
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Add an explicit assertion when logical properties for the root group are missing to avoid obscure failures in matchers

The helper currently assumes `getLogicalProperties(...).get()` is always present; if logical properties are missing for the root group, tests will fail with a `NoSuchElementException`/`null` rather than a clear assertion. Please assert that the Optional is present (with a descriptive failure message) before invoking `matcher.accept(...)` so missing logical properties are reported explicitly.

Suggested implementation:

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Stream;

import static org.testng.Assert.assertTrue;

```

```java
        Optional<LogicalProperties> rootNodeLogicalProperties = ruleApplication.getMemo()
                .getLogicalProperties(ruleApplication.getMemo().getRootGroup());

        assertTrue(
                rootNodeLogicalProperties.isPresent(),
                "Logical properties are missing for the root group; ensure they are computed before asserting.");

        matcher.accept(rootNodeLogicalProperties.get());

```
</issue_to_address>



aditi-pandit
aditi-pandit previously approved these changes Mar 6, 2026
Contributor

@aditi-pandit aditi-pandit left a comment


Thanks @aaneja

tdcmeehan
tdcmeehan previously approved these changes Mar 9, 2026
where we were looking up the keyProperty on
the input key while we should have been looking
up based on the normalized key
@aaneja aaneja dismissed stale reviews from tdcmeehan and aditi-pandit via 4ac8fac March 10, 2026 02:34
@aaneja aaneja merged commit 9970fa3 into prestodb:master Mar 10, 2026
117 of 121 checks passed

Labels

from:IBM PR from IBM


4 participants