
feat(optimizer): Simplify COALESCE over equi-join keys based on join type#27250

Merged
kaikalur merged 1 commit intoprestodb:masterfrom
kaikalur:coalesce-join-keys
Mar 6, 2026

Conversation

@kaikalur
Contributor

@kaikalur kaikalur commented Mar 3, 2026

Summary

  • Adds SimplifyCoalesceOverJoinKeys optimizer rule that eliminates redundant COALESCE expressions over equi-join key pairs based on join type
  • For equi-join condition l.x = r.y:
    • LEFT JOIN: COALESCE(l.x, r.y) or COALESCE(r.y, l.x) → l.x (every left row is kept, and r.y is either equal to l.x or NULL)
    • RIGHT JOIN: COALESCE(l.x, r.y) or COALESCE(r.y, l.x) → r.y (every right row is kept, and l.x is either equal to r.y or NULL)
    • INNER JOIN: COALESCE(first, second) → first (both keys are non-null and equal, so the first argument is kept)
    • FULL JOIN: cannot simplify (either side may be null)
  • This is important for tool-generated queries that produce patterns like SELECT COALESCE(l.x, r.y) FROM l LEFT JOIN r ON l.x = r.y, where the COALESCE prevents bucketed join optimizations
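
The equivalences above can be sanity-checked by brute force on a tiny sample, including rows whose join key is NULL. The following is a hypothetical, self-contained Java model (the class `CoalesceJoinKeyCheck`, its sample data and its nested-loop join are illustrative, not code from this PR): it evaluates `l JOIN r ON l.x = r.y` with SQL NULL semantics for each join type and checks, row by row, whether both `COALESCE(l.x, r.y)` and `COALESCE(r.y, l.x)` could be replaced by a candidate key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical brute-force model; not part of the PR.
final class CoalesceJoinKeyCheck {
    enum JoinType { INNER, LEFT, RIGHT, FULL }

    /** Candidate replacements for COALESCE over the key pair l.x = r.y. */
    enum Replacement { LEFT_KEY, RIGHT_KEY, FIRST_ARGUMENT }

    /** One joined output row: left key l.x and right key r.y; either may be NULL (Java null). */
    static final class Row {
        final Integer lx;
        final Integer ry;

        Row(Integer lx, Integer ry) {
            this.lx = lx;
            this.ry = ry;
        }
    }

    /** SQL equality in the ON clause: NULL never matches anything, not even NULL. */
    static boolean keysMatch(Integer l, Integer r) {
        return l != null && r != null && l.equals(r);
    }

    /** Nested-loop equi-join on l.x = r.y that pads unmatched rows with NULL for outer joins. */
    static List<Row> join(List<Integer> left, List<Integer> right, JoinType type) {
        List<Row> rows = new ArrayList<>();
        boolean[] rightMatched = new boolean[right.size()];
        for (Integer l : left) {
            boolean leftMatched = false;
            for (int i = 0; i < right.size(); i++) {
                if (keysMatch(l, right.get(i))) {
                    rows.add(new Row(l, right.get(i)));
                    leftMatched = true;
                    rightMatched[i] = true;
                }
            }
            if (!leftMatched && (type == JoinType.LEFT || type == JoinType.FULL)) {
                rows.add(new Row(l, null));
            }
        }
        if (type == JoinType.RIGHT || type == JoinType.FULL) {
            for (int i = 0; i < right.size(); i++) {
                if (!rightMatched[i]) {
                    rows.add(new Row(null, right.get(i)));
                }
            }
        }
        return rows;
    }

    /** Fixed sample with a NULL key on each side: l.x in {1, 2, NULL}, r.y in {2, 3, NULL}. */
    static List<Row> sampleJoin(JoinType type) {
        return join(Arrays.asList(1, 2, null), Arrays.asList(2, 3, null), type);
    }

    static Integer coalesce(Integer a, Integer b) {
        return a != null ? a : b;
    }

    /** True if both COALESCE(l.x, r.y) and COALESCE(r.y, l.x) equal the candidate on every row. */
    static boolean rewriteIsSafe(List<Row> rows, Replacement candidate) {
        for (Row row : rows) {
            Integer leftFirst = coalesce(row.lx, row.ry);  // COALESCE(l.x, r.y)
            Integer rightFirst = coalesce(row.ry, row.lx); // COALESCE(r.y, l.x)
            boolean safe;
            switch (candidate) {
                case LEFT_KEY:
                    safe = Objects.equals(leftFirst, row.lx) && Objects.equals(rightFirst, row.lx);
                    break;
                case RIGHT_KEY:
                    safe = Objects.equals(leftFirst, row.ry) && Objects.equals(rightFirst, row.ry);
                    break;
                default: // FIRST_ARGUMENT: each form keeps its own first argument
                    safe = Objects.equals(leftFirst, row.lx) && Objects.equals(rightFirst, row.ry);
                    break;
            }
            if (!safe) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (JoinType type : JoinType.values()) {
            for (Replacement candidate : Replacement.values()) {
                System.out.println(type + " -> " + candidate + ": " + rewriteIsSafe(sampleJoin(type), candidate));
            }
        }
    }
}
```

On this sample, `LEFT_KEY` is safe for the LEFT JOIN (the NULL left key yields a row where both forms evaluate to NULL, which is still `l.x`), `RIGHT_KEY` is safe for the RIGHT JOIN, all three candidates are safe for the INNER JOIN, and none is safe for the FULL JOIN, where the unmatched right row (NULL, 3) rules out `l.x` and the unmatched left row (1, NULL) rules out `r.y`. A passing sample illustrates the argument; it does not prove it.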

Changes

  • SimplifyCoalesceOverJoinKeys.java: New optimizer rule matching ProjectNode over JoinNode, simplifying COALESCE expressions
  • FeaturesConfig.java: Added optimizer.simplify-coalesce-over-join-keys config (default: disabled)
  • SystemSessionProperties.java: Added simplify_coalesce_over_join_keys session property
  • PlanOptimizers.java: Registered the rule
  • TestSimplifyCoalesceOverJoinKeys.java: 14 unit tests covering all join types and edge cases
  • TestFeaturesConfig.java: Config validation tests
  • AbstractTestQueries.java: End-to-end tests with SQL queries

Fixes: #26984

Test plan

  • TestSimplifyCoalesceOverJoinKeys — 14 unit tests (all pass)
  • TestFeaturesConfig — config property tests (passes)
  • TestReorderJoins — verified no regression
  • End-to-end SQL tests in AbstractTestQueries — LEFT, RIGHT, INNER, FULL joins with COALESCE

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

  General Changes
  * Add optimizer rule ``SimplifyCoalesceOverJoinKeys`` that simplifies redundant ``COALESCE`` expressions over equi-join key pairs based on join type, enabling bucketed join optimizations for tool-generated queries. Controlled by the ``simplify_coalesce_over_join_keys`` session property (disabled by default).

@kaikalur kaikalur requested review from a team, elharo, feilong-liu and jaystarshot as code owners March 3, 2026 01:45
@sourcery-ai
Contributor

sourcery-ai bot commented Mar 3, 2026

Reviewer's Guide

Introduces a new iterative optimizer rule that rewrites redundant two-argument COALESCE expressions over equi-join key pairs in ProjectNodes above JoinNodes based on join type, guarded by a configurable feature flag exposed via config and session properties, and covered by unit and end-to-end query tests.

Sequence diagram for applying SimplifyCoalesceOverJoinKeys during planning

sequenceDiagram
    participant Config as FeaturesConfig
    participant SSP as SystemSessionProperties
    participant Sess as Session
    participant Planner as PlanOptimizers
    participant Optimizer as IterativeOptimizer
    participant Rule as SimplifyCoalesceOverJoinKeys

    Config->>SSP: construct SystemSessionProperties(featuresConfig)
    SSP->>Sess: register system properties

    Sess->>Planner: create query session
    Planner->>Optimizer: build IterativeOptimizer with rule set {SimplifyCoalesceOverJoinKeys, ...}

    Optimizer->>Rule: isEnabled(session)
    Rule->>SSP: isSimplifyCoalesceOverJoinKeys(session)
    SSP-->>Rule: boolean enabled

    alt feature flag enabled
        Optimizer->>Rule: apply(projectNode, captures, context)
        Rule->>Rule: match ProjectNode over JoinNode via PATTERN
        Rule->>Rule: inspect JoinNode.getType() and criteria
        Rule->>Rule: trySimplifyCoalesce on COALESCE expressions
        Rule-->>Optimizer: Result with rewritten ProjectNode
    else feature flag disabled
        Rule-->>Optimizer: Result.empty
    end

    Optimizer-->>Planner: optimized plan
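
The sequence diagram describes the gate: the session property takes its default from FeaturesConfig, and the rule is consulted through isEnabled(session) before any rewrite. Below is a hypothetical sketch of that layering. It assumes a per-query override wins over the server default and that a disabled rule is simply skipped (the diagram draws that path as Result.empty; either way the plan is left unchanged). `Session`, `Rule`, `optimize` and the string "plan" are illustrative stand-ins, not Presto's classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of config default -> session override -> rule gate; not Presto's API.
final class RuleGatingSketch {
    static final String SIMPLIFY_COALESCE_OVER_JOIN_KEYS = "simplify_coalesce_over_join_keys";

    /** Hypothetical session: per-query overrides layered over defaults seeded from server config. */
    static final class Session {
        private final Map<String, Boolean> defaults = new HashMap<>();
        private final Map<String, Boolean> overrides = new HashMap<>();

        Session(boolean configDefault) {
            // Stands in for "the session property default comes from FeaturesConfig".
            defaults.put(SIMPLIFY_COALESCE_OVER_JOIN_KEYS, configDefault);
        }

        /** Stands in for a per-query SET SESSION. */
        Session set(String name, boolean value) {
            overrides.put(name, value);
            return this;
        }

        boolean getBoolean(String name) {
            Boolean value = overrides.get(name);
            return value != null ? value : defaults.get(name);
        }
    }

    /** Hypothetical minimal rule contract: a session gate plus a rewrite that may decline. */
    interface Rule {
        boolean isEnabled(Session session);

        Optional<String> apply(String plan);
    }

    static final class SimplifyCoalesceRule implements Rule {
        @Override
        public boolean isEnabled(Session session) {
            return session.getBoolean(SIMPLIFY_COALESCE_OVER_JOIN_KEYS);
        }

        @Override
        public Optional<String> apply(String plan) {
            // Toy string rewrite that assumes the plan text describes a LEFT JOIN on l.x = r.y.
            String rewritten = plan.replace("COALESCE(l.x, r.y)", "l.x");
            return rewritten.equals(plan) ? Optional.empty() : Optional.of(rewritten);
        }
    }

    static List<Rule> defaultRules() {
        return Collections.<Rule>singletonList(new SimplifyCoalesceRule());
    }

    /** Runs each rule once; a disabled rule is skipped and its apply() is never called. */
    static String optimize(String plan, List<Rule> rules, Session session) {
        String current = plan;
        for (Rule rule : rules) {
            if (!rule.isEnabled(session)) {
                continue;
            }
            Optional<String> result = rule.apply(current);
            if (result.isPresent()) {
                current = result.get();
            }
        }
        return current;
    }
}
```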

Class diagram for SimplifyCoalesceOverJoinKeys rule and related planner wiring

classDiagram
    class SimplifyCoalesceOverJoinKeys {
        +SimplifyCoalesceOverJoinKeys()
        +Pattern getPattern()
        +boolean isEnabled(Session session)
        +Result apply(ProjectNode project, Captures captures, Context context)
        -RowExpression trySimplifyCoalesce(RowExpression expression, JoinType joinType, Set leftVariables, Set rightVariables, Map leftToRight, Map rightToLeft)
        -static Capture JOIN
        -static Pattern PATTERN
    }

    class Rule {
        <<interface>>
        +Pattern getPattern()
        +boolean isEnabled(Session session)
        +Result apply(Node node, Captures captures, Context context)
    }

    class ProjectNode {
        +Assignments getAssignments()
        +PlanNode getSource()
        +List getOutputVariables()
    }

    class JoinNode {
        +JoinType getType()
        +List getCriteria()
        +PlanNode getLeft()
        +PlanNode getRight()
        +List getOutputVariables()
    }

    class EquiJoinClause {
        +VariableReferenceExpression getLeft()
        +VariableReferenceExpression getRight()
    }

    class Assignments {
        +Map getMap()
        +static Builder builder()
    }

    class SpecialFormExpression {
        +Form getForm()
        +List getArguments()
        enum Form
    }

    class VariableReferenceExpression {
    }

    class JoinType {
        <<enum>>
        INNER
        LEFT
        RIGHT
        FULL
    }

    class FeaturesConfig {
        -boolean simplifyCoalesceOverJoinKeys
        +boolean isSimplifyCoalesceOverJoinKeys()
        +FeaturesConfig setSimplifyCoalesceOverJoinKeys(boolean simplifyCoalesceOverJoinKeys)
    }

    class SystemSessionProperties {
        +static String SIMPLIFY_COALESCE_OVER_JOIN_KEYS
        +static boolean isSimplifyCoalesceOverJoinKeys(Session session)
        -SystemSessionProperties(FeaturesConfig featuresConfig)
    }

    class PlanOptimizers {
        -PlanOptimizers(..., RuleStats ruleStats, StatsCalculator statsCalculator, EstimatedExchangesCostCalculator estimatedExchangesCostCalculator, Set rules)
    }

    class IterativeOptimizer {
        -Set rules
    }

    class Session {
        +Object getSystemProperty(String key, Class type)
    }

    SimplifyCoalesceOverJoinKeys ..|> Rule
    SimplifyCoalesceOverJoinKeys --> ProjectNode
    SimplifyCoalesceOverJoinKeys --> JoinNode
    SimplifyCoalesceOverJoinKeys --> Assignments
    SimplifyCoalesceOverJoinKeys --> EquiJoinClause
    SimplifyCoalesceOverJoinKeys --> SpecialFormExpression
    SimplifyCoalesceOverJoinKeys --> VariableReferenceExpression
    SimplifyCoalesceOverJoinKeys --> JoinType
    SimplifyCoalesceOverJoinKeys --> Session

    JoinNode --> EquiJoinClause
    JoinNode --> JoinType

    PlanOptimizers --> IterativeOptimizer
    IterativeOptimizer --> Rule
    IterativeOptimizer --> SimplifyCoalesceOverJoinKeys

    SystemSessionProperties --> FeaturesConfig
    SystemSessionProperties --> Session
    SimplifyCoalesceOverJoinKeys ..> SystemSessionProperties
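
The class diagram lists a private helper, trySimplifyCoalesce, that receives the join type together with left/right variable sets and left-to-right / right-to-left key maps. The PR's body of that method is not shown here, so the following is only a hypothetical model of what such a helper might do. The `Expr`, `Var` and `Coalesce` types are illustrative, and the variable-set parameters are omitted because the two maps already identify key pairs in this toy. It recognizes a two-argument COALESCE whose arguments are the two sides of one equi-join clause, in either order, and then picks the replacement by join type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a trySimplifyCoalesce-style helper; not the PR's implementation.
final class TrySimplifyCoalesceSketch {
    enum JoinType { INNER, LEFT, RIGHT, FULL }

    interface Expr {}

    /** A column reference, compared by name. */
    static final class Var implements Expr {
        final String name;

        Var(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Var && ((Var) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }

        @Override
        public String toString() {
            return name;
        }
    }

    /** COALESCE(arg1, arg2, ...). */
    static final class Coalesce implements Expr {
        final List<Expr> args;

        Coalesce(Expr... args) {
            this.args = Arrays.asList(args);
        }
    }

    /** Returns the replacement variable, or the original expression when no rewrite applies. */
    static Expr trySimplifyCoalesce(Expr expr, JoinType joinType,
            Map<Var, Var> leftToRight, Map<Var, Var> rightToLeft) {
        if (joinType == JoinType.FULL || !(expr instanceof Coalesce)) {
            return expr;
        }
        List<Expr> args = ((Coalesce) expr).args;
        if (args.size() != 2 || !(args.get(0) instanceof Var) || !(args.get(1) instanceof Var)) {
            return expr; // only two-argument COALESCE over plain column references
        }
        Var first = (Var) args.get(0);
        Var second = (Var) args.get(1);

        // Work out which argument is the left key and which the right key of the same clause.
        Var leftKey;
        Var rightKey;
        if (second.equals(leftToRight.get(first))) {
            leftKey = first;
            rightKey = second;
        }
        else if (second.equals(rightToLeft.get(first))) {
            leftKey = second;
            rightKey = first;
        }
        else {
            return expr; // not a key pair from the join criteria
        }

        switch (joinType) {
            case LEFT:
                return leftKey;
            case RIGHT:
                return rightKey;
            default: // INNER: both keys are non-null and equal, so keep the first argument
                return first;
        }
    }

    /** Convenience entry point: derives rightToLeft from leftToRight (one entry per l = r clause). */
    static Expr simplify(Expr expr, JoinType joinType, Map<Var, Var> leftToRight) {
        Map<Var, Var> rightToLeft = new HashMap<>();
        for (Map.Entry<Var, Var> clause : leftToRight.entrySet()) {
            rightToLeft.put(clause.getValue(), clause.getKey());
        }
        return trySimplifyCoalesce(expr, joinType, leftToRight, rightToLeft);
    }
}
```

Returning the original expression object when nothing applies lets a caller detect "no change" with a cheap identity check.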

File-Level Changes

Change Details Files
Add SimplifyCoalesceOverJoinKeys rule to rewrite COALESCE over equi-join keys in projections above joins.
  • Match ProjectNode whose source is a JoinNode using pattern captures.
  • Gate the rule on the new simplify_coalesce_over_join_keys session property (read via isSimplifyCoalesceOverJoinKeys) and return early for FULL joins and for joins without equi-join criteria.
  • Collect left/right output variables and build left-to-right and right-to-left maps from EquiJoinClause criteria.
  • Scan the project assignments and, for each two-argument COALESCE over a pair of equi-join key variables, choose the replacement based on join type (LEFT→left key, RIGHT→right key, INNER→first argument).
  • Rebuild ProjectNode with updated Assignments only when at least one expression is simplified.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SimplifyCoalesceOverJoinKeys.java
Wire the new rule into the optimizer pipeline and configuration system.
  • Add a simplifyCoalesceOverJoinKeys boolean field to FeaturesConfig with getter and setter, default true, and an @Config mapping for optimizer.simplify-coalesce-over-join-keys.
  • Expose the SIMPLIFY_COALESCE_OVER_JOIN_KEYS session property in SystemSessionProperties, with its default taken from FeaturesConfig and the accessor isSimplifyCoalesceOverJoinKeys(Session).
  • Register SimplifyCoalesceOverJoinKeys in PlanOptimizers alongside other single-rule IterativeOptimizer rules.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
Add planner rule unit tests verifying COALESCE simplification behavior across join types and edge cases.
  • Construct synthetic plans with Project over Join for LEFT, RIGHT, and INNER joins and assert COALESCE over join keys rewrites to the expected variable depending on argument order and join type.
  • Verify the rule does not fire for FULL joins, joins without equi-join criteria, COALESCE over non-key columns, three-argument COALESCE, or when the session flag is disabled.
  • Test multiple join keys, mixed COALESCE and identity projections, and that plan patterns match expected join criteria and projections.
presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSimplifyCoalesceOverJoinKeys.java
Extend configuration tests to cover the new optimizer config property.
  • Update default FeaturesConfig expectations to include simplifyCoalesceOverJoinKeys=true.
  • Add explicit property mappings for optimizer.simplify-coalesce-over-join-keys and verify it sets simplifyCoalesceOverJoinKeys to false when configured.
  • Ensure the new config participates correctly in existing FeaturesConfig round-trip tests.
presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestFeaturesConfig.java
Add end-to-end SQL tests validating semantic equivalence when the rule is enabled or disabled.
  • Create sessions with simplify_coalesce_over_join_keys system property explicitly enabled and disabled.
  • Run queries with COALESCE over equi-join keys for LEFT, RIGHT, INNER, and FULL joins and assert identical results between sessions, including cases with additional projected columns.
  • Add a test for JOIN USING (regionkey) to cover automatically generated COALESCE over join keys from USING semantics.
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java
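
The first file-level change describes the rule's control flow: return early for FULL joins or joins without equi-join criteria, rewrite each assignment, and rebuild the ProjectNode only when something changed. The sketch below models just that loop. Assignments are a plain `Map` and the per-expression rewrite is passed in as a function, so it stays independent of Presto's `Assignments`, `ProjectNode` and `Result` types; `ApplyLoopSketch` and `rewriteAssignments` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.UnaryOperator;

// Hypothetical model of the rule's apply() loop; not the PR's implementation.
final class ApplyLoopSketch {
    enum JoinType { INNER, LEFT, RIGHT, FULL }

    /**
     * Rewrites projection assignments (output name -> expression) with a per-expression simplifier.
     * Returns Optional.empty() when the rule should not fire, so the caller keeps the original projection.
     */
    static <E> Optional<Map<String, E>> rewriteAssignments(
            Map<String, E> assignments,
            JoinType joinType,
            List<?> equiJoinCriteria,
            UnaryOperator<E> simplifier) {
        // Early exits: FULL joins cannot be simplified, and without equi-join criteria there are no key pairs.
        if (joinType == JoinType.FULL || equiJoinCriteria.isEmpty()) {
            return Optional.empty();
        }
        Map<String, E> rewritten = new LinkedHashMap<>();
        boolean changed = false;
        for (Map.Entry<String, E> assignment : assignments.entrySet()) {
            E original = assignment.getValue();
            E simplified = simplifier.apply(original);
            if (!Objects.equals(original, simplified)) {
                changed = true;
            }
            // Output names are kept as-is, so nodes above the projection see the same columns.
            rewritten.put(assignment.getKey(), simplified);
        }
        // Only report a new projection when at least one expression actually changed.
        return changed ? Optional.of(rewritten) : Optional.empty();
    }
}
```

Reporting "no change" as `Optional.empty()` matters for an iterative optimizer, which generally treats any returned node as progress; handing back an equivalent projection on every invocation could make it revisit the same node repeatedly.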

Assessment against linked issues

Issue Objective Addressed Explanation
#26984 Implement an optimizer/planner transformation that simplifies expressions of the form COALESCE(l.x, r.y) (or COALESCE(r.y, l.x)) in queries like SELECT COALESCE(l.x, r.y) FROM l LEFT JOIN r ON l.x = r.y to l.x, so that such redundant COALESCE does not degrade join planning.
#26984 Integrate this simplification into the optimization pipeline (behind a feature flag/session property) and add tests to verify the transformation for LEFT JOIN (and not incorrectly for unsupported cases).


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've reviewed your changes and they look great!



…type

Add SimplifyCoalesceOverJoinKeys optimizer rule that eliminates redundant
COALESCE expressions over equi-join key pairs. For equi-join condition
l.x = r.y, COALESCE(l.x, r.y) can be simplified based on join type:

- LEFT JOIN: always l.x (every left row is kept, and r.y is either equal to l.x or NULL)
- RIGHT JOIN: always r.y (every right row is kept, and l.x is either equal to r.y or NULL)
- INNER JOIN: first argument (both keys non-null and equal)
- FULL JOIN: cannot simplify (either side may be null)

This optimization is important for tool-generated queries that produce
patterns like SELECT COALESCE(l.x, r.y) FROM l LEFT JOIN r ON l.x = r.y,
where the COALESCE prevents bucketed join optimizations.

Fixes: prestodb#26984
@kaikalur kaikalur force-pushed the coalesce-join-keys branch from 7777f53 to 4275c21 Compare March 3, 2026 02:52
@kaikalur
Contributor Author

kaikalur commented Mar 3, 2026

Friendly ping @jaystarshot @feilong-liu @elharo — CI is all green on this PR. Would appreciate a review when you get a chance. Thanks!

Comment on lines +63 to +64
private static final Pattern<ProjectNode> PATTERN = project()
.with(source().matching(join().capturedAs(JOIN)));
Contributor


Nit: I remember we can put the join type check within the Pattern here too?
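
The nit suggests moving the join-type check from apply into PATTERN. This thread does not show how that would be written with Presto's pattern library, so the sketch below uses a hypothetical `MiniPattern` class to illustrate the idea only. The FULL-join and empty-criteria preconditions become part of the match, so apply would only ever see joins it can simplify. `JoinNode` here is a stripped-down stand-in, not Presto's class.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of filtering in the pattern; not Presto's matching API.
final class JoinTypeInPatternSketch {
    enum JoinType { INNER, LEFT, RIGHT, FULL }

    /** Hypothetical join node: just the join type and its equi-join clauses. */
    static final class JoinNode {
        final JoinType type;
        final List<String> criteria;

        JoinNode(JoinType type, List<String> criteria) {
            this.type = type;
            this.criteria = criteria;
        }
    }

    /** Hypothetical mini pattern: a node class plus an accumulated predicate, evaluated before apply() runs. */
    static final class MiniPattern<T> {
        private final Class<T> nodeType;
        private final Predicate<T> filter;

        private MiniPattern(Class<T> nodeType, Predicate<T> filter) {
            this.nodeType = nodeType;
            this.filter = filter;
        }

        static <T> MiniPattern<T> typeOf(Class<T> nodeType) {
            return new MiniPattern<>(nodeType, node -> true);
        }

        /** Returns a pattern that also requires the given condition. */
        MiniPattern<T> matching(Predicate<T> condition) {
            return new MiniPattern<>(nodeType, filter.and(condition));
        }

        boolean accepts(Object node) {
            return nodeType.isInstance(node) && filter.test(nodeType.cast(node));
        }
    }

    // The checks that apply() would otherwise perform first now live in the pattern.
    static final MiniPattern<JoinNode> SIMPLIFIABLE_JOIN = MiniPattern.typeOf(JoinNode.class)
            .matching(join -> join.type != JoinType.FULL)
            .matching(join -> !join.criteria.isEmpty());
}
```

The trade-off: preconditions in the pattern keep apply shorter and reject unsupported joins before any rewrite work starts, but the rule's conditions are then split between two places.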

@kaikalur kaikalur merged commit 44300f7 into prestodb:master Mar 6, 2026
119 of 123 checks passed
garimauttam pushed a commit to garimauttam/presto that referenced this pull request Mar 9, 2026
@ethanyzhang ethanyzhang added the from:Meta PR from Meta label Mar 25, 2026

Labels

from:Meta PR from Meta

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize COALESCE(l.x, r.y) from l left join r on l.x = r.y to l.x

3 participants