feat(optimizer): Enhance PayloadJoinOptimizer with null-check skipping, chain flattening, and LOJ reordering #27404

Merged
feilong-liu merged 1 commit into prestodb:master from kaikalur:payload-join-skip-null-checks on Mar 30, 2026

Contributor

@kaikalur kaikalur commented Mar 22, 2026

Summary

Improves the PayloadJoinOptimizer in three areas:

1. Skip null checks for non-null join keys

  • When a join key has an upstream WHERE key IS NOT NULL predicate, the payload rejoin now uses a direct equality predicate instead of generating IS_NULL projections and COALESCE comparisons
  • Detects NOT(IS_NULL(var)) patterns in FilterNode predicates within the scan-filter-project tree
  • Reduces plan complexity and runtime overhead for queries with non-null join key guarantees
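
Why the direct equality is safe can be illustrated with a minimal sketch in plain Java (not Presto code). It assumes the nullable-key rejoin compares an IS_NULL flag plus a COALESCE'd value on each side, which is how the description above reads. The class name, the helper names, and the placeholder default `0` are all hypothetical.

```java
import java.util.List;

final class RejoinPredicateSketch
{
    private RejoinPredicateSketch() {}

    // Null-aware match: both sides agree on "is null" and on the COALESCE'd value.
    // The default 0 is an arbitrary placeholder chosen for this sketch.
    static boolean nullSafeMatch(Integer left, Integer right)
    {
        boolean sameNullness = (left == null) == (right == null);
        int l = left == null ? 0 : left;
        int r = right == null ? 0 : right;
        return sameNullness && l == r;
    }

    // Direct equality with SQL-like semantics: NULL never matches anything.
    static boolean directMatch(Integer left, Integer right)
    {
        return left != null && right != null && left.equals(right);
    }

    // True when both predicates give the same answer for every pair of keys in the list.
    static boolean predicatesAgree(List<Integer> keys)
    {
        for (Integer left : keys) {
            for (Integer right : keys) {
                if (nullSafeMatch(left, right) != directMatch(left, right)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

When every key is known to be non-null, the two predicates agree on all pairs, so the simpler equality is enough. They differ only on the (NULL, NULL) pair, which is why the null-aware form is still needed for nullable keys.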

2. Flatten LOJ chains through intervening nodes (pre-pass)

  • Removes identity projections (e.g., from subqueries) that break LOJ chains, exposing the full chain to the optimizer
  • Hoists cross joins from within LOJ chains to above them, so the LOJ chain remains contiguous
  • Correctly handles cases where cross join or projection columns are used as join keys in subsequent LOJs
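
The identity-projection part of this pre-pass can be sketched on a toy plan model. This is only an illustration under stated assumptions: `Scan`, `LeftJoin`, and `Project` below are hypothetical stand-ins, not Presto's PlanNode classes, and the sketch ignores column pruning and cross-join hoisting entirely.

```java
import java.util.Map;

final class FlattenChainSketch
{
    private FlattenChainSketch() {}

    // Hypothetical, simplified plan nodes; the real planner nodes carry far more state.
    interface Node {}

    static final class Scan implements Node
    {
        final String table;

        Scan(String table)
        {
            this.table = table;
        }
    }

    static final class LeftJoin implements Node
    {
        final Node left;
        final Node right;

        LeftJoin(Node left, Node right)
        {
            this.left = left;
            this.right = right;
        }
    }

    static final class Project implements Node
    {
        final Node source;
        final Map<String, String> assignments; // output name -> input expression

        Project(Node source, Map<String, String> assignments)
        {
            this.source = source;
            this.assignments = assignments;
        }

        // Identity projection: every output is just the input column of the same name.
        boolean isIdentity()
        {
            return assignments.entrySet().stream().allMatch(e -> e.getKey().equals(e.getValue()));
        }
    }

    // Drops identity projections along the left (probe) side so adjacent left joins become contiguous.
    static Node flatten(Node node)
    {
        if (node instanceof Project) {
            Project project = (Project) node;
            Node source = flatten(project.source);
            return project.isIdentity() ? source : new Project(source, project.assignments);
        }
        if (node instanceof LeftJoin) {
            LeftJoin join = (LeftJoin) node;
            return new LeftJoin(flatten(join.left), join.right);
        }
        return node;
    }

    // Counts consecutive left joins from the root, following the left side.
    static int chainLength(Node node)
    {
        int length = 0;
        while (node instanceof LeftJoin) {
            length++;
            node = ((LeftJoin) node).left;
        }
        return length;
    }
}
```

In this model, a plan shaped like LOJ(Project(LOJ(scan, scan)), scan) has a visible chain of one join before flattening and two after, provided the projection is an identity.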

3. Reorder LOJ chains to maximize optimization

  • Classifies LOJs as "base-keyed" (keys from base table) or "dependent" (keys from another LOJ's output)
  • Reorders so base-keyed LOJs come first in the chain, maximizing the number of joins the payload optimization can cover
  • Only triggers when the chain has 3+ joins and reordering would move at least 2 base-keyed joins together
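
The classification and reordering can be sketched as a stable partition. This is one reading of the bullets above rather than the merged implementation: the `Loj` class, the column names, and the exact trigger test (3+ joins, at least 2 base-keyed joins, and an order that actually changes) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ReorderChainSketch
{
    private ReorderChainSketch() {}

    // Hypothetical description of one left outer join in a chain, listed bottom-up.
    static final class Loj
    {
        final String name;
        final Set<String> probeKeys; // columns the join reads from its left input

        Loj(String name, Set<String> probeKeys)
        {
            this.name = name;
            this.probeKeys = probeKeys;
        }
    }

    // Base-keyed: every probe-side key comes from the base table.
    static boolean isBaseKeyed(Loj join, Set<String> baseColumns)
    {
        return baseColumns.containsAll(join.probeKeys);
    }

    // Stable partition: base-keyed joins first, dependent joins after,
    // each group keeping its original relative order.
    static List<String> reorder(List<Loj> chain, Set<String> baseColumns)
    {
        List<String> original = new ArrayList<>();
        List<String> baseKeyed = new ArrayList<>();
        List<String> dependent = new ArrayList<>();
        for (Loj join : chain) {
            original.add(join.name);
            if (isBaseKeyed(join, baseColumns)) {
                baseKeyed.add(join.name);
            }
            else {
                dependent.add(join.name);
            }
        }
        List<String> reordered = new ArrayList<>(baseKeyed);
        reordered.addAll(dependent);

        // Assumed trigger: only reorder when it groups at least two base-keyed joins
        // in a chain of three or more, and the order would actually change.
        boolean worthwhile = chain.size() >= 3 && baseKeyed.size() >= 2 && !reordered.equals(original);
        return worthwhile ? reordered : original;
    }
}
```

Keeping the dependent joins in their original relative order, and after all base-keyed joins, means a dependent join still sees the output of whichever earlier join it reads its keys from.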

Release Notes

No user-facing changes. Internal optimization improvements to the PayloadJoinOptimizer.

Test plan

  • Added e2e integration tests in AbstractTestDistributedQueries comparing results with optimization enabled vs disabled
  • Tests cover null-check skipping: both keys non-null, single key non-null, IS NOT NULL combined with other predicates
  • Tests cover chain flattening: identity projections from subqueries, cross joins between LOJs, non-identity projections computing join keys, cross join columns used as join keys
  • Tests cover LOJ reordering: dependent LOJ blocking base-keyed LOJ, multiple base-keyed LOJs separated by dependent ones
  • Existing payload join tests (testPayloadJoinApplicability, testPayloadJoinCorrectness) continue to pass
  • Full TestTpchDistributedQueries and TestLocalQueries pass
  • Verified compilation and checkstyle pass on presto-main-base and presto-tests

@kaikalur kaikalur requested review from a team, elharo, feilong-liu and jaystarshot as code owners March 22, 2026 04:57
Contributor

sourcery-ai bot commented Mar 22, 2026

Reviewer's Guide

Optimizes payload join rejoin predicates by detecting non-null guarantees on join keys in scan-filter-project trees and simplifying the generated join predicates accordingly. Also adds regression coverage for the new optimization behavior.

Sequence diagram for payload join optimization with non-null join keys

sequenceDiagram
    participant Planner
    participant PayloadJoinOptimizer
    participant Rewriter
    participant JoinContext

    Planner->>PayloadJoinOptimizer: optimize(plan)
    PayloadJoinOptimizer->>Rewriter: rewriteScanFilterProject(planNode, context)
    Rewriter->>Rewriter: extractNonNullVariablesFromScanFilterProject(planNode, joinKeys)
    Rewriter->>JoinContext: addNonNullKeys(nonNullVars)

    Planner->>PayloadJoinOptimizer: optimize join
    PayloadJoinOptimizer->>Rewriter: transformJoin(keysNode, context)
    Rewriter->>JoinContext: getNonNullKeys()
    Rewriter->>Rewriter: build joinPredicateBuilder
    Rewriter->>Rewriter: if key in nonNullKeys then equalityPredicate(newVar, var)
    Rewriter->>Rewriter: else IS_NULL projection and COALESCE comparisons
    Rewriter-->>Planner: new JoinNode with joinCriteria

Class diagram for updated PayloadJoinOptimizer and JoinContext

classDiagram
    class PayloadJoinOptimizer {
    }

    class Rewriter {
        +rewriteScanFilterProject(planNode, context) PlanNode
        -transformJoin(keysNode, context) PlanNode
        -supportedJoinKeyTypes(joinKeys) boolean
        -extractNonNullVariablesFromScanFilterProject(node, joinKeys) Set~VariableReferenceExpression~
        -extractNonNullVariablesRecursive(node, joinKeys, functionResolution, result) void
    }

    class JoinContext {
        -Set~VariableReferenceExpression~ joinKeys
        -Map~VariableReferenceExpression, VariableReferenceExpression~ joinKeyMap
        -Map~VariableReferenceExpression, RowExpression~ projectionsToPush
        -Set~VariableReferenceExpression~ nonNullKeys
        -int numJoins
        -PlanNode payloadNode
        +reset() void
        +getNumJoins() int
        +needsPayloadRejoin() boolean
        +getNonNullKeys() Set~VariableReferenceExpression~
        +addNonNullKeys(keys) void
    }

    PayloadJoinOptimizer o-- Rewriter
    PayloadJoinOptimizer o-- JoinContext
    Rewriter --> JoinContext

Flow diagram for extracting non-null join key variables from scan-filter-project

flowchart TD
    A[Start extractNonNullVariablesFromScanFilterProject] --> B[Initialize empty nonNullVars set]
    B --> C[Call extractNonNullVariablesRecursive with node, joinKeys, functionResolution, result]

    C --> D{Node is FilterNode?}
    D -->|Yes| E[Get predicate from FilterNode]
    E --> F[Extract conjuncts]
    F --> G[For each conjunct]
    G --> H{Conjunct is CallExpression and NOT function?}
    H -->|Yes| I{Argument is IS_NULL SpecialFormExpression?}
    I -->|Yes| J{IS_NULL argument is VariableReferenceExpression?}
    J -->|Yes| K{Variable in joinKeys?}
    K -->|Yes| L[Add variable to result]
    K -->|No| M[Ignore]
    J -->|No| M
    I -->|No| M
    H -->|No| M
    M --> N[Done with conjuncts]
    N --> O[Recurse on FilterNode source]

    D -->|No, node is ProjectNode| P[Recurse on ProjectNode source]
    D -->|No, node is TableScanNode| Q[Stop recursion]

    O --> R[Return]
    P --> R
    Q --> R
    R --> S[Build and return nonNullVars set]
    S --> T[End]
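
The walk in the flow diagram can be approximated with a small self-contained model. This is a sketch under stated assumptions, not the code from PayloadJoinOptimizer: `Expr` and `Node` are hypothetical stand-ins for RowExpression and the Filter/Project/TableScan nodes, and variable renaming through projections is ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class NonNullKeySketch
{
    private NonNullKeySketch() {}

    // Hypothetical expression: a variable (no args) or a named call such as AND, NOT, IS_NULL.
    static final class Expr
    {
        final String name;
        final List<Expr> args;

        Expr(String name, Expr... args)
        {
            this.name = name;
            this.args = Arrays.asList(args);
        }

        boolean isVariable()
        {
            return args.isEmpty();
        }
    }

    // Hypothetical plan node in a scan-filter-project chain.
    static final class Node
    {
        final Expr predicate; // non-null only for filter nodes
        final Node source;    // null for the table scan at the bottom

        Node(Expr predicate, Node source)
        {
            this.predicate = predicate;
            this.source = source;
        }
    }

    // Collects join keys that some filter in the chain guards with NOT(IS_NULL(key)).
    static Set<String> extractNonNull(Node node, Set<String> joinKeys)
    {
        Set<String> result = new LinkedHashSet<>();
        walk(node, joinKeys, result);
        return result;
    }

    private static void walk(Node node, Set<String> joinKeys, Set<String> result)
    {
        if (node == null) {
            return; // went past the table scan
        }
        if (node.predicate != null) {
            collect(node.predicate, joinKeys, result);
        }
        walk(node.source, joinKeys, result);
    }

    // Splits top-level ANDs into conjuncts and matches only the exact NOT(IS_NULL(var)) shape.
    private static void collect(Expr expr, Set<String> joinKeys, Set<String> result)
    {
        if (expr.name.equals("AND") && !expr.isVariable()) {
            for (Expr conjunct : expr.args) {
                collect(conjunct, joinKeys, result);
            }
            return;
        }
        if (expr.name.equals("NOT") && expr.args.size() == 1) {
            Expr inner = expr.args.get(0);
            if (inner.name.equals("IS_NULL") && inner.args.size() == 1 && inner.args.get(0).isVariable()) {
                String variable = inner.args.get(0).name;
                if (joinKeys.contains(variable)) {
                    result.add(variable);
                }
            }
        }
    }
}
```

As in the diagram, anything that is not exactly NOT(IS_NULL(variable)) is ignored, so a key wrapped in a cast or guarded by an equivalent but differently shaped predicate would not be recognized by this sketch.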

File-Level Changes

Change: Track join keys proven non-null in the scan-filter-project subtree and propagate this information through JoinContext
Details:
  • On rewrite of scan-filter-project, extract variables with IS NOT NULL predicates that are also join keys
  • Introduce JoinContext.nonNullKeys field with getter, mutator, and reset logic
  • Record detected non-null join keys into the JoinContext for later use during join transformation
Files: presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PayloadJoinOptimizer.java

Change: Use direct equality for non-null join keys in the payload rejoin and retain the existing IS_NULL + COALESCE logic only for nullable keys
Details:
  • Replace separate builders for null and COALESCE comparisons with a unified joinPredicateBuilder
  • For join keys marked non-null, build simple equality predicates between original and rejoin variables
  • For other join keys, keep generating IS_NULL projection variables and both null and COALESCE equality comparisons
  • Construct joinCriteria directly from the accumulated joinPredicateBuilder predicates
Files: presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PayloadJoinOptimizer.java

Change: Implement a helper to detect NOT(IS_NULL(var)) patterns in FilterNode predicates in scan-filter-project trees
Details:
  • Add extractNonNullVariablesFromScanFilterProject helper that walks the FilterNode and ProjectNode chain
  • Within filters, scan conjuncts for NOT(IS_NULL()) CallExpression/SpecialFormExpression patterns
  • Collect only those variables that are also join keys into the non-null set
Files: presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PayloadJoinOptimizer.java

Change: Add distributed query tests validating that payload join optimization skips null checks for non-null keys while preserving correctness
Details:
  • Introduce testPayloadJoinSkipsNullChecksForNonNullKeys in AbstractTestDistributedQueries
  • Set up sessions with payload join optimization enabled and disabled, with redundant cast removal turned off in both
  • Run representative queries with WHERE key IS NOT NULL (both keys, single key, and combined with other predicates) and compare EXPLAIN plans and query results between sessions
Files: presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestDistributedQueries.java

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The extractNonNullVariablesRecursive logic only detects a very specific NOT(IS_NULL(var)) shape; consider handling equivalent patterns (e.g., additional casts/aliases, redundant boolean wrappers) or at least documenting that only this exact form is recognized so future changes don’t break the optimization silently.
  • The new test asserts optimization by comparing entire EXPLAIN output strings, which can be brittle as other planner rules evolve; it may be more robust to assert for the presence/absence of specific join predicates or patterns instead of full-plan string inequality.

@@ -1470,6 +1470,43 @@ public void testPayloadJoinCorrectness()
}

suggestion (testing): Strengthen the plan assertion to verify that null-handling logic is actually removed, not just that the plan changed

The current test only verifies that the optimized and non-optimized EXPLAIN plans differ, which doesn’t prove that the null-handling (IS NULL/COALESCE) was actually removed. To better validate the intended behavior, assert on the explain text: e.g., confirm the non-optimized plan includes IS NULL (or COALESCE) in the join predicate while the optimized plan does not. Simple string checks like assertTrue(explainNoOpt.contains("IS NULL")); and assertFalse(explainWithOpt.contains("IS NULL")); would more directly ensure we’re testing null-check removal rather than an unrelated plan change.

Suggested implementation:

        assertNotEquals(explainWithOpt, explainNoOpt);

        // Strengthen the assertion to verify that null-handling logic is actually removed
        assertTrue(
                explainNoOpt.contains("IS NULL") || explainNoOpt.contains("COALESCE"),
                "Expected non-optimized plan to contain null-handling in the join predicate");
        assertFalse(
                explainWithOpt.contains("IS NULL") || explainWithOpt.contains("COALESCE"),
                "Expected optimized plan to have null-handling removed from the join predicate");

I assumed the existing test already computes two EXPLAIN plans as String explainNoOpt and String explainWithOpt, and already had a line assertNotEquals(explainWithOpt, explainNoOpt);. If the variable names differ, or if the inequality assertion is written differently, adjust the SEARCH section accordingly to match the existing code.

If the EXPLAIN text does not literally include "IS NULL" or "COALESCE" in your specific query, update those substrings to whatever expression is actually used for null-handling in the join predicate so the strengthened assertions match the real plan text.

@kaikalur kaikalur changed the title Skip null checks in PayloadJoinOptimizer for non-null join keys feat(optimizer): Skip null checks in PayloadJoinOptimizer for non-null join keys Mar 22, 2026
@kaikalur kaikalur force-pushed the payload-join-skip-null-checks branch from ca99a0f to 9a5fbb3 Compare March 22, 2026 17:17
@kaikalur kaikalur changed the title feat(optimizer): Skip null checks in PayloadJoinOptimizer for non-null join keys feat(optimizer): Enhance PayloadJoinOptimizer with null-check skipping, chain flattening, and LOJ reordering Mar 22, 2026
@kaikalur kaikalur force-pushed the payload-join-skip-null-checks branch 3 times, most recently from cc5071f to bc767b0 Compare March 22, 2026 21:00
@kaikalur
Contributor Author

@feilong-liu Could you take a look at this PR when you get a chance? It enhances the PayloadJoinOptimizer with null-check skipping for non-null join keys, chain flattening through intervening projections/cross joins, and LOJ reordering. Thanks!

@kaikalur kaikalur force-pushed the payload-join-skip-null-checks branch 9 times, most recently from 785e1d2 to 95d6ff6 Compare March 27, 2026 21:16
@kaikalur kaikalur force-pushed the payload-join-skip-null-checks branch from 95d6ff6 to 347142e Compare March 28, 2026 03:21
@kaikalur
Contributor Author

@feilong-liu All CI checks are green — could you review and merge this when you get a chance? Thanks!

@feilong-liu feilong-liu merged commit ccae825 into prestodb:master Mar 30, 2026
114 of 116 checks passed
bibith4 pushed a commit to bibith4/presto that referenced this pull request Apr 1, 2026
…g, chain flattening, and LOJ reordering (prestodb#27404)