fix: Guard JoinPrefilter against non-deterministic expressions (#27312)

Merged: singcha merged 1 commit into prestodb:master from adheer-araokar:export-D95575024 on Mar 19, 2026

adheer-araokar (Contributor) commented Mar 11, 2026

Summary:

The JoinPrefilter optimizer clones the left side of a join to build a bloom filter
for pre-filtering the right side. When the left side contains non-deterministic
expressions (e.g., rand() < 0.1 from TABLESAMPLE BERNOULLI), the two clones
produce different random samples, causing the join to require rows to be in BOTH
samples — effectively squaring the sampling rate (10% becomes 1%).
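The squaring effect can be checked with a small simulation. This is a minimal sketch, not Presto code: the class and method names are hypothetical, it assumes unique join keys, and it ignores bloom filter false positives. Each clone draws its own random number per row, and a row reaches the join output only if it passes both draws.

```java
import java.util.Random;

public class SamplingRateSketch {
    // Probability that a row survives two independent Bernoulli samples of the same rate.
    static double effectiveRate(double sampleRate) {
        return sampleRate * sampleRate;
    }

    // Simulates the cloned plan: the clone feeding the join and the clone feeding
    // the bloom filter each evaluate rand() independently for every row.
    static long rowsSurvivingBothClones(long rows, double sampleRate, long seed) {
        Random random = new Random(seed);
        long survivors = 0;
        for (long i = 0; i < rows; i++) {
            boolean inJoinInput = random.nextDouble() < sampleRate;
            boolean inBloomFilter = random.nextDouble() < sampleRate;
            if (inJoinInput && inBloomFilter) {
                survivors++;
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        long rows = 1_000_000;
        long survivors = rowsSurvivingBothClones(rows, 0.1, 42L);
        System.out.printf("expected rate: %.4f, observed rate: %.4f%n",
                effectiveRate(0.1), (double) survivors / rows);
    }
}
```

With a 10% sample the observed rate should land close to 1%, which matches the behavior described above.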

This diff adds a determinism guard to the JoinPrefilter optimizer:

  1. PlannerUtils.java: Adds isDeterministicScanFilterProject() which recursively
    checks that all filter predicates and project assignments in a scan-filter-project
    subtree are deterministic, using RowExpressionDeterminismEvaluator.

  2. JoinPrefilter.java: Adds the determinism check to the visitJoin() condition,
    so the optimizer only clones the left subtree when it is safe to do so.

  3. AbstractTestQueries.java: Adds testJoinPrefilterSkippedForNonDeterministicExpressions
    which verifies that:

    • With TABLESAMPLE BERNOULLI (non-deterministic), JoinPrefilter does NOT produce a SemiJoin
    • With deterministic joins, JoinPrefilter still produces a SemiJoin as expected
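The recursive check in item 1 can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the code in PlannerUtils.java: the node classes, the string-based expressions, and the NO_RAND predicate are hypothetical stand-ins for Presto's plan nodes, RowExpression, and RowExpressionDeterminismEvaluator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class DeterminismGuardSketch {
    // Hypothetical, simplified plan nodes; the real Presto classes carry far more state.
    interface Node {}

    static final class Scan implements Node {}

    static final class Filter implements Node {
        final Node source;
        final String predicate;

        Filter(Node source, String predicate) {
            this.source = source;
            this.predicate = predicate;
        }
    }

    static final class Project implements Node {
        final Node source;
        final List<String> assignments;

        Project(Node source, List<String> assignments) {
            this.source = source;
            this.assignments = assignments;
        }
    }

    // A node shape outside scan/filter/project, used to show the rejection path.
    static final class Aggregate implements Node {
        final Node source;

        Aggregate(Node source) {
            this.source = source;
        }
    }

    // Walks the subtree: scans are accepted, filters and projects must contain only
    // deterministic expressions, and any other node shape is rejected.
    static boolean isDeterministicSubtree(Node node, Predicate<String> isDeterministic) {
        if (node instanceof Scan) {
            return true;
        }
        if (node instanceof Filter) {
            Filter filter = (Filter) node;
            return isDeterministic.test(filter.predicate)
                    && isDeterministicSubtree(filter.source, isDeterministic);
        }
        if (node instanceof Project) {
            Project project = (Project) node;
            return project.assignments.stream().allMatch(isDeterministic)
                    && isDeterministicSubtree(project.source, isDeterministic);
        }
        return false;
    }

    // Stand-in evaluator (an assumption): flags any expression that calls rand().
    static final Predicate<String> NO_RAND = expression -> !expression.contains("rand(");

    public static void main(String[] args) {
        Node sampled = new Filter(new Scan(), "rand() < 0.1");
        Node plain = new Project(new Filter(new Scan(), "totalprice > 100"), Arrays.asList("orderkey"));
        System.out.println(isDeterministicSubtree(sampled, NO_RAND)); // false
        System.out.println(isDeterministicSubtree(plain, NO_RAND));   // true
    }
}
```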

Reviewed By: kaikalur

Differential Revision: D95575024

sourcery-ai bot (Contributor) commented Mar 11, 2026

Reviewer's Guide

Adds a determinism-aware guard to the JoinPrefilter optimizer so it only clones scan/filter/project left subtrees when all predicates and projections are deterministic, and adds a regression test to verify behavior with TABLESAMPLE BERNOULLI vs deterministic joins.

Sequence diagram for applying JoinPrefilter with determinism guard

sequenceDiagram
    participant Optimizer as JoinPrefilter
    participant LeftPlan as PlanNode_left
    participant PlannerUtils
    participant Determinism as RowExpressionDeterminismEvaluator

    Optimizer->>LeftPlan: rewriteWith(this)
    activate LeftPlan
    LeftPlan-->>Optimizer: rewrittenLeft
    deactivate LeftPlan

    Optimizer->>PlannerUtils: isScanFilterProject(rewrittenLeft)
    PlannerUtils-->>Optimizer: boolean isScanFilterProject

    Optimizer->>PlannerUtils: isDeterministicScanFilterProject(rewrittenLeft, functionAndTypeManager)
    activate PlannerUtils
    PlannerUtils->>Determinism: RowExpressionDeterminismEvaluator(functionAndTypeManager)
    activate Determinism
    Determinism-->>PlannerUtils: instance

    loop scan_filter_project subtree
        PlannerUtils->>Determinism: isDeterministic(predicate_or_projection_expression)
        Determinism-->>PlannerUtils: boolean isDeterministic
    end

    PlannerUtils-->>Optimizer: boolean isDeterministicScanFilterProject
    deactivate Determinism
    deactivate PlannerUtils

    alt LEFT or INNER join and scan_filter_project and deterministic and has criteria
        Optimizer->>Optimizer: build bloom filter and SemiJoin
    else otherwise
        Optimizer->>Optimizer: skip JoinPrefilter transformation
    end
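The alt branch of the diagram combines four conditions. A minimal sketch of that combined check follows, assuming a hypothetical JoinFacts holder in place of the real JoinNode and PlannerUtils calls; it only illustrates the boolean structure, not the actual visitJoin() body.

```java
public class JoinPrefilterConditionSketch {
    enum JoinType { INNER, LEFT, RIGHT, FULL }

    // Hypothetical summary of the facts the optimizer checks about a join.
    static final class JoinFacts {
        final JoinType type;
        final boolean leftIsScanFilterProject;
        final boolean leftIsDeterministic;
        final int equiJoinClauseCount;

        JoinFacts(JoinType type, boolean leftIsScanFilterProject, boolean leftIsDeterministic, int equiJoinClauseCount) {
            this.type = type;
            this.leftIsScanFilterProject = leftIsScanFilterProject;
            this.leftIsDeterministic = leftIsDeterministic;
            this.equiJoinClauseCount = equiJoinClauseCount;
        }
    }

    // Mirrors the diagram: LEFT or INNER join, scan-filter-project left side,
    // deterministic left side, and at least one join criterion.
    static boolean shouldPrefilter(JoinFacts join) {
        boolean supportedType = join.type == JoinType.INNER || join.type == JoinType.LEFT;
        return supportedType
                && join.leftIsScanFilterProject
                && join.leftIsDeterministic
                && join.equiJoinClauseCount > 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldPrefilter(new JoinFacts(JoinType.INNER, true, true, 1)));  // true
        System.out.println(shouldPrefilter(new JoinFacts(JoinType.INNER, true, false, 1))); // false
        System.out.println(shouldPrefilter(new JoinFacts(JoinType.FULL, true, true, 1)));   // false
        System.out.println(shouldPrefilter(new JoinFacts(JoinType.LEFT, true, true, 0)));   // false
    }
}
```

Nothing about the right side appears in the sketch, which matches the diagram: only rewrittenLeft is passed to the determinism check.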

Class diagram for determinism guard in JoinPrefilter optimizer

classDiagram

class PlanNode
class TableScanNode
class FilterNode
class ProjectNode
class JoinNode

PlanNode <|-- TableScanNode
PlanNode <|-- FilterNode
PlanNode <|-- ProjectNode
PlanNode <|-- JoinNode

class PlannerUtils {
  +static boolean isScanFilterProject(PlanNode node)
  +static boolean isDeterministicScanFilterProject(PlanNode node, FunctionAndTypeManager functionAndTypeManager)
  -static boolean isDeterministicPlanSubtree(PlanNode node, DeterminismEvaluator determinismEvaluator)
}

class JoinPrefilter {
  -FunctionAndTypeManager functionAndTypeManager
  +PlanNode visitJoin(JoinNode node, RewriteContext context)
}

class DeterminismEvaluator {
  +boolean isDeterministic(RowExpression expression)
}

class RowExpressionDeterminismEvaluator {
  +RowExpressionDeterminismEvaluator(FunctionAndTypeManager functionAndTypeManager)
  +boolean isDeterministic(RowExpression expression)
}

class FunctionAndTypeManager
class RowExpression
class RewriteContext
class EquiJoinClause {
  +VariableReferenceExpression getLeft()
  +VariableReferenceExpression getRight()
}
class VariableReferenceExpression {
  +Type getType()
}
class Type

RowExpressionDeterminismEvaluator ..|> DeterminismEvaluator
PlannerUtils ..> DeterminismEvaluator
PlannerUtils ..> RowExpressionDeterminismEvaluator
JoinPrefilter ..> PlannerUtils
JoinPrefilter ..> FunctionAndTypeManager
JoinPrefilter ..> JoinNode
JoinNode ..> EquiJoinClause
EquiJoinClause ..> VariableReferenceExpression
VariableReferenceExpression ..> Type

FilterNode ..> RowExpression
ProjectNode ..> RowExpression

File-Level Changes

Change: Introduce a determinism check utility for scan-filter-project subtrees and use it in JoinPrefilter before cloning the left side of a join.
Details:
  • Add isDeterministicScanFilterProject(PlanNode, FunctionAndTypeManager) to compute determinism for scan/filter/project trees using RowExpressionDeterminismEvaluator
  • Implement private isDeterministicPlanSubtree helper that recursively validates TableScan, Filter (predicate), and Project (assignments) nodes and rejects other shapes
  • Update JoinPrefilter.visitJoin to additionally require a deterministic scan/filter/project left subtree before applying the SemiJoin-based prefiltering optimization
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlannerUtils.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/JoinPrefilter.java

Change: Add a regression test to ensure JoinPrefilter is skipped for non-deterministic left-side expressions while still applied for deterministic joins.
Details:
  • Create testJoinPrefilterSkippedForNonDeterministicExpressions in AbstractTestQueries
  • Configure a session with JOIN_PREFILTER_BUILD_SIDE enabled
  • Assert that a BERNOULLI TABLESAMPLE join plan does not contain SemiJoin while an equivalent deterministic join plan does
Files:
  • presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high-level feedback:

  • isDeterministicScanFilterProject assumes the input is a scan-filter-project tree and returns false for any other node type; consider enforcing this precondition explicitly (e.g., via checkArgument(isScanFilterProject(node))) or handling unexpected node types more clearly to avoid silent behavior changes if the method is reused.
  • In JoinPrefilter, RowExpressionDeterminismEvaluator is effectively re-created for every call via isDeterministicScanFilterProject; consider constructing and caching a single evaluator in JoinPrefilter (or passing one in) to avoid repeated allocations on hot planning paths.
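Both suggestions can be illustrated together. The sketch below is hypothetical and self-contained: Evaluator and Optimizer are stand-ins rather than Presto classes, a plain IllegalArgumentException takes the place of the checkArgument call mentioned above, and the construction counter exists only to make the effect of caching observable.

```java
public class CachedEvaluatorSketch {
    // Counts evaluator constructions so the effect of caching can be observed.
    static int evaluatorsCreated = 0;

    // Stand-in for a determinism evaluator (an assumption): flags calls to rand().
    static final class Evaluator {
        Evaluator() {
            evaluatorsCreated++;
        }

        boolean isDeterministic(String expression) {
            return !expression.contains("rand(");
        }
    }

    static final class Optimizer {
        // Built once per optimizer instance and reused for every join visited.
        private final Evaluator evaluator = new Evaluator();

        boolean isSafeToClone(boolean isScanFilterProject, String predicate) {
            // Explicit precondition instead of silently returning false for other shapes.
            if (!isScanFilterProject) {
                throw new IllegalArgumentException("expected a scan-filter-project subtree");
            }
            return evaluator.isDeterministic(predicate);
        }
    }

    public static void main(String[] args) {
        Optimizer optimizer = new Optimizer();
        System.out.println(optimizer.isSafeToClone(true, "totalprice > 100")); // true
        System.out.println(optimizer.isSafeToClone(true, "rand() < 0.1"));     // false
        System.out.println(evaluatorsCreated);                                 // 1
    }
}
```

Whether the allocation saving matters in practice depends on how often visitJoin() runs during planning, which this sketch does not measure.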

Individual Comments

Comment 1
Location: presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java, lines 7871-7880
Code context:
+        String testQuery = "SELECT count(*) from orders TABLESAMPLE BERNOULLI (50) join lineitem using(orderkey)";

suggestion (testing): Consider adding a complementary test where TABLESAMPLE BERNOULLI is applied on the right side to ensure JoinPrefilter still applies as expected.

Since the regression only involves non-determinism on the left side and the guard only inspects the left scan-filter-project subtree, please also add a test with TABLESAMPLE BERNOULLI on the right side (e.g., orders join lineitem TABLESAMPLE BERNOULLI(50) using(orderkey)) and assert that a SemiJoin is still present. This will document the intended asymmetry and protect against future changes that might incorrectly apply the check to both sides and disable the optimization.

adheer-araokar added a commit to adheer-araokar/presto that referenced this pull request Mar 12, 2026
adheer-araokar added a commit to adheer-araokar/presto that referenced this pull request Mar 12, 2026
adheer-araokar added a commit to adheer-araokar/presto that referenced this pull request Mar 12, 2026
adheer-araokar added a commit to adheer-araokar/presto that referenced this pull request Mar 12, 2026
adheer-araokar added a commit to adheer-araokar/presto that referenced this pull request Mar 12, 2026
adheer-araokar added a commit to adheer-araokar/presto that referenced this pull request Mar 12, 2026
adheer-araokar added a commit to adheer-araokar/presto that referenced this pull request Mar 12, 2026
adheer-araokar added a commit to adheer-araokar/presto that referenced this pull request Mar 12, 2026
adheer-araokar changed the title from "[Relay][Presto] Guard JoinPrefilter against non-deterministic expressions" to "fix: Guard JoinPrefilter against non-deterministic expressions" on Mar 13, 2026
abhinavmuk04 self-requested a review on Mar 13, 2026, 03:39
meta-codesync bot changed the title from "fix: Guard JoinPrefilter against non-deterministic expressions" to "fix: Guard JoinPrefilter against non-deterministic expressions (#27312)" on Mar 13, 2026
adheer-araokar added a commit to adheer-araokar/presto that referenced this pull request Mar 13, 2026
adheer-araokar added a commit to adheer-araokar/presto that referenced this pull request Mar 13, 2026
singcha merged commit 180beb1 into prestodb:master on Mar 19, 2026
116 of 118 checks passed