
Return unknown equality estimate when NDV and range are unknown #29157

Merged
raunaqmorarka merged 1 commit into trinodb:master from raunaqmorarka:raunaq/fix-bad-estimate
Apr 17, 2026


raunaqmorarka commented Apr 17, 2026

Description

On a column with unknown NDV and an unbounded range, StatisticRange.overlapPercentWith falls back to the infinite-to-infinite 0.5 heuristic, which is meant for range overlap, not point equality. This yielded an estimate of 0.5 * non-null rows for every equality predicate, so an IN list saturated at the full non-null row count and $not(IN), computed by subtraction, dropped to 0.
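To make the failure mode concrete, here is a deliberately simplified Java model of the arithmetic above. It is a sketch, not Trino code: the class, constant, and method names are hypothetical, and it assumes that an IN list is estimated as the sum of its per-literal equality estimates capped at the non-null row count, and that NOT IN is the non-null row count minus the IN estimate. Only the 0.5 factor is taken from the description.

```java
final class EqualityHeuristicSketch {
    // The 0.5 factor the description says overlapPercentWith falls back to
    // when both ranges are infinite. The constant name is hypothetical.
    static final double INFINITE_RANGE_OVERLAP_FACTOR = 0.5;

    // Pre-fix behaviour in this model: with unknown NDV and an unbounded range,
    // each "col = literal" is estimated at 0.5 * non-null rows.
    static double equalityRowsBeforeFix(double nonNullRows) {
        return INFINITE_RANGE_OVERLAP_FACTOR * nonNullRows;
    }

    // Assumed IN model: sum of per-literal equality estimates, capped at non-null rows.
    static double inListRows(double nonNullRows, int listSize) {
        double sum = 0;
        for (int i = 0; i < listSize; i++) {
            sum += equalityRowsBeforeFix(nonNullRows);
        }
        return Math.min(sum, nonNullRows);
    }

    // Assumed NOT IN model: non-null rows minus the IN estimate.
    static double notInRows(double nonNullRows, int listSize) {
        return nonNullRows - inListRows(nonNullRows, listSize);
    }

    public static void main(String[] args) {
        double nonNullRows = 1_000_000;
        System.out.println(inListRows(nonNullRows, 1)); // 500000.0
        System.out.println(inListRows(nonNullRows, 3)); // 1000000.0 (saturated)
        System.out.println(notInRows(nonNullRows, 3));  // 0.0
    }
}
```

In this model, two literals are already enough to saturate the IN estimate (2 * 0.5 * non-null rows), after which NOT IN has nothing left to subtract from.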

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Section
* Fix suboptimal join ordering that could cause excessive memory usage for queries on columns with unknown statistics. ({issue}`29157`)


chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e1008e6699


Comment thread core/trino-main/src/main/java/io/trino/cost/ComparisonStatsCalculator.java Outdated

Copilot AI left a comment


Pull request overview

This PR adjusts Trino’s filter selectivity estimation so that StatisticRange’s infinite-to-infinite overlap heuristic (0.5) is no longer applied to point-equality predicates when column statistics are too incomplete (unknown NDV and an unbounded or unknown range). This prevents pathological row-count estimates such as NOT IN collapsing to 0.

Changes:

  • Add a guard in equality-to-literal estimation to return an unknown estimate when NDV is unknown and the column range is unbounded/unknown.
  • Add a regression test ensuring NOT IN over such a column produces an unknown row-count estimate.
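The guard and its effect on NOT IN can be sketched in Java under simplifying assumptions. ColumnStats, equalityRows, inListRows, and notInRows are hypothetical names; NaN stands in for an unknown estimate; and everything outside the guard is reduced to a uniform 1/NDV selectivity. This is an illustration, not the code changed in ComparisonStatsCalculator.java.

```java
final class EqualityGuardSketch {
    // Hypothetical, simplified per-column statistics for this sketch.
    // NaN marks an unknown value; an unknown range is (-Infinity, +Infinity).
    record ColumnStats(double distinctValues, double low, double high, double nonNullRows) {}

    // Estimated rows for "col = literal". The first branch mirrors the guard the PR
    // describes: unknown NDV plus an unbounded range yields an unknown estimate
    // (NaN here) instead of the 0.5 range-overlap factor.
    static double equalityRows(ColumnStats s) {
        boolean unknownNdv = Double.isNaN(s.distinctValues());
        boolean unboundedRange = Double.isInfinite(s.low()) || Double.isInfinite(s.high());
        if (unknownNdv && unboundedRange) {
            return Double.NaN;
        }
        // Stand-in for the rest of the estimator: uniform 1/NDV selectivity.
        // Math.max returns NaN when NDV is NaN, so this path also yields NaN then.
        return s.nonNullRows() / Math.max(s.distinctValues(), 1);
    }

    // Simplified IN: sum of per-literal equality estimates, capped at non-null rows.
    // Math.min propagates NaN, so an unknown equality keeps IN unknown.
    static double inListRows(ColumnStats s, int listSize) {
        double sum = 0;
        for (int i = 0; i < listSize; i++) {
            sum += equalityRows(s);
        }
        return Math.min(sum, s.nonNullRows());
    }

    // Simplified NOT IN: non-null rows minus IN rows (NaN stays NaN).
    static double notInRows(ColumnStats s, int listSize) {
        return s.nonNullRows() - inListRows(s, listSize);
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        ColumnStats unknown = new ColumnStats(Double.NaN, -inf, inf, 1_000_000);
        ColumnStats known = new ColumnStats(1_000, -inf, inf, 1_000_000);

        System.out.println(notInRows(unknown, 3)); // NaN: the estimate stays unknown
        System.out.println(notInRows(known, 3));   // 997000.0
    }
}
```

With the guard, NOT IN over the column with unknown statistics comes out unknown rather than 0, which is the outcome the regression test is described as checking; a column with a known NDV is unaffected by the guard.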

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • core/trino-main/src/main/java/io/trino/cost/ComparisonStatsCalculator.java: Introduces an early return to avoid using infinite-range overlap heuristics for equality when NDV and range are unknown.
  • core/trino-main/src/test/java/io/trino/cost/TestFilterStatsCalculator.java: Adds a regression test covering NOT IN on a VARCHAR column with unknown NDV and an unbounded range.


Comment thread core/trino-main/src/main/java/io/trino/cost/ComparisonStatsCalculator.java Outdated
raunaqmorarka force-pushed the raunaq/fix-bad-estimate branch from e1008e6 to 90f1a9b on April 17, 2026 at 16:29

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.



raunaqmorarka merged commit 2444ab6 into trinodb:master Apr 17, 2026
102 checks passed
raunaqmorarka deleted the raunaq/fix-bad-estimate branch April 17, 2026 18:08
ebyhr mentioned this pull request Apr 20, 2026

4 participants