Skip to content

Conversation

@feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Jul 1, 2025

Description

In this NULL skew optimizer, we will add a new project node under the join node, which currently does not carry the stats equivalent node from HBO, hence lose the HBO information.
Previously I think it's fine, as the the project node adds a few more columns, i.e. new join keys which is of string type. However, in recent debug, I found that this will lead to use of inaccurate CBO stats. HBO stats may have the reported size larger than it should be, which is because of the new join keys added, however the row count is still accurate and the size is still not far away since the new join key is usually small.
In this PR I keep the stats equivalent node in HBO.

Motivation and Context

As in description

Impact

More accurate stats

Test Plan

Existing unit tests

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix randomize null join optimizer to keep HBO information for join input

@feilong-liu feilong-liu requested a review from a team as a code owner July 1, 2025 04:55
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Jul 1, 2025
@feilong-liu feilong-liu marked this pull request as draft July 1, 2025 04:56
@feilong-liu feilong-liu marked this pull request as ready for review July 1, 2025 05:52

ProjectNode newLeft = new ProjectNode(rewrittenLeft.getSourceLocation(), planNodeIdAllocator.getNextId(), rewrittenLeft, leftAssignments.build(), LOCAL);
ProjectNode newRight = new ProjectNode(rewrittenRight.getSourceLocation(), planNodeIdAllocator.getNextId(), rewrittenRight, rightAssignments.build(), LOCAL);
ProjectNode newLeft = new ProjectNode(rewrittenLeft.getSourceLocation(), planNodeIdAllocator.getNextId(), joinNode.getLeft().getStatsEquivalentPlanNode(), rewrittenLeft, leftAssignments.build(), LOCAL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can just do
Project newLeft = joinNode.getLeft().replaceChilren() and that will automatically set this

@feilong-liu feilong-liu merged commit 2220ac0 into prestodb:master Jul 5, 2025
116 of 118 checks passed
@feilong-liu feilong-liu deleted the null_skew_hbo branch July 5, 2025 20:45
@prestodb-ci prestodb-ci mentioned this pull request Jul 28, 2025
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:Meta PR from Meta

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants