[SPARK-46468] [SQL] Handle COUNT bug for EXISTS subqueries with Aggregate without grouping keys #44451
jchen5 left a comment:
Should the title be something like "Handle COUNT bug for subqueries with an aggregate and no GROUP BY"?
It looks like your PR just reverts your earlier COUNT bug changes; was there another part that was missed?
updated
jchen5 left a comment:
Can we add a legacy behavior flag for this change, since it seems the behavior for EXISTS has been wrong for a long time?
Actually, on second thought, we already have DECORRELATE_EXISTS_IN_SUBQUERY_LEGACY_INCORRECT_COUNT_HANDLING_ENABLED, which should take care of that.
Thanks, merging to master!
What changes were proposed in this pull request?
Because an Aggregate with no grouping keys always returns exactly one row (whose values may be NULL), an EXISTS over such a subquery should always return true.
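To make the one-row claim concrete, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for standard SQL aggregate semantics here, not Spark, and the table `t` is hypothetical. It contrasts an aggregate without grouping keys (one row even on empty input) with a grouped aggregate (no rows on empty input), and shows the effect on EXISTS.

```python
import sqlite3

# SQLite stands in for standard SQL semantics here; this is not Spark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # hypothetical table, left empty

# No grouping keys: exactly one row, even though the input is empty.
ungrouped = conn.execute("SELECT count(*), max(x) FROM t").fetchall()

# With a grouping key: an empty input produces no groups, hence no rows.
grouped = conn.execute("SELECT max(x) FROM t GROUP BY x").fetchall()

# EXISTS over the ungrouped aggregate is therefore always true (1 in SQLite),
# while EXISTS over the grouped aggregate is false on empty input (0).
exists_ungrouped = conn.execute(
    "SELECT EXISTS (SELECT max(x) FROM t)"
).fetchone()[0]
exists_grouped = conn.execute(
    "SELECT EXISTS (SELECT max(x) FROM t GROUP BY x)"
).fetchone()[0]

print(ungrouped)         # [(0, None)]
print(grouped)           # []
print(exists_ungrouped)  # 1
print(exists_grouped)    # 0
```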
This reverts some of the changes made when we migrated EXISTS/IN subqueries to the DecorrelateInnerQuery framework. In particular, the static detection of potential COUNT bug aggregates is removed: an empty grouping key alone now triggers the COUNT bug treatment. (Scalar subqueries still have extra checks that evaluate the aggregate on an empty input.) I suspect the same correctness problem was present in the legacy framework, so I added one test to the legacy section of exists-count-bug.sql.
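The COUNT bug that this treatment guards against can be reproduced with a hand-written decorrelation. The sketch below again uses sqlite3 as a stand-in; the tables `t1` and `t2` are hypothetical, and the "naive" rewrite is a generic textbook illustration, not the plan Spark's DecorrelateInnerQuery actually produces. Grouping the inner aggregate by the correlation column and joining on it loses outer rows whose group would be empty. For the scalar case, replacing the NULL produced by the outer join with the aggregate's value on empty input (0 for count(*)) restores the correct answer, which is the kind of empty-input evaluation described above.

```python
import sqlite3

# SQLite stands in for standard SQL semantics; the tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (b INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (1);
""")

# Correct EXISTS semantics: the correlated aggregate has no grouping keys,
# so it yields one row for every outer row (count = 0 when a = 2).
exists_correct = conn.execute("""
    SELECT a FROM t1
    WHERE EXISTS (SELECT count(*) FROM t2 WHERE t2.b = t1.a)
    ORDER BY a
""").fetchall()

# Naive decorrelation: group by the correlation column, then semi-join.
# There is no group for b = 2, so the outer row a = 2 is dropped.
exists_naive = conn.execute("""
    SELECT a FROM t1
    WHERE EXISTS (
        SELECT 1 FROM (SELECT b, count(*) AS c FROM t2 GROUP BY b) AS g
        WHERE g.b = t1.a)
    ORDER BY a
""").fetchall()

# Scalar subquery, naive left outer join: a = 2 gets NULL instead of 0.
scalar_naive = conn.execute("""
    SELECT t1.a, g.c
    FROM t1 LEFT JOIN (SELECT b, count(*) AS c FROM t2 GROUP BY b) AS g
      ON g.b = t1.a
    ORDER BY t1.a
""").fetchall()

# Repaired by substituting count(*) evaluated on empty input, i.e. 0.
scalar_fixed = conn.execute("""
    SELECT t1.a, coalesce(g.c, 0)
    FROM t1 LEFT JOIN (SELECT b, count(*) AS c FROM t2 GROUP BY b) AS g
      ON g.b = t1.a
    ORDER BY t1.a
""").fetchall()

print(exists_correct)  # [(1,), (2,)]
print(exists_naive)    # [(1,)]
print(scalar_naive)    # [(1, 1), (2, None)]
print(scalar_fixed)    # [(1, 1), (2, 0)]
```

The coalesce shortcut is sufficient for count(*) because it never returns NULL; a general rewrite would also need to distinguish "no matching group" from "aggregate legitimately returned NULL".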
Why are the changes needed?
Does this PR introduce any user-facing change?
No
How was this patch tested?
Query tests
Was this patch authored or co-authored using generative AI tooling?
No