Skip to content

Conversation

@agubichev
Copy link
Contributor

@agubichev agubichev commented Dec 21, 2023

What changes were proposed in this pull request?

As Aggregates with no grouping keys always return 1 row (can be NULL), an EXISTs over such subquery should always return true.
This reverts some changes done when we migrated EXISTS/IN to DecorrelateInnerQuery framework, in particular the static detection of potential count bug aggregates is removed (just having an empty grouping key should trigger the count bug treatment now; scalar subqueries still have extra checks that are evaluating the aggregate on an empty input). I suspect the same correctness problem was present in the legacy framework (added one test in the legacy section of exists-count-bug.sql)

Why are the changes needed?

Does this PR introduce any user-facing change?

No

How was this patch tested?

Query tests

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Dec 21, 2023
@agubichev agubichev changed the title Handle COUNT bug for EXISTS/IN subqueries Handle COUNT bug for EXISTS subqueries Dec 21, 2023
@agubichev
Copy link
Contributor Author

@jchen5

@agubichev agubichev changed the title Handle COUNT bug for EXISTS subqueries [SPARK-46468] [SQL] Handle COUNT bug for EXISTS subqueries Dec 21, 2023
Copy link
Contributor

@jchen5 jchen5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the title be something like handle COUNT bug for subqueries with aggregate and no group-by?

It looks like your PR just has reverting your earlier count bug changes, was there another part that was missed?

@agubichev agubichev changed the title [SPARK-46468] [SQL] Handle COUNT bug for EXISTS subqueries [SPARK-46468] [SQL] Handle COUNT bug for EXISTS subqueries with Aggregate without grouping keys Dec 21, 2023
@agubichev
Copy link
Contributor Author

Should the title be something like handle COUNT bug for subqueries with aggregate and no group-by?

It looks like your PR just has reverting your earlier count bug changes, was there another part that was missed?

updated

Copy link
Contributor

@jchen5 jchen5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a legacy behavior flag for this change? Since it seems like the behavior for EXISTS has been wrong for a long time.

@jchen5
Copy link
Contributor

jchen5 commented Dec 21, 2023

Actually, on second thought, we already have DECORRELATE_EXISTS_IN_SUBQUERY_LEGACY_INCORRECT_COUNT_HANDLING_ENABLED which should take care of that

@agubichev
Copy link
Contributor Author

@cloud-fan

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 3432fd8 Dec 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants