@harris233
Contributor

What changes were proposed in this pull request?

This PR tries to resolve the aggregate expressions of the HAVING clause in the resolveExprsWithAggregate function.

Why are the changes needed?

We found that combining CUBE with a HAVING clause that contains an aggregate condition can return incorrect results.
See the demo below:
table:
CREATE TABLE table1(product string, amount bigint, region string) using csv
INSERT INTO table1 VALUES('a', 100, 'east')
INSERT INTO table1 VALUES('b', 200, 'east')
INSERT INTO table1 VALUES('a', 150, 'west')
INSERT INTO table1 VALUES('b', 250, 'west')
INSERT INTO table1 VALUES('a', 120, 'east')

sql:
select product, region, sum(amount) as s from table1 group by product, region with cube having count(product) > 2 order by s desc

result:

  1. spark4.0:
    [a,null,370]

  2. spark3.1/Trino/StarRocks:
    [null,null,820]
    [null,east,420]
    [a,null,370]

plan:

  1. spark4.0: [image: query plan]
  2. spark3.1: [image: query plan]

Discrepancy in Results Between Spark 4.0 and Lower Versions (Spark 3.1) or Other Compute Engines (Trino/StarRocks)

The results differ because the execution plans differ in how the count aggregate from the HAVING expression resolves its argument. When count references a grouping key, it can be bound to the grouping column that CUBE nulls out in the subtotal rows rather than to the original input column, so those rows are counted as 0 and dropped by the filter.
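To make the discrepancy concrete, here is a small plain-Python sketch (no Spark involved) that evaluates the demo query by hand. The function name `cube_having` and the `count_original_column` flag are made up for illustration; the flag switches between counting the original `product` input column and counting the per-grouping copy that CUBE sets to null. It is only a model of the semantics under that assumption, not the analyzer's actual code path.

```python
from itertools import product as cartesian

# Rows of table1 from the demo: (product, amount, region)
ROWS = [
    ("a", 100, "east"),
    ("b", 200, "east"),
    ("a", 150, "west"),
    ("b", 250, "west"),
    ("a", 120, "east"),
]


def cube_having(count_original_column):
    """Evaluate: GROUP BY product, region WITH CUBE HAVING count(product) > 2.

    count_original_column=True  counts the input column (never nulled by CUBE).
    count_original_column=False counts the grouping column, which is null in
    every grouping set that aggregates product away.
    """
    groups = {}
    # CUBE over two columns yields four grouping sets.
    for keep_product, keep_region in cartesian([True, False], repeat=2):
        for prod, amount, region in ROWS:
            key_product = prod if keep_product else None
            key_region = region if keep_region else None
            # The keep flags play the role of the grouping id in the key.
            key = (key_product, key_region, keep_product, keep_region)
            total, cnt = groups.get(key, (0, 0))
            counted = prod if count_original_column else key_product
            # count(x) ignores nulls.
            cnt += 1 if counted is not None else 0
            groups[key] = (total + amount, cnt)
    rows = [(k[0], k[1], total) for k, (total, cnt) in groups.items() if cnt > 2]
    # ORDER BY s DESC
    return sorted(rows, key=lambda r: r[2], reverse=True)


print(cube_having(True))   # [(None, None, 820), (None, 'east', 420), ('a', None, 370)]
print(cube_having(False))  # [('a', None, 370)]
```

Counting the original column reproduces the spark3.1/Trino/StarRocks rows, while counting the nulled grouping column reproduces the single row returned by spark4.0.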

Does this PR introduce any user-facing change?

No

How was this patch tested?

New UT added.

Was this patch authored or co-authored using generative AI tooling?

No

@peter-toth
Contributor

@cloud-fan, this indeed seems like a correctness issue and a regression introduced in 3.2.

I opened an alternative fix to avoid calling executeSameContext: #51820

peter-toth added a commit that referenced this pull request Aug 5, 2025
### What changes were proposed in this pull request?

This is an alternative PR to #51810 to fix a regression introduced in Spark 3.2 with #32470.
This PR defers the resolution of not fully resolved `UnresolvedHaving` nodes from `ResolveGroupingAnalytics`:
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics ===
 'Sort ['s DESC NULLS LAST], true                                                                                               'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving ('count('product) > 2)                                                                                    +- 'UnresolvedHaving ('count(tempresolvedcolumn(product#261, product, false)) > 2)
!   +- 'Aggregate [cube(Vector(0), Vector(1), product#261, region#262)], [product#261, region#262, sum(amount#263) AS s#264L]      +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]
!      +- SubqueryAlias t                                                                                                             +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!         +- LocalRelation [product#261, region#262, amount#263]                                                                         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!                                                                                                                                           +- SubqueryAlias t
!                                                                                                                                              +- LocalRelation [product#261, region#262, amount#263]
```
to `ResolveAggregateFunctions` to add the correct aggregate expressions (`count(product#261)`):
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions ===
 'Sort ['s DESC NULLS LAST], true                                                                                                                                                                                                                                                                                                                             'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving (count(tempresolvedcolumn(product#261, product, false)) > cast(2 as bigint))                                                                                                                                                                                                                                                            +- Project [product#269, region#270, s#264L]
!   +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]                                                                                                                                                                                                                                         +- Filter (count(product)#272L > cast(2 as bigint))
!      +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]         +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L, count(product#261) AS count(product)#272L]
!         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]                                                                                                                                                                                                                                                       +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!            +- SubqueryAlias t                                                                                                                                                                                                                                                                                                                                           +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!               +- LocalRelation [product#261, region#262, amount#263]                                                                                                                                                                                                                                                                                                       +- SubqueryAlias t
!                                                                                                                                                                                                                                                                                                                                                                               +- LocalRelation [product#261, region#262, amount#263]
```
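For readers unfamiliar with the `Expand` node in the plans above, the following plain-Python sketch mimics its four projections for one input row. The helper name `expand` is hypothetical, and the bit layout of the grouping id is inferred from the projections shown in the plan (product is the high bit, region the low bit). It only illustrates that each output row carries both the original columns (`product#261`, `region#262`) and the nullable grouping copies (`product#269`, `region#270`), which is why `count(product#261)` and a count over the grouping copy can differ.

```python
def expand(row):
    """Mimic the Expand node for CUBE(product, region) on one (product, region, amount) row.

    Each output tuple is:
    (product, region, amount, grouping_product, grouping_region, spark_grouping_id)
    """
    product, region, amount = row
    out = []
    for gid in range(4):
        null_product = bool(gid & 2)  # high bit set: product is aggregated away
        null_region = bool(gid & 1)   # low bit set: region is aggregated away
        out.append((
            product, region, amount,              # original columns are kept untouched
            None if null_product else product,    # nullable grouping copy of product
            None if null_region else region,      # nullable grouping copy of region
            gid,
        ))
    return out


for projected in expand(("a", "east", 100)):
    print(projected)
```

In this model, an aggregate over the first element counts every row in all four grouping sets, whereas an aggregate over the fourth element sees nulls whenever the grouping id has the product bit set.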

### Why are the changes needed?

Fix a correctness issue described in #51810.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a correctness issue.

### How was this patch tested?

Added new UT from #51810.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51820 from peter-toth/SPARK-53094-fix-cube-having.

Lead-authored-by: Peter Toth <[email protected]>
Co-authored-by: harris233 <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit to peter-toth/spark that referenced this pull request Aug 5, 2025
…uses

Closes apache#51820 from peter-toth/SPARK-53094-fix-cube-having.

Lead-authored-by: Peter Toth <[email protected]>
Co-authored-by: harris233 <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit that referenced this pull request Aug 6, 2025
…uses

Closes #51854 from peter-toth/SPARK-53094-fix-cube-having-4.0.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit that referenced this pull request Aug 6, 2025
…uses

### What changes were proposed in this pull request?

This is an alternative PR to #51810 to fix a regresion introduced in Spark 3.2 with #32470.
This PR defers the resolution of not fully resolved `UnresolvedHaving` nodes from `ResolveGroupingAnalytics`:
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics ===
 'Sort ['s DESC NULLS LAST], true                                                                                               'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving ('count('product) > 2)                                                                                    +- 'UnresolvedHaving ('count(tempresolvedcolumn(product#261, product, false)) > 2)
!   +- 'Aggregate [cube(Vector(0), Vector(1), product#261, region#262)], [product#261, region#262, sum(amount#263) AS s#264L]      +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]
!      +- SubqueryAlias t                                                                                                             +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!         +- LocalRelation [product#261, region#262, amount#263]                                                                         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!                                                                                                                                           +- SubqueryAlias t
!                                                                                                                                              +- LocalRelation [product#261, region#262, amount#263]
```
to `ResolveAggregateFunctions` to add the correct aggregate expressions (`count(product#261)`):
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions ===
 'Sort ['s DESC NULLS LAST], true                                                                                                                                                                                                                                                                                                                             'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving (count(tempresolvedcolumn(product#261, product, false)) > cast(2 as bigint))                                                                                                                                                                                                                                                            +- Project [product#269, region#270, s#264L]
!   +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]                                                                                                                                                                                                                                         +- Filter (count(product)#272L > cast(2 as bigint))
!      +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]         +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L, count(product#261) AS count(product)#272L]
!         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]                                                                                                                                                                                                                                                       +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!            +- SubqueryAlias t                                                                                                                                                                                                                                                                                                                                           +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!               +- LocalRelation [product#261, region#262, amount#263]                                                                                                                                                                                                                                                                                                       +- SubqueryAlias t
!                                                                                                                                                                                                                                                                                                                                                                               +- LocalRelation [product#261, region#262, amount#263]
```

### Why are the changes needed?

Fixes a correctness issue described in #51810.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a correctness issue.

### How was this patch tested?

Added new UT from #51810.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51855 from peter-toth/SPARK-53094-fix-cube-having-3.5.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
@peter-toth
Contributor

Thanks @harris233 for reporting the issue. I merged the alternative fix to 3.5.7, 4.0.1 and 4.1.0.

@dongjoon-hyun
Member

> Thanks @harris233 for reporting the issue. I merged the alternative fix to 3.5.7, 4.0.1 and 4.1.0.

Given the above status, I close this PR.

zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…uses

### What changes were proposed in this pull request?

This is an alternative PR to apache#51810 to fix a regression introduced in Spark 3.2 with apache#32470.
This PR defers the resolution of not fully resolved `UnresolvedHaving` nodes from `ResolveGroupingAnalytics`:
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics ===
 'Sort ['s DESC NULLS LAST], true                                                                                               'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving ('count('product) > 2)                                                                                    +- 'UnresolvedHaving ('count(tempresolvedcolumn(product#261, product, false)) > 2)
!   +- 'Aggregate [cube(Vector(0), Vector(1), product#261, region#262)], [product#261, region#262, sum(amount#263) AS s#264L]      +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]
!      +- SubqueryAlias t                                                                                                             +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!         +- LocalRelation [product#261, region#262, amount#263]                                                                         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!                                                                                                                                           +- SubqueryAlias t
!                                                                                                                                              +- LocalRelation [product#261, region#262, amount#263]
```
to `ResolveAggregateFunctions` to add the correct aggregate expressions (`count(product#261)`):
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions ===
 'Sort ['s DESC NULLS LAST], true                                                                                                                                                                                                                                                                                                                             'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving (count(tempresolvedcolumn(product#261, product, false)) > cast(2 as bigint))                                                                                                                                                                                                                                                            +- Project [product#269, region#270, s#264L]
!   +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]                                                                                                                                                                                                                                         +- Filter (count(product)#272L > cast(2 as bigint))
!      +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]         +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L, count(product#261) AS count(product)#272L]
!         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]                                                                                                                                                                                                                                                       +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!            +- SubqueryAlias t                                                                                                                                                                                                                                                                                                                                           +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!               +- LocalRelation [product#261, region#262, amount#263]                                                                                                                                                                                                                                                                                                       +- SubqueryAlias t
!                                                                                                                                                                                                                                                                                                                                                                               +- LocalRelation [product#261, region#262, amount#263]
```
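
To illustrate why the column that `count` references matters, here is a minimal plain-Python sketch (not Spark code) that emulates the `Expand` + `Aggregate` pair on the five rows from the report in apache#51810. The helper names `expand` and `cube_having` and the flag `count_original_column` are hypothetical. The faulty branch assumes, as the report suggests, that `count` was bound to the nulled grouping copy (`product#269`) rather than to the input column (`product#261`).

```python
from itertools import product as cartesian

# Rows from the report in apache#51810: (product, amount, region).
ROWS = [
    ("a", 100, "east"),
    ("b", 200, "east"),
    ("a", 150, "west"),
    ("b", 250, "west"),
    ("a", 120, "east"),
]


def expand(rows):
    """Emulate Expand for CUBE(product, region): one copy of each row per grouping set.

    The original product value is kept next to the grouping copies, which are
    set to None when the corresponding column is rolled up.
    """
    for prod, amount, region in rows:
        for keep_product, keep_region in cartesian((True, False), repeat=2):
            # Grouping id as in the plan: region rolled up adds 1, product adds 2.
            gid = (0 if keep_region else 1) + (0 if keep_product else 2)
            yield (
                prod,                             # input column (product#261)
                amount,
                prod if keep_product else None,   # grouping copy (product#269)
                region if keep_region else None,  # grouping copy (region#270)
                gid,
            )


def cube_having(rows, count_original_column):
    """Return (product, region, sum) rows with count(product) > 2, ordered by sum desc.

    count_original_column=True counts the input column (assumed fixed behaviour);
    False counts the nulled grouping copy (assumed faulty behaviour).
    """
    groups = {}
    for orig_prod, amount, g_prod, g_region, gid in expand(rows):
        key = (g_prod, g_region, gid)
        total, count = groups.get(key, (0, 0))
        counted = orig_prod if count_original_column else g_prod
        # COUNT(col) ignores NULL values.
        groups[key] = (total + amount, count + (0 if counted is None else 1))
    kept = [(p, r, s) for (p, r, _), (s, c) in groups.items() if c > 2]
    return sorted(kept, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    print(cube_having(ROWS, count_original_column=True))
    print(cube_having(ROWS, count_original_column=False))
```

Under these assumptions, `cube_having(ROWS, True)` returns the three rows reported for Spark 3.1, Trino and StarRocks, while `cube_having(ROWS, False)` returns only `('a', None, 370)`, which matches the Spark 4.0 result in the report.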

### Why are the changes needed?

Fixes a correctness issue described in apache#51810.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a correctness issue.

### How was this patch tested?

Added new UT from apache#51810.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#51854 from peter-toth/SPARK-53094-fix-cube-having-4.0.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>