Commit cfc0495
[SPARK-34581][SQL] Don't optimize out grouping expressions from aggregate expressions without aggregate function
### What changes were proposed in this pull request?
This PR adds a new rule `PullOutGroupingExpressions` to pull out complex grouping expressions to a `Project` node under an `Aggregate`. These expressions are then referenced in both grouping expressions and aggregate expressions without aggregate functions to ensure that optimization rules don't change the aggregate expressions to invalid ones that no longer refer to any grouping expressions.
### Why are the changes needed?
If aggregate expressions (without aggregate functions) in an `Aggregate` node are complex then the `Optimizer` can optimize out grouping expressions from them and so making aggregate expressions invalid.
Here is a simple example:
```
SELECT not(t.id IS NULL) , count(*)
FROM t
GROUP BY t.id IS NULL
```
In this case the `BooleanSimplification` rule does this:
```
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.BooleanSimplification ===
!Aggregate [isnull(id#222)], [NOT isnull(id#222) AS (NOT (id IS NULL))#226, count(1) AS c#224L] Aggregate [isnull(id#222)], [isnotnull(id#222) AS (NOT (id IS NULL))#226, count(1) AS c#224L]
+- Project [value#219 AS id#222] +- Project [value#219 AS id#222]
+- LocalRelation [value#219] +- LocalRelation [value#219]
```
where `NOT isnull(id#222)` is optimized to `isnotnull(id#222)` and so it no longer refers to any grouping expression.
Before this PR:
```
== Optimized Logical Plan ==
Aggregate [isnull(id#222)], [isnotnull(id#222) AS (NOT (id IS NULL))#234, count(1) AS c#232L]
+- Project [value#219 AS id#222]
+- LocalRelation [value#219]
```
and running the query throws an error:
```
Couldn't find id#222 in [isnull(id#222)#230,count(1)#226L]
java.lang.IllegalStateException: Couldn't find id#222 in [isnull(id#222)#230,count(1)#226L]
```
After this PR:
```
== Optimized Logical Plan ==
Aggregate [_groupingexpression#233], [NOT _groupingexpression#233 AS (NOT (id IS NULL))#230, count(1) AS c#228L]
+- Project [isnull(value#219) AS _groupingexpression#233]
+- LocalRelation [value#219]
```
and the query works.
### Does this PR introduce _any_ user-facing change?
Yes, the query works.
### How was this patch tested?
Added new UT.
Closes #32396 from peter-toth/SPARK-34581-keep-grouping-expressions-2.
Authored-by: Peter Toth <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>1 parent 6ce1b16 commit cfc0495
File tree
24 files changed
+239
-139
lines changed- sql
- catalyst/src
- main/scala/org/apache/spark/sql/catalyst
- expressions/aggregate
- optimizer
- planning
- test/scala/org/apache/spark/sql/catalyst/optimizer
- core/src/test/resources
- sql-tests
- inputs
- results
- tpcds-plan-stability/approved-plans-v1_4
- q23a.sf100
- q23a
- q23b.sf100
- q23b
- q62.sf100
- q62
- q99.sf100
- q99
24 files changed
+239
-139
lines changedLines changed: 8 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
80 | 80 | | |
81 | 81 | | |
82 | 82 | | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
83 | 91 | | |
84 | 92 | | |
85 | 93 | | |
| |||
Lines changed: 1 addition & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
| 21 | + | |
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
29 | | - | |
30 | | - | |
31 | | - | |
32 | | - | |
33 | | - | |
34 | | - | |
35 | | - | |
36 | | - | |
37 | | - | |
38 | 29 | | |
39 | 30 | | |
40 | 31 | | |
| |||
Lines changed: 6 additions & 8 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
148 | 148 | | |
149 | 149 | | |
150 | 150 | | |
| 151 | + | |
151 | 152 | | |
152 | 153 | | |
153 | 154 | | |
| |||
267 | 268 | | |
268 | 269 | | |
269 | 270 | | |
270 | | - | |
| 271 | + | |
| 272 | + | |
271 | 273 | | |
272 | 274 | | |
273 | 275 | | |
| |||
524 | 526 | | |
525 | 527 | | |
526 | 528 | | |
527 | | - | |
| 529 | + | |
| 530 | + | |
528 | 531 | | |
529 | 532 | | |
530 | 533 | | |
531 | 534 | | |
532 | 535 | | |
533 | | - | |
| 536 | + | |
534 | 537 | | |
535 | 538 | | |
536 | 539 | | |
537 | 540 | | |
538 | 541 | | |
539 | | - | |
540 | | - | |
541 | | - | |
542 | | - | |
543 | | - | |
544 | 542 | | |
545 | 543 | | |
546 | 544 | | |
| |||
Lines changed: 79 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
Lines changed: 3 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
297 | 297 | | |
298 | 298 | | |
299 | 299 | | |
300 | | - | |
301 | | - | |
302 | | - | |
303 | | - | |
304 | | - | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
305 | 303 | | |
306 | 304 | | |
307 | 305 | | |
| |||
Lines changed: 3 additions & 9 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
36 | 36 | | |
37 | 37 | | |
38 | 38 | | |
| 39 | + | |
| 40 | + | |
39 | 41 | | |
40 | 42 | | |
41 | 43 | | |
| |||
57 | 59 | | |
58 | 60 | | |
59 | 61 | | |
60 | | - | |
| 62 | + | |
61 | 63 | | |
62 | 64 | | |
63 | 65 | | |
| |||
405 | 407 | | |
406 | 408 | | |
407 | 409 | | |
408 | | - | |
409 | | - | |
410 | | - | |
411 | | - | |
412 | | - | |
413 | | - | |
414 | | - | |
415 | | - | |
416 | 410 | | |
417 | 411 | | |
418 | 412 | | |
| |||
Lines changed: 10 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
179 | 179 | | |
180 | 180 | | |
181 | 181 | | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
Lines changed: 23 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | | - | |
| 2 | + | |
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
| |||
642 | 642 | | |
643 | 643 | | |
644 | 644 | | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
| 652 | + | |
| 653 | + | |
| 654 | + | |
| 655 | + | |
| 656 | + | |
| 657 | + | |
| 658 | + | |
| 659 | + | |
| 660 | + | |
| 661 | + | |
| 662 | + | |
| 663 | + | |
| 664 | + | |
| 665 | + | |
| 666 | + | |
Lines changed: 12 additions & 12 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
199 | 199 | | |
200 | 200 | | |
201 | 201 | | |
202 | | - | |
| 202 | + | |
203 | 203 | | |
204 | 204 | | |
205 | 205 | | |
206 | | - | |
207 | | - | |
| 206 | + | |
| 207 | + | |
208 | 208 | | |
209 | 209 | | |
210 | | - | |
| 210 | + | |
211 | 211 | | |
212 | 212 | | |
213 | | - | |
214 | | - | |
| 213 | + | |
| 214 | + | |
215 | 215 | | |
216 | 216 | | |
217 | 217 | | |
| |||
406 | 406 | | |
407 | 407 | | |
408 | 408 | | |
409 | | - | |
| 409 | + | |
410 | 410 | | |
411 | 411 | | |
412 | 412 | | |
413 | | - | |
414 | | - | |
| 413 | + | |
| 414 | + | |
415 | 415 | | |
416 | 416 | | |
417 | | - | |
| 417 | + | |
418 | 418 | | |
419 | 419 | | |
420 | | - | |
421 | | - | |
| 420 | + | |
| 421 | + | |
422 | 422 | | |
423 | 423 | | |
424 | 424 | | |
| |||
Lines changed: 4 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
37 | | - | |
38 | | - | |
| 37 | + | |
| 38 | + | |
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
| |||
177 | 177 | | |
178 | 178 | | |
179 | 179 | | |
180 | | - | |
181 | | - | |
| 180 | + | |
| 181 | + | |
182 | 182 | | |
183 | 183 | | |
184 | 184 | | |
| |||
0 commit comments