Commit 58e07e0
[SPARK-32940][SQL] Collect, first and last should be deterministic aggregate functions
### What changes were proposed in this pull request?
Collect, first and last have mistakenly been marked as non-deterministic. They are actually deterministic iff their child expression is deterministic.
For example, collect was marked as non-deterministic in #14749. The reasoning was that its output depends on the actual order of input rows. Although it is correct that these aggregators depend on the order of input rows, that does not make them non-deterministic.
The `EliminateSorts` optimizer rule has a method, `isOrderIrrelevantAggs`, that lists all aggregators that do not depend on their input row order. Collect, first and last are correctly not listed there.
An aggregator would be non-deterministic if its output for a group depended on previous groups it has aggregated. I can't think of any practical examples of this kind of aggregator in Spark.
An analogous aggregator is sum on the float and double data types: its result does depend on the order of its inputs, but it is deterministic. Other similar aggregates are `max_by` and `min_by`, deterministic functions that can return different results when the order of rows changes.
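The distinction between order-dependent and non-deterministic can be sketched outside Spark. The following is a minimal plain-Python model, not Spark code: `first`, `float_sum` and `LeakySum` are hypothetical stand-ins written for illustration, and `LeakySum` models the kind of aggregator described above whose output depends on previously aggregated groups.

```python
def first(rows):
    """Return the first value of a group; the result depends on row order."""
    return rows[0]


def float_sum(rows):
    """Sum doubles left to right; rounding makes the result order-dependent."""
    total = 0.0
    for value in rows:
        total += value
    return total


class LeakySum:
    """Hypothetical aggregator whose state leaks from one group to the next."""

    def __init__(self):
        self.carry = 0

    def __call__(self, rows):
        self.carry += sum(rows)
        return self.carry


rows = [1e16, 1.0, -1e16, 1.0]
reordered = [1e16, -1e16, 1.0, 1.0]

# Deterministic: the same rows in the same order always give the same result.
assert first(rows) == first(rows)
assert float_sum(rows) == float_sum(rows)

# Order-dependent: the same rows in a different order give a different result.
assert first(rows) != first(rows[::-1])
assert float_sum(rows) != float_sum(reordered)

# Non-deterministic in the sense above: identical groups, different outputs,
# because the second call still carries state from the first group.
leaky = LeakySum()
assert leaky([1, 2]) != leaky([1, 2])
```

In this model, `first` and `float_sum` are pure functions of their ordered input, which is the property the commit message argues for; only `LeakySum` breaks it.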
### Why are the changes needed?
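The optimizer rule `PushPredicateThroughNonJoin` can now apply in more cases, because it only pushes a predicate through an aggregate whose aggregate expressions are all deterministic.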
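Why such a pushdown is safe for an order-dependent aggregate can be illustrated with a small plain-Python model. This is a sketch under assumptions, not the optimizer's implementation: `aggregate`, `collect_list` and `keep` are hypothetical helpers, and the predicate references only the grouping key.

```python
from collections import defaultdict


def aggregate(rows, agg):
    """Group (key, value) rows by key, preserving row order, and apply agg."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}


def collect_list(values):
    """Order-dependent but deterministic: returns the values as they arrived."""
    return list(values)


def keep(key):
    """Hypothetical predicate on the grouping key only."""
    return key != "b"


rows = [("a", 3), ("b", 1), ("a", 2), ("b", 5), ("c", 4)]

# Filter applied above the aggregate (the unoptimized plan).
filtered_after = {
    key: values
    for key, values in aggregate(rows, collect_list).items()
    if keep(key)
}

# Filter pushed below the aggregate (the optimized plan).
filtered_before = aggregate(
    [(key, value) for key, value in rows if keep(key)], collect_list
)

assert filtered_after == filtered_before
```

Dropping whole groups before aggregation does not change the relative order of rows inside the remaining groups, so both plans agree in this model.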
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests (UT).
Closes #29810 from tanelk/SPARK-32940.
Lead-authored-by: [email protected] <[email protected]>
Co-authored-by: Tanel Kiis <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent 3f3201a commit 58e07e0
File tree
9 files changed, +58 -24 lines changed
- sql
  - catalyst/src
    - main/scala/org/apache/spark/sql/catalyst
      - dsl
      - expressions/aggregate
      - optimizer
    - test/scala/org/apache/spark/sql/catalyst/optimizer
  - core/src/test/scala/org/apache/spark/sql
Lines changed: 5 additions & 0 deletions
@@ -218,6 +218,11 @@
Lines changed: 0 additions & 3 deletions
@@ -62,9 +62,6 @@
Lines changed: 0 additions & 3 deletions
@@ -61,9 +61,6 @@
Lines changed: 0 additions & 4 deletions
@@ -43,10 +43,6 @@
Lines changed: 2 additions & 0 deletions
@@ -423,6 +423,8 @@
Lines changed: 5 additions & 1 deletion
@@ -39,11 +39,15 @@
Lines changed: 23 additions & 0 deletions
@@ -840,6 +840,29 @@
Lines changed: 3 additions & 4 deletions
@@ -23,7 +23,6 @@
@@ -183,10 +182,10 @@
Lines changed: 20 additions & 9 deletions
@@ -35,7 +35,7 @@
@@ -44,6 +44,7 @@
@@ -2790,15 +2791,25 @@