Commit 8637205
[SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas
### What changes were proposed in this pull request?
Resolve duplicate attributes for `FlatMapCoGroupsInPandas`.
### Why are the changes needed?
When performing self-join on top of `FlatMapCoGroupsInPandas`, analysis can fail because of conflicting attributes. For example,
```scala
df = spark.createDataFrame([(1, 1)], ("column", "value"))
row = df.groupby("ColUmn").cogroup(
df.groupby("COLUMN")
).applyInPandas(lambda r, l: r + l, "column long, value long")
row.join(row).show()
```
error:
```scala
...
Conflicting attributes: column#163321L,value#163322L
;;
’Join Inner
:- FlatMapCoGroupsInPandas [ColUmn#163312L], [COLUMN#163312L], <lambda>(column#163312L, value#163313L, column#163312L, value#163313L), [column#163321L, value#163322L]
: :- Project [ColUmn#163312L, column#163312L, value#163313L]
: : +- LogicalRDD [column#163312L, value#163313L], false
: +- Project [COLUMN#163312L, column#163312L, value#163313L]
: +- LogicalRDD [column#163312L, value#163313L], false
+- FlatMapCoGroupsInPandas [ColUmn#163312L], [COLUMN#163312L], <lambda>(column#163312L, value#163313L, column#163312L, value#163313L), [column#163321L, value#163322L]
:- Project [ColUmn#163312L, column#163312L, value#163313L]
: +- LogicalRDD [column#163312L, value#163313L], false
+- Project [COLUMN#163312L, column#163312L, value#163313L]
+- LogicalRDD [column#163312L, value#163313L], false
...
```
### Does this PR introduce _any_ user-facing change?
yes, the query like the above example won't fail.
### How was this patch tested?
Adde unit tests.
Closes #31429 from Ngone51/fix-conflcting-attrs-of-FlatMapCoGroupsInPandas.
Lead-authored-by: yi.wu <[email protected]>
Co-authored-by: wuyi <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit e9362c2)
Signed-off-by: HyukjinKwon <[email protected]>1 parent a8e8ff6 commit 8637205
File tree
4 files changed
+70
-0
lines changed- python/pyspark/sql/tests
- sql/catalyst/src
- main/scala/org/apache/spark/sql/catalyst/analysis
- test/scala/org/apache/spark/sql/catalyst/analysis
4 files changed
+70
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
209 | 209 | | |
210 | 210 | | |
211 | 211 | | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
212 | 224 | | |
213 | 225 | | |
214 | 226 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
117 | 117 | | |
118 | 118 | | |
119 | 119 | | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
120 | 128 | | |
121 | 129 | | |
122 | 130 | | |
| |||
Lines changed: 8 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1198 | 1198 | | |
1199 | 1199 | | |
1200 | 1200 | | |
| 1201 | + | |
| 1202 | + | |
| 1203 | + | |
| 1204 | + | |
| 1205 | + | |
| 1206 | + | |
| 1207 | + | |
| 1208 | + | |
1201 | 1209 | | |
1202 | 1210 | | |
1203 | 1211 | | |
| |||
Lines changed: 42 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
610 | 610 | | |
611 | 611 | | |
612 | 612 | | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
| 620 | + | |
| 621 | + | |
| 622 | + | |
| 623 | + | |
| 624 | + | |
| 625 | + | |
| 626 | + | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
| 634 | + | |
| 635 | + | |
| 636 | + | |
| 637 | + | |
| 638 | + | |
| 639 | + | |
| 640 | + | |
| 641 | + | |
| 642 | + | |
| 643 | + | |
| 644 | + | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
| 652 | + | |
| 653 | + | |
| 654 | + | |
613 | 655 | | |
614 | 656 | | |
615 | 657 | | |
| |||
0 commit comments