Commit 7278bc7
[SPARK-50489][SQL][PYTHON] Fix self-join after
### What changes were proposed in this pull request?
Fix self-join after `applyInArrow`, the same issue of `applyInPandas` was fixed in #31429
### Why are the changes needed?
bug fix
before:
```
In [1]: import pyarrow as pa
In [2]: df = spark.createDataFrame([(1, 1)], ("k", "v"))
In [3]: def arrow_func(key, table):
...: return pa.Table.from_pydict({"x": [2], "y": [2]})
...:
In [4]: df2 = df.groupby("k").applyInArrow(arrow_func, schema="x long, y long")
In [5]: df2.show()
24/12/04 17:47:43 WARN CheckAllocator: More than one DefaultAllocationManager on classpath. Choosing first found
+---+---+
| x| y|
+---+---+
| 2| 2|
+---+---+
In [6]: df2.join(df2)
...
Failure when resolving conflicting references in Join:
'Join Inner
:- FlatMapGroupsInArrow [k#0L], arrow_func(k#0L, v#1L)#2, [x#3L, y#4L]
: +- Project [k#0L, k#0L, v#1L]
: +- LogicalRDD [k#0L, v#1L], false
+- FlatMapGroupsInArrow [k#12L], arrow_func(k#12L, v#13L)#2, [x#3L, y#4L]
+- Project [k#12L, k#12L, v#13L]
+- LogicalRDD [k#12L, v#13L], false
Conflicting attributes: "x", "y". SQLSTATE: XX000
at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
at org.apache.spark.SparkException$.internalError(SparkException.scala:79)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:798)
```
after:
```
In [6]: df2.join(df2)
Out[6]: DataFrame[x: bigint, y: bigint, x: bigint, y: bigint]
In [7]: df2.join(df2).show()
+---+---+---+---+
| x| y| x| y|
+---+---+---+---+
| 2| 2| 2| 2|
+---+---+---+---+
```
### Does this PR introduce _any_ user-facing change?
bug fix
### How was this patch tested?
added tests
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #49056 from zhengruifeng/fix_arrow_join.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>applyInArrow
1 parent fe904e6 commit 7278bc7
File tree
3 files changed
+34
-0
lines changed- python/pyspark/sql/tests
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis
3 files changed
+34
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
299 | 299 | | |
300 | 300 | | |
301 | 301 | | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
302 | 312 | | |
303 | 313 | | |
304 | 314 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
255 | 255 | | |
256 | 256 | | |
257 | 257 | | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
258 | 268 | | |
259 | 269 | | |
260 | 270 | | |
| |||
Lines changed: 14 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
132 | 132 | | |
133 | 133 | | |
134 | 134 | | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
135 | 142 | | |
136 | 143 | | |
137 | 144 | | |
138 | 145 | | |
139 | 146 | | |
140 | 147 | | |
141 | 148 | | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
142 | 156 | | |
143 | 157 | | |
144 | 158 | | |
| |||
0 commit comments