Commit 23a19e6
[SPARK-52905][PYTHON] Arrow UDF for window
### What changes were proposed in this pull request?
Add window support to Arrow UDFs.
### Why are the changes needed?
To make Arrow UDFs support window operations.
### Does this PR introduce _any_ user-facing change?
Not yet. Arrow UDF will be made public soon.
```py
In [1]: from typing import Iterator, Tuple
   ...: import pyarrow as pa
   ...: from pyspark.sql import Window
   ...: from pyspark.sql import functions as sf
   ...: from pyspark.sql.pandas.functions import arrow_udf
   ...:
   ...: import pandas as pd
   ...: from pyspark.sql.functions import pandas_udf
   ...:
   ...: df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], ("id", "v"))
   ...:
   ...: # Unbounded frame: every row sees its whole partition.
   ...: w = Window.partitionBy('id').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
   ...:

In [2]: @arrow_udf("double")
   ...: def arrow_mean_udf(v: pa.Array) -> float:
   ...:     assert isinstance(v, pa.Array), str(type(v))
   ...:     return pa.compute.mean(v)
   ...:
   ...: # df.select(arrow_mean_udf(df['v'])).show()
   ...: # df.groupby("id").agg(arrow_mean_udf('v')).show()
   ...:
   ...: df.withColumn('mean_v', arrow_mean_udf(df['v']).over(w)).show()
   ...:
+---+----+------+
| id|   v|mean_v|
+---+----+------+
|  1| 1.0|   1.5|
|  1| 2.0|   1.5|
|  2| 3.0|   6.0|
|  2| 5.0|   6.0|
|  2|10.0|   6.0|
+---+----+------+
```
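The expected output above can be checked without a Spark session. The following is a minimal plain-Python sketch of what an unbounded window frame partitioned by `id` computes: each row receives the mean of `v` over its whole partition. The helper `unbounded_window_mean` is hypothetical and exists only for illustration; it is not part of PySpark or of this patch, and it does not reflect how Spark evaluates the UDF internally.

```python
from collections import defaultdict
from statistics import mean

# Same rows as the DataFrame in the example: (id, v)
rows = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]


def unbounded_window_mean(rows):
    """Hypothetical helper: attach the partition-wide mean of v to every row."""
    # Group the values by partition key (the `id` column).
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[key].append(value)
    # One aggregate per partition, as with an unbounded frame.
    means = {key: mean(values) for key, values in partitions.items()}
    # Broadcast each partition's aggregate back to its rows.
    return [(key, value, means[key]) for key, value in rows]


result = unbounded_window_mean(rows)
for row in result:
    print(row)
```

Rows with `id` 1 get 1.5 and rows with `id` 2 get 6.0, matching the `mean_v` column shown above.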
### How was this patch tested?
New tests
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #51593 from zhengruifeng/arrow_udf_win.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
1 parent f345634 commit 23a19e6
File tree
11 files changed, +686 -9 lines changed
- core/src/main/scala/org/apache/spark/api/python
- dev/sparktestsupport
- python/pyspark
  - sql
    - pandas/_typing
    - tests/arrow
- sql
  - catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions
  - core/src/main/scala/org/apache/spark/sql/execution
    - python