You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
When attempting to call show() on a DataFrame that contains a built in window function on a column that has struct elements, it produces the error Compute error: concat requires input of at least one array. However other functions such as count() do not have issues. I have less experience with DataFusion, so I just expected count() to do a full evaluation like it does in pyspark, so it's possible that my assumption is incorrect in that having any bearing on this error.
To Reproduce
This minimal example can reproduce the window function working properly on a simple element type and failing with a very simple struct.
import pyarrow as pa
from datafusion import SessionContext
import datafusion.functions as F
# taken from datafusion/tests/test_dataframe.py
def struct_df():
ctx = SessionContext()
# create a RecordBatch and a new DataFrame from it
batch = pa.RecordBatch.from_arrays(
[pa.array([{"c": 1}, {"c": 2}, {"c": 3}]), pa.array([4, 5, 6])],
names=["a", "b"],
)
return ctx.create_dataframe([[batch]])
df = struct_df()
df.show()
df.select(F.col("a"), F.col("b"), F.window("lag", [F.col("b")]).alias("lag_b")).show()
print("Calling count on lag a: ", df.select(F.col("a"), F.col("b"), F.window("lag", [F.col("a")]).alias("lag_a")).count())
df.select(F.col("a"), F.col("b"), F.window("lag", [F.col("a")]).alias("lag_a")).show()
Produces the following output:
DataFrame()
+--------+---+
| a | b |
+--------+---+
| {c: 1} | 4 |
| {c: 2} | 5 |
| {c: 3} | 6 |
+--------+---+
DataFrame()
+--------+---+-------+
| a | b | lag_b |
+--------+---+-------+
| {c: 1} | 4 | |
| {c: 2} | 5 | 4 |
| {c: 3} | 6 | 5 |
+--------+---+-------+
Calling count on lag a: 3
Traceback (most recent call last):
File "/Users/tsaucer/src/arrow-datafusion-python/example_lag_struct.py", line 25, in <module>
df.select(F.col("a"), F.col("b"), F.window("lag", [F.col("a")]).alias("lag_a")).show()
Exception: Arrow error: Compute error: concat requires input of at least one array
Expected behavior
Calling show() on a window function with a struct column type should operate similar to simple column types.
Additional context
I'm willing to work on this myself, but I'm not familiar with the internals of the plan execution. I've looked around myself to see if I can find anything obvious, but nothing is jumping out at me. If you can provide any directions or pointers, I would appreciate it.
The text was updated successfully, but these errors were encountered:
Describe the bug
When attempting to call
show()
on a DataFrame that contains a built in window function on a column that has struct elements, it produces the errorCompute error: concat requires input of at least one array
. However other functions such ascount()
do not have issues. I have less experience with DataFusion, so I just expectedcount()
to do a full evaluation like it does in pyspark, so it's possible that my assumption is incorrect in that having any bearing on this error.To Reproduce
This minimal example can reproduce the window function working properly on a simple element type and failing with a very simple struct.
Produces the following output:
In searching the web there was a similar error thrown that this old MR resolved in sort operations: https://github.com/apache/arrow/pull/9275/files#diff-3ee8e6ac2472badc7bb448c360f56ed60f06a787d1f45ea589d9e213eaf2ae82
Expected behavior
Calling
show()
on a window function with a struct column type should operate similar to simple column types.Additional context
I'm willing to work on this myself, but I'm not familiar with the internals of the plan execution. I've looked around myself to see if I can find anything obvious, but nothing is jumping out at me. If you can provide any directions or pointers, I would appreciate it.
The text was updated successfully, but these errors were encountered: