Recursive unnest misbehave if used together with another unnest on different column #11689

Open
duongcongtoai opened this issue Jul 28, 2024 · 2 comments · May be fixed by #11577
Labels
bug Something isn't working

Comments

@duongcongtoai
Contributor

duongcongtoai commented Jul 28, 2024

Describe the bug

While implementing this issue, I found another problem.

Given this sqllogictest (slt) case:

query II
select unnest([1,2,3]), unnest(unnest([[1,2,3]]));

DataFusion currently returns:

1 1
1 2
1 3

To Reproduce

No response

Expected behavior

DataFusion should return the correct result set:

1 1
2 2
3 3

For comparison, the same query in DuckDB:

D select unnest([1,2,3]), unnest([[1,2,3]],recursive:=true);
┌──────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────┐
│ unnest(main.list_value(1, 2, 3)) │ unnest(main.list_value(main.list_value(1, 2, 3)), "recursive" := CAST('t' AS BOOLEAN)) │
│              int32               │                                         int32                                          │
├──────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────┤
│                                1 │                                                                                      1 │
│                                2 │                                                                                      2 │
│                                3 │                                                                                      3 │
└──────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────┘
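The expected semantics can be sketched in a few lines of Rust. This is only an illustration, not DataFusion code: the `Value` type and the `unnest_to_depth` and `unnest_row` helpers are hypothetical names, and padding shorter columns with NULL is an assumption about how list columns of unequal length are aligned. Each column of an input row is flattened to its own depth, and the results are then lined up row by row.

```rust
/// Hypothetical nested value: an integer leaf or a list of values.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    List(Vec<Value>),
}

/// Flatten `value` by `depth` levels of list nesting.
/// depth = 1 mirrors unnest(x); depth = 2 mirrors unnest(unnest(x)).
fn unnest_to_depth(value: &Value, depth: usize) -> Vec<Value> {
    if depth == 0 {
        return vec![value.clone()];
    }
    match value {
        Value::List(items) => items
            .iter()
            .flat_map(|item| unnest_to_depth(item, depth - 1))
            .collect(),
        // A non-list value cannot be unnested further; keep it as is.
        other => vec![other.clone()],
    }
}

/// Unnest all columns of one input row together, each to its own depth.
/// Shorter columns are padded with None (assumed NULL padding).
fn unnest_row(columns: &[(Value, usize)]) -> Vec<Vec<Option<Value>>> {
    let flattened: Vec<Vec<Value>> = columns
        .iter()
        .map(|(value, depth)| unnest_to_depth(value, *depth))
        .collect();
    let height = flattened.iter().map(Vec::len).max().unwrap_or(0);
    (0..height)
        .map(|row| flattened.iter().map(|col| col.get(row).cloned()).collect())
        .collect()
}
```

Calling `unnest_row` with `[1,2,3]` at depth 1 and `[[1,2,3]]` at depth 2 produces three rows that pair 1 with 1, 2 with 2, and 3 with 3, which is the expected result set above.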

Additional context

No response

@duongcongtoai duongcongtoai added the bug Something isn't working label Jul 28, 2024
@duongcongtoai
Contributor Author

duongcongtoai commented Jul 28, 2024

This may be caused by the generated logical plan, which unnests the second column in two separate Unnest nodes:

logical_plan
01)Unnest: lists[unnest(unnest(make_array(make_array(Int64(1),Int64(2),Int64(3)))))] structs[]
02)--Projection: unnest(make_array(Int64(1),Int64(2),Int64(3))), unnest(make_array(make_array(Int64(1),Int64(2),Int64(3)))) AS unnest(unnest(make_array(make_array(Int64(1),Int64(2),Int64(3)))))
03)----Unnest: lists[unnest(make_array(Int64(1),Int64(2),Int64(3))), unnest(make_array(make_array(Int64(1),Int64(2),Int64(3))))] structs[]
04)------Projection: List([1, 2, 3]) AS unnest(make_array(Int64(1),Int64(2),Int64(3))), List([[1, 2, 3]]) AS unnest(make_array(make_array(Int64(1),Int64(2),Int64(3))))

The correct plan may instead need to look like this:

01)--Projection: unnest(make_array(Int64(1),Int64(2),Int64(3))), unnest_depth_2(make_array(make_array(Int64(1),Int64(2),Int64(3)))) AS unnest(unnest(make_array(make_array(Int64(1),Int64(2),Int64(3)))))
02)----Unnest: lists[unnest(make_array(Int64(1),Int64(2),Int64(3))), unnest_depth_2(make_array(make_array(Int64(1),Int64(2),Int64(3))))] structs[]
03)------Projection: List([1, 2, 3]) AS unnest(make_array(Int64(1),Int64(2),Int64(3))), List([[1, 2, 3]]) AS unnest(make_array(make_array(Int64(1),Int64(2),Int64(3))))

That is, all the necessary unnesting has to happen inside a single logical node.

This requires the following:

  1. The unnest logical/physical plan has to be aware of the unnest depth.
  2. The try_process_unnest function has to detect consecutive unnests in the query plan and determine the unnest depth for each column.
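As a rough illustration of the second item, the depth of a chain of consecutive unnests can be found by peeling the outer wrappers off an expression. The `Expr` enum and the `consecutive_unnest_depth` function below are hypothetical stand-ins, not DataFusion's actual types or the body of try_process_unnest.

```rust
/// Hypothetical, heavily simplified expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Unnest(Box<Expr>),
    GetField(Box<Expr>, String),
}

/// Peel consecutive outer `Unnest` wrappers and return how many were
/// removed, together with the expression they were applied to.
fn consecutive_unnest_depth(expr: &Expr) -> (usize, &Expr) {
    let mut depth = 0;
    let mut current = expr;
    while let Expr::Unnest(inner) = current {
        depth += 1;
        current = &**inner;
    }
    (depth, current)
}
```

For `unnest(unnest(col))` this returns a depth of 2 and the inner column, which is the information a single Unnest node would need to carry for that column.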

For the second item, we also have to handle expressions that contain recursive unnests which are not consecutive, for example:

select unnest(unnest(unnest(struct_arr_col)['col0'])), unnest([1,2,3]);

where struct_arr_col has a type such as List<Struct<List<List>>>.
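One possible way to handle the non-consecutive case is to split the expression into runs of consecutive unnests, where each run ends at the first non-unnest expression (here the field access). The sketch below assumes a hypothetical `Expr` enum, `UnnestRun` struct, and `collect_unnest_runs` function; none of these are DataFusion APIs.

```rust
/// Hypothetical, heavily simplified expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Unnest(Box<Expr>),
    GetField(Box<Expr>, String),
}

/// One run of consecutive unnests: its depth and the expression it wraps.
#[derive(Debug, PartialEq)]
struct UnnestRun {
    depth: usize,
    input: Expr,
}

/// Walk the expression from the outside in and record every run of
/// consecutive `Unnest` wrappers, outermost run first.
fn collect_unnest_runs(expr: &Expr, runs: &mut Vec<UnnestRun>) {
    match expr {
        Expr::Unnest(_) => {
            let mut depth = 0;
            let mut current = expr;
            while let Expr::Unnest(inner) = current {
                depth += 1;
                current = &**inner;
            }
            runs.push(UnnestRun { depth, input: current.clone() });
            // The input of this run may itself contain further unnests.
            collect_unnest_runs(current, runs);
        }
        Expr::GetField(base, _) => collect_unnest_runs(base, runs),
        Expr::Column(_) => {}
    }
}
```

For `unnest(unnest(unnest(struct_arr_col)['col0']))` this yields two runs: a depth-2 run over the field access and a depth-1 run over struct_arr_col, so the two groups are not merged into a single depth-3 unnest.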

@duongcongtoai
Contributor Author

take

@duongcongtoai duongcongtoai changed the title Recursive unnest misbehave if used together with another unnest on different column Recursive unnest misbehave if used together with another unnest on different column Jul 28, 2024