-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: column support for array_append
, array_prepend
, array_position
and array_positions
#6805
Conversation
array_append
, array_prepend
, array_position
and array_positions
} | ||
None => { | ||
return Err(DataFusionError::Internal( | ||
"initial position must not be null".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PostgreSQL does not support NULL
like the start position. Should we do that or is it better to translate NULL
to0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think following postgres seems reasonable to me
Some(arr) => { | ||
let child_array = downcast_arg!(arr, $ARRAY_TYPE); | ||
values = downcast_arg!( | ||
compute::concat(&[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arrow-rs
does not support into_builder
method for GenericListArray
. Is the concat
method better than builder
. Or is there a great third option? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When concatenating two arrays I don't think there is any difference.
however, you can see in the concat
implementation it uses MutableArrayData
: https://docs.rs/arrow-select/43.0.0/src/arrow_select/concat.rs.html#86-91 -- maybe that is an approach to pursue?
@tustvold do you know of any reason we can't add a info_builder
method for GenericListArray
? If not, I'll file a ticket request and see if anyone in the community wants to try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no major reason why we couldn't support into_builder for ListArray, however, I should highlight this only works if there are no other references. I've not read the rest of the PR but this may not be something we can rely on here. MutableArrayData seems like the best option here if there is no selection kernel (concat, take, interleave) that suits
select make_array(NULL), make_array(NULL, NULL, NULL), make_array(make_array(NULL, NULL), make_array(NULL, NULL)); | ||
---- | ||
[] [] [[], []] | ||
|
||
# make_array with columns #1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All outputs match PostgreSQL engine results.
See the examples: #6804
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can automatically ensure they stay in sync by using the postgres compatability mode if you wanted: https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/sqllogictests#running-tests-postgres-compatibility
This is on my list to review, but I might not have time for it until later in the week (Wednesday) -- I am somewhat occupied with #6800 at the moment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @izveigor -- I went through this PR and it all looks great. I had a few minor comments but nothing I think should stop it from merging
select make_array(NULL), make_array(NULL, NULL, NULL), make_array(make_array(NULL, NULL), make_array(NULL, NULL)); | ||
---- | ||
[] [] [[], []] | ||
|
||
# make_array with columns #1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can automatically ensure they stay in sync by using the postgres compatability mode if you wanted: https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/sqllogictests#running-tests-postgres-compatibility
BuiltinScalarFunction::ArrayPositions => { | ||
Ok(List(Arc::new(Field::new("item", UInt8, true)))) | ||
Ok(List(Arc::new(Field::new("item", UInt64, true)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Some(arr) => { | ||
let child_array = downcast_arg!(arr, $ARRAY_TYPE); | ||
values = downcast_arg!( | ||
compute::concat(&[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When concatenating two arrays I don't think there is any difference.
however, you can see in the concat
implementation it uses MutableArrayData
: https://docs.rs/arrow-select/43.0.0/src/arrow_select/concat.rs.html#86-91 -- maybe that is an approach to pursue?
@tustvold do you know of any reason we can't add a info_builder
method for GenericListArray
? If not, I'll file a ticket request and see if anyone in the community wants to try.
} | ||
None => { | ||
return Err(DataFusionError::Internal( | ||
"initial position must not be null".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think following postgres seems reasonable to me
} | ||
} | ||
None => { | ||
return Err(DataFusionError::Internal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this error can be generated by bad user input (and not a bug in DataFusion) so I don't think it should be an internal error:
return Err(DataFusionError::Internal( | |
return Err(DataFusionError::Execution( |
Thanks @izveigor |
…tion` and `array_positions` (apache#6805) * test: sqllogictests with columns for array_append, array_prepend, array_position and array_positions * feat: column support for array_append and array_prepend * feat: column support for array_position and array_positions * fix: error type
Which issue does this PR close?
Part of #6804
Rationale for this change
This is the first PR for solving the column issue.
What changes are included in this PR?
Four array functions now start to support column data:
array_append
,array_prepend
,array_position
,array_positions
Are these changes tested?
Yes
Are there any user-facing changes?
Yes