Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: column support for array_append, array_prepend, array_position and array_positions #6805

Merged
merged 5 commits into from
Jul 5, 2023
Merged

feat: column support for array_append, array_prepend, array_position and array_positions #6805

merged 5 commits into from
Jul 5, 2023

Conversation

izveigor
Copy link
Contributor

@izveigor izveigor commented Jun 30, 2023

Which issue does this PR close?

Part of #6804

Rationale for this change

This is the first PR for solving the column issue.

What changes are included in this PR?

Four array functions now start to support column data:
array_append,
array_prepend,
array_position,
array_positions

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

@izveigor izveigor marked this pull request as draft June 30, 2023 09:22
@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jun 30, 2023
@izveigor izveigor changed the title feat: column support for array functions feat: column support for array_append, array_prepend, array_position and array_positions Jul 3, 2023
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions labels Jul 3, 2023
}
None => {
return Err(DataFusionError::Internal(
"initial position must not be null".to_string(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL does not support NULL like the start position. Should we do that or is it better to translate NULL to0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think following postgres seems reasonable to me

Some(arr) => {
let child_array = downcast_arg!(arr, $ARRAY_TYPE);
values = downcast_arg!(
compute::concat(&[
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrow-rs does not support into_builder method for GenericListArray. Is the concat method better than builder. Or is there a great third option? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When concatenating two arrays I don't think there is any difference.

however, you can see in the concat implementation it uses MutableArrayData: https://docs.rs/arrow-select/43.0.0/src/arrow_select/concat.rs.html#86-91 -- maybe that is an approach to pursue?

@tustvold do you know of any reason we can't add a info_builder method for GenericListArray? If not, I'll file a ticket request and see if anyone in the community wants to try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no major reason why we couldn't support into_builder for ListArray, however, I should highlight this only works if there are no other references. I've not read the rest of the PR but this may not be something we can rely on here. MutableArrayData seems like the best option here if there is no selection kernel (concat, take, interleave) that suits

select make_array(NULL), make_array(NULL, NULL, NULL), make_array(make_array(NULL, NULL), make_array(NULL, NULL));
----
[] [] [[], []]

# make_array with columns #1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All outputs match PostgreSQL engine results.
See the examples: #6804

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can automatically ensure they stay in sync by using the postgres compatability mode if you wanted: https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/sqllogictests#running-tests-postgres-compatibility

@izveigor izveigor marked this pull request as ready for review July 3, 2023 13:20
@izveigor
Copy link
Contributor Author

izveigor commented Jul 3, 2023

Ready for the review @alamb
Problem solution will be divided into several independent PR so as not to cause the same problem that you mentioned in #6384

@alamb
Copy link
Contributor

alamb commented Jul 3, 2023

This is on my list to review, but I might not have time for it until later in the week (Wednesday) -- I am somewhat occupied with #6800 at the moment

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @izveigor -- I went through this PR and it all looks great. I had a few minor comments but nothing I think should stop it from merging

select make_array(NULL), make_array(NULL, NULL, NULL), make_array(make_array(NULL, NULL), make_array(NULL, NULL));
----
[] [] [[], []]

# make_array with columns #1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can automatically ensure they stay in sync by using the postgres compatability mode if you wanted: https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/sqllogictests#running-tests-postgres-compatibility

BuiltinScalarFunction::ArrayPositions => {
Ok(List(Arc::new(Field::new("item", UInt8, true))))
Ok(List(Arc::new(Field::new("item", UInt64, true))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Some(arr) => {
let child_array = downcast_arg!(arr, $ARRAY_TYPE);
values = downcast_arg!(
compute::concat(&[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When concatenating two arrays I don't think there is any difference.

however, you can see in the concat implementation it uses MutableArrayData: https://docs.rs/arrow-select/43.0.0/src/arrow_select/concat.rs.html#86-91 -- maybe that is an approach to pursue?

@tustvold do you know of any reason we can't add a info_builder method for GenericListArray? If not, I'll file a ticket request and see if anyone in the community wants to try.

}
None => {
return Err(DataFusionError::Internal(
"initial position must not be null".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think following postgres seems reasonable to me

}
}
None => {
return Err(DataFusionError::Internal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this error can be generated by bad user input (and not a bug in DataFusion) so I don't think it should be an internal error:

Suggested change
return Err(DataFusionError::Internal(
return Err(DataFusionError::Execution(

@alamb alamb merged commit 9a27d84 into apache:main Jul 5, 2023
@alamb
Copy link
Contributor

alamb commented Jul 5, 2023

Thanks @izveigor

@izveigor izveigor mentioned this pull request Jul 6, 2023
11 tasks
alamb pushed a commit to alamb/datafusion that referenced this pull request Jul 6, 2023
…tion` and `array_positions` (apache#6805)

* test: sqllogictests with columns for array_append, array_prepend, array_position and array_positions

* feat: column support for array_append and array_prepend

* feat: column support for array_position and array_positions

* fix: error type
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants