-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert list array and non-list array to scalars #7862
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @jayzhan211 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jayzhan211
e7fd4ca
to
c9f4ea1
Compare
Summary of changes
|
I think this is not waiting on any more feedback |
4b780ea
to
49a671f
Compare
542226d
to
6b56441
Compare
wait on #8253 |
d17a682
to
29ac936
Compare
wait on #8439 |
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
29ac936
to
aee4eff
Compare
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@alamb I think this is ready to go, thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- this is looking quite close I think
datafusion/common/src/scalar.rs
Outdated
/// Retrieve ScalarValue for each row in `array` | ||
/// Retrieve `ScalarValue` for each row in `array` | ||
/// | ||
/// Convert `ListArray` to `Vec<Vec<ScalarValue>>`, first `Vec` is for rows, second `Vec` is for elements in the list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 It might also help to explain why we need two different signatures. It took me a while to grok things too
/// Convert `ListArray` to `Vec<Vec<ScalarValue>>`, first `Vec` is for rows, second `Vec` is for elements in the list | |
/// Convert `ListArray` into a 2 dimensional to `Vec<Vec<ScalarValue>>`, first `Vec` is for rows, | |
/// second `Vec` is for elements in the list. | |
/// | |
/// See [`Self::convert_non_list_array_to_scalars`] for converting non Lists | |
/// | |
/// This method is an optimization to unwrap nested ListArrays to nested Rust structures without | |
/// converting them twice |
/// Retrieve `ScalarValue` for each row in `array` | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems left over
/// Retrieve `ScalarValue` for each row in `array` | |
/// |
@@ -2067,30 +2071,78 @@ impl ScalarValue { | |||
/// | |||
/// assert_eq!(scalar_vec, expected); | |||
/// ``` | |||
pub fn convert_array_to_scalar_vec(array: &dyn Array) -> Result<Vec<Vec<Self>>> { | |||
let mut scalars = Vec::with_capacity(array.len()); | |||
pub fn convert_list_array_to_scalar_vec<O: OffsetSizeTrait>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be easier to use if it didn't mix generic and non generic code
Maybe something like
pub fn convert_list_array_to_scalar_vec<O: OffsetSizeTrait>( | |
pub fn convert_list_array_to_scalar_vec( | |
array: &dyn Array, | |
) -> Result<Vec<Vec<Self>>> { | |
if let Some(arr) = array.as_list_opt::<i32> { | |
Self::convert_list_array_to_scalar_vec_internal(arr) | |
} else if let Some(arr) = array.as_list_opt::<64> { | |
Self::convert_list_array_to_scalar_vec_internal(arr) | |
} else { | |
_internal_err!("Expected GenericListArray but found: {array:?}") | |
} | |
} |
And then internally pass in the cast Array by changing
fn convert_list_array_to_scalar_vec_internal<O: OffsetSizeTrait>(
array: &dyn Array,
) -> Result<Vec<Vec<Self>>> {
to
fn convert_list_array_to_scalar_vec_internal<O: OffsetSizeTrait>(
array: &GenericListArray<O>,
) -> Result<Vec<Vec<Self>>> {
Signed-off-by: jayzhan211 <[email protected]>
(BTW sorry for being so nit picky on this PR -- I think |
This comment is a bit too late... And, this API is actually not breaking the previous one Anyway, this change is just improvement, not fixing a critical error, so I will close it. |
I am sorry -- I feel really bad that we can't find more review bandwidth to match your contributions @jayzhan211 -- your efforts so far have been great and are really appreciated. |
Which issue does this PR close?
Closes #.
Rationale for this change
convert_array_to_scalar_vec
can convert well for list array, but we will getVec<Vec<ScalarValue>>
for non-list array (primitive array).It would be nice to have another conversion that returns nicely
Vec<ScalarValue>
for the non-list array. I would need this for #7835 too.What changes are included in this PR?
convert_list_array_to_scalar_vec
for ListArrayconvert_non_list_array_to_scalar_vec
for non-ListArrayThe nested array for DistinctArrayAggAccumulator (update_batch) is removed so as to test.
Are these changes tested?
Doc test
Are there any user-facing changes?