Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Arrays for the Map scalar functions #11712

Merged
merged 24 commits into from
Aug 13, 2024
Merged

Conversation

dharanad
Copy link
Contributor

@dharanad dharanad commented Jul 29, 2024

Which issue does this PR close?

Closes #11436

Rationale for this change

What changes are included in this PR?

Add new function to handle support arrays for the Map scalar function.

Are these changes tested?

Existing testcases

cargo test

Are there any user-facing changes?

No breaking changes

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jul 29, 2024
@dharanad dharanad changed the title [WIP][Not for Review] Support Arrays for the Map scalar functions Support Arrays for the Map scalar functions Jul 31, 2024
@dharanad dharanad changed the title Support Arrays for the Map scalar functions [WIP] Support Arrays for the Map scalar functions Jul 31, 2024
@dharanad dharanad changed the title [WIP] Support Arrays for the Map scalar functions Support Arrays for the Map scalar functions Aug 2, 2024
@dharanad
Copy link
Contributor Author

dharanad commented Aug 2, 2024

Sorry, this change took more time than expected. The good thing is I deepened my understanding of Arrow.
@goldmedal @jayzhan211 requesting you help to review this change.

@dharanad dharanad marked this pull request as ready for review August 2, 2024 07:00
@@ -202,3 +226,128 @@ fn get_element_type(data_type: &DataType) -> datafusion_common::Result<&DataType
),
}
}

/// Helper function to create MapArray from array of values to support arrays for Map scalar function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

Copy link
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dharanad for working on this. I have some suggestions about error handling and adding more tests. Others look good to me.

datafusion/functions-nested/src/map.rs Outdated Show resolved Hide resolved
datafusion/functions-nested/src/map.rs Outdated Show resolved Hide resolved
datafusion/functions-nested/src/map.rs Outdated Show resolved Hide resolved
datafusion/functions-nested/src/map.rs Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/map.slt Outdated Show resolved Hide resolved
datafusion/functions-nested/src/map.rs Outdated Show resolved Hide resolved
datafusion/functions-nested/src/map.rs Outdated Show resolved Hide resolved
@dharanad dharanad requested a review from goldmedal August 8, 2024 19:29
@alamb
Copy link
Contributor

alamb commented Aug 9, 2024

I hope to review this later today

datafusion/functions-nested/src/map.rs Outdated Show resolved Hide resolved
Comment on lines +283 to +297
let mut key_array_vec = vec![];
let mut value_array_vec = vec![];
for (k, v) in keys.iter().zip(values.iter()) {
running_offset = running_offset.add(O::usize_as(k.len()));
offset_buffer.push(running_offset);
key_array_vec.push(k.as_ref());
value_array_vec.push(v.as_ref());
}

// concatenate all the arrays
let flattened_keys = arrow::compute::concat(key_array_vec.as_ref())?;
if flattened_keys.null_count() > 0 {
return exec_err!("keys cannot be null");
}
let flattened_values = arrow::compute::concat(value_array_vec.as_ref())?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use arrow::array::MutableArrayData to avoid concatenation here 🤔 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the hood, compute::concat uses MutableArrayData. Once this PR gets merged, i can look if i can improve this by using MutableArrayData and not using concat at all.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 44b9066 into apache:main Aug 13, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Aug 13, 2024

Thanks @dharanad and @jayzhan211

@dharanad
Copy link
Contributor Author

Thank You @alamb @goldmedal @jayzhan211 @Weijun-H .
I deepened my understanding on Arrow and had fun working on this issue.

@alamb
Copy link
Contributor

alamb commented Aug 14, 2024

Thank You @alamb @goldmedal @jayzhan211 @Weijun-H . I deepened my understanding on Arrow and had fun working on this issue.

That describes almost all of my experiences working on this project. I am glad others get the same thrill

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Arrays for the Map scalar functions
5 participants