Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix length error with array_has #12459

Merged
merged 7 commits into from
Sep 14, 2024
Merged

fix length error with array_has #12459

merged 7 commits into from
Sep 14, 2024

Conversation

samuelcolvin
Copy link
Contributor

Rationale for this change

Fixes an error we've been experiencing in prod where some queries that use array_has (including via ANY()) result int:

DataFusion CLI v41.0.0
Error: Internal error: UDF returned a different number of rows than expected. Expected: 8192, Got: 1.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

What changes are included in this PR?

Fix a bug in array_has where it returned 1 row, instead of the size of the group.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@samuelcolvin samuelcolvin mentioned this pull request Sep 13, 2024
@github-actions github-actions bot added the core Core DataFusion crate label Sep 13, 2024
Copy link
Contributor Author

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb I would love to get this merged, it's causing many queries that use ANY() to fail right now.

cc @jayzhan211 who I think wrote this.

@@ -200,7 +201,10 @@ fn array_has_dispatch_for_scalar<O: OffsetSizeTrait>(
// If first argument is empty list (second argument is non-null), return false
// i.e. array_has([], non-null element) -> false
if values.len() == 0 {
return Ok(Arc::new(BooleanArray::from(vec![Some(false)])));
return Ok(Arc::new(BooleanArray::new(
BooleanBuffer::new_unset(haystack.len()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the point is that in this case the array returned should be the same length as the haystack, not 1.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this fix @samuelcolvin -- I have one small nit pick about where the test should live, but we can also fix that as a follow on PR.

datafusion/core/tests/user_defined/array_has.rs Outdated Show resolved Hide resolved
datafusion/core/tests/user_defined/array_has.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot removed the core Core DataFusion crate label Sep 14, 2024
@samuelcolvin
Copy link
Contributor Author

slt test updated and dedicated test removed.

It would be great to get this merged. We'll run datafusion main until there's a another release since this is is affecting a lot of users as we migrate querying from timescale to datafusion.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# test empty arrays return length
# issue: https://github.com/apache/datafusion/pull/12459
statement ok
create table values_all_empty (a int[]) as values ([]), ([]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beautiful 🎉

@alamb alamb merged commit 6590ea3 into apache:main Sep 14, 2024
24 checks passed
@samuelcolvin samuelcolvin deleted the fix-array_has branch September 14, 2024 10:26
@samuelcolvin
Copy link
Contributor Author

Thanks @alamb for merging so quickly. 🙏

@alamb
Copy link
Contributor

alamb commented Sep 15, 2024

Thanks @alamb for merging so quickly. 🙏

I try hard to prioritize bug fixes. Thank you for fixing the bug!

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants