Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Coercion for MakeArray to coerce_arguments_for_signature and introduce another one for ArrayAppend #8317

Merged
merged 7 commits into from
Dec 18, 2023

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Nov 25, 2023

Which issue does this PR close?

Ref #7142
Close #7995

Rationale for this change

Fix Null Handing for array function. ArrayAppend first, others later.

What changes are included in this PR?

Move coercion for MakeArray to different place.
Add signature for ArrayAppend

Are these changes tested?

MakeArray - existing test
ArrayAppend - new test

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Nov 25, 2023
@jayzhan211 jayzhan211 changed the title Move Coercion for MakeArray while verifying signature and introduce signature verification for ArrayAppend Move Coercion for MakeArray to coerce_arguments_for_signature and introduce another one for ArrayAppend Nov 25, 2023
TypeSignature::Exact(valid_types) => vec![valid_types.clone()],
TypeSignature::ArrayAppendLikeSignature => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a specialized checking for array append at the end. And I think we will need specialized check for other pattern too

@@ -181,7 +239,7 @@ fn coerced_from<'a>(
Int64
if matches!(
type_from,
Null | Int8 | Int16 | Int32 | Int64 | UInt8 | UInt16 | UInt32
Null | Int8 | Int16 | Int32 | Int64 | UInt8 | UInt16 | UInt32 | Boolean
Copy link
Contributor Author

@jayzhan211 jayzhan211 Nov 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only i64 is fixed for passing existing test in array.slt. More types should be fixed in #8302

@jayzhan211
Copy link
Contributor Author

wait on #8318 and #8331

@jayzhan211 jayzhan211 force-pushed the signature-for-array branch 3 times, most recently from 0dbfaab to 8483b76 Compare December 4, 2023 01:18
@jayzhan211 jayzhan211 marked this pull request as ready for review December 4, 2023 01:18
if let Some(corced_type) = corced_type {
Ok(corced_type)
} else {
internal_err!("Coercion from {acc:?} to {x:?} failed.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call unwrap_or previously so select [1, true, null] unexpectedly correct since true is castable to 1 in arrow-rs but not in datafusion. select [true, 1, null] failed. It is better that we just return error if not coercible.

@jayzhan211 jayzhan211 marked this pull request as draft December 4, 2023 05:54
@jayzhan211 jayzhan211 marked this pull request as ready for review December 4, 2023 12:17
@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Dec 7, 2023

@alamb Ready for review! I hope this is a better solution for dealing with nulls. I had tried Trait SignatureComputation, but it is somewhat overly complicated.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- this is very much in the right direction I think. I had some suggestions / comments about some of the specific cases but it is looking very close

@@ -95,6 +95,8 @@ pub enum TypeSignature {
VariadicEqual,
/// One or more arguments with arbitrary types
VariadicAny,
/// A function such as `make_array` should be coerced to the same type
VariadicCoerced,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how this is different than VariadicEqual? It seems from the comments here they are quite similar 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that VariadicEqual is not used in any function. Maybe we can just keep one.

I thought Equal is the one that don't care about coercion. All the type should be equal like (i32, i32, i32).
Coerced is the one that ensuring the final coerced type is the same (all of the type coercible to the same one), like (i32, i64) -> i64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think current VariadicCoerced includes VariadicEqual use case as well. We can just have one signature

@@ -113,6 +115,8 @@ pub enum TypeSignature {
/// Function `make_array` takes 0 or more arguments with arbitrary types, its `TypeSignature`
/// is `OneOf(vec![Any(0), VariadicAny])`.
OneOf(Vec<TypeSignature>),
/// Specialized Signature for ArrayAppend and similar functions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using a more generic name. Perhaps something like

    /// The first argument is an array type ([`DataType::List`], or [`DataType::LargeList`]
    /// and the subsequent arguments are coerced to the List's element type
    ///
    /// For example a call to `func(a: List(int64), b: int32, c: utf8)` would attempt to coerce
    /// all the arguments to `int64`: 
    /// ```
    /// func(a: List(int64), cast(b as int64): int64, cast(c as int64): int64)
    /// ```
    ArrayAndElements

There may be more general ways of expressing the array function types too 🤔

@@ -590,26 +590,6 @@ fn coerce_arguments_for_fun(
.collect::<Result<Vec<_>>>()?;
}

if *fun == BuiltinScalarFunction::MakeArray {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@@ -265,10 +265,8 @@ AS VALUES
(make_array([28, 29, 30], [31, 32, 33], [34, 35, 36], [28, 29, 30], [31, 32, 33], [34, 35, 36], [28, 29, 30], [31, 32, 33], [34, 35, 36], [28, 29, 30]), [28, 29, 30], [37, 38, 39], 10)
;

query ?
query error
select [1, true, null]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an error because true can't be coerced to an integer, right? FWIW I think that is fine and is consistent with the postgres rues:

postgres=# select array[1, true, null];
ERROR:  ARRAY types integer and boolean cannot be matched
LINE 1: select array[1, true, null];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

# [[]] [[4]]
query ???????
select
array_append(null, 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to support array_append(null, ...)?

Postgres does not allow this:

postgres=# select array_append(null, array[2,3]);
ERROR:  could not find array type for data type integer[]

It also does't try to find a fancy type with an empty list:

postgres=# select array_append(array[], array[2,3]);
ERROR:  cannot determine type of empty array
LINE 1: select array_append(array[], array[2,3]);
                            ^
HINT:  Explicitly cast to the desired type, for example ARRAY[]::integer[].

🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of other sql behavior

Duckdb: [[2, 3]], [[2, 3]]
ClickHouse: null, [[2, 3]]
Postgres: error, error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for array_append([], [2,3]), it is fine to not follow postgres and return [[2, 3]] like clickhouse and duckdb.
For array_append(null, [2, 3]), I think we can follow postrgres.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clickhouse and Duckdb has the same output for array_append(make_array(null, null), 1) too.

@jayzhan211 jayzhan211 requested a review from alamb December 9, 2023 02:00
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @jayzhan211 .

@alamb
Copy link
Contributor

alamb commented Dec 18, 2023

I merged this branch up from main to ensure there are no logical conflicts, and if all tests pass I intend to merge it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants