Support List type coercion for CASE-WHEN-THEN expression #12490

Merged 7 commits on Sep 21, 2024

Contributor

@goldmedal goldmedal commented Sep 16, 2024

Which issue does this PR close?

Closes #12370.

Rationale for this change

The rule for the List type is to choose the wider type (LargeList > List > FixedSizeList) as the target type.
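
As a rough illustration of the ordering above, here is a minimal, self-contained sketch. The `ListKind` enum and the `wider_list_kind` function are hypothetical stand-ins, not DataFusion's actual types or API; the real rule works on Arrow `DataType` values and also has to reconcile the element types, which this sketch ignores.

```rust
/// Hypothetical stand-in for the three Arrow list flavors.
/// Variants are declared from narrowest to widest so that the derived
/// `Ord` matches the rule `LargeList > List > FixedSizeList`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ListKind {
    FixedSizeList,
    List,
    LargeList,
}

/// Pick the wider of two list kinds as the coercion target.
fn wider_list_kind(lhs: ListKind, rhs: ListKind) -> ListKind {
    // `Ord::max` returns the greater value, i.e. the wider kind here.
    lhs.max(rhs)
}
```

Under these assumptions, `wider_list_kind(ListKind::List, ListKind::LargeList)` yields `ListKind::LargeList`, and the result does not depend on argument order.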

What changes are included in this PR?

  • The type coercion rule for List type comparison.
  • Support for coercing the list type in THEN expressions.
  • Support for comparing nested types in CASE-WHEN expressions.

Are these changes tested?

yes

Are there any user-facing changes?

no

@github-actions github-actions bot added the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels Sep 16, 2024
@goldmedal
Contributor Author

I tried to add some SQL tests for the FixedSizeList case, but I can't find a way to create one through the SQL API 🤔

> select arrow_cast([1,2,3], 'FixedSizeList(Int64, 10)');
Error during planning: Unsupported type 'FixedSizeList(Int64, 10)'. Must be a supported arrow type name such as 'Int32' or 'Timestamp(Nanosecond, None)'. Error finding i64 for FixedSizeList, got 'Int64'

@goldmedal goldmedal marked this pull request as ready for review September 16, 2024 15:32
@goldmedal goldmedal marked this pull request as draft September 17, 2024 13:58
@github-actions github-actions bot added the physical-expr (Physical Expressions) label Sep 17, 2024
@goldmedal goldmedal changed the title Support List, FixedSizeList and LargeList type coercion for comparison Support List type coercion for CASE-WHEN-THEN expression Sep 17, 2024
@goldmedal goldmedal marked this pull request as ready for review September 17, 2024 14:35
Contributor

@alamb alamb left a comment

Thank you @goldmedal -- this is also really nicely implemented, documented, and tested 🏅

I don't have anything to add here

cc @jayzhan211

Member

@Weijun-H Weijun-H left a comment

LGTM! Thanks @goldmedal

Comment on lines 1835 to 1855
macro_rules! test_case_expression {
    ($expr:expr, $when_then:expr, $case_when_type:expr, $then_else_type:expr, $schema:expr) => {
        let case = Case {
            expr: Some(Box::new(col($expr))),
            when_then_expr: $when_then,
            else_expr: None,
        };

        let case_when_common_type = $case_when_type;
        let then_else_common_type = $then_else_type;
        let expected = cast_helper(
            case.clone(),
            &case_when_common_type,
            &then_else_common_type,
            &$schema,
        );

        let actual = coerce_case_expression(case, &$schema)?;
        assert_eq!(expected, actual);
    };
}
Member

It seems the two test_case_expression macros are the same; could we keep only one?

Contributor Author

Sounds great. I tried to simplify them in a8ea6cf

@jayzhan211
Contributor

I tried to add some SQL tests for the FixedSizeList case, but I can't find a way to create one through the SQL API 🤔

> select arrow_cast([1,2,3], 'FixedSizeList(Int64, 10)');
Error during planning: Unsupported type 'FixedSizeList(Int64, 10)'. Must be a supported arrow type name such as 'Int32' or 'Timestamp(Nanosecond, None)'. Error finding i64 for FixedSizeList, got 'Int64'

The argument positions are swapped; the length comes first, then the element type:

query ?
select arrow_cast([1,2,3], 'FixedSizeList(3, Int32)');
----
[1, 2, 3]

Comment on lines +1029 to +1037
// Coerce to the left side FixedSizeList type if the list lengths are the same,
// otherwise coerce to list with the left type for dynamic length
(FixedSizeList(lf, ls), FixedSizeList(_, rs)) => {
    if ls == rs {
        Some(lhs_type.clone())
    } else {
        Some(List(Arc::clone(lf)))
    }
}
Contributor Author

We can't cast a FixedSizeList to a FixedSizeList of a different length:

> select arrow_cast(arrow_cast([1,2], 'FixedSizeList(2, Int64)'), 'FixedSizeList(3, Int64)');
This feature is not implemented: Unsupported CAST from FixedSizeList(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 2) to FixedSizeList(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 3)

So when the lengths differ, I chose to fall back to List, which has a dynamic length.
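
The same-length versus different-length decision can be sketched in isolation. The `Ty` enum and `coerce_fixed_size_lists` function below are hypothetical, simplified stand-ins for Arrow's `DataType` and the coercion arm under review (element types are plain strings here, whereas the real code holds an `Arc<Field>`). As in the reviewed snippet, the right-hand element type is ignored.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    /// Variable-length list of the given element type.
    List(String),
    /// Fixed-length list: element type and length.
    FixedSizeList(String, i32),
}

/// Coerce two fixed-size list types to a common type.
/// Returns `None` when either input is not a `FixedSizeList`.
fn coerce_fixed_size_lists(lhs: &Ty, rhs: &Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (Ty::FixedSizeList(lf, ls), Ty::FixedSizeList(_, rs)) => {
            if ls == rs {
                // Same length: keep the left-hand fixed-size type.
                Some(lhs.clone())
            } else {
                // Different lengths cannot be cast to each other,
                // so fall back to a variable-length list.
                Some(Ty::List(lf.clone()))
            }
        }
        _ => None,
    }
}
```

With this sketch, two `FixedSizeList("Int64", 2)` inputs coerce to the same fixed-size type, while lengths 2 and 3 coerce to `List("Int64")`.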

Contributor

That makes sense to me - the alternative is to just reject the query -- I think this is reasonable

@goldmedal
Contributor Author

The argument positions are swapped; the length comes first, then the element type:

query ?
select arrow_cast([1,2,3], 'FixedSizeList(3, Int32)');
----
[1, 2, 3]

Thanks @jayzhan211 for answering my question! It was very helpful. I added more tests for the FixedSizeList type. I also fixed the type coercion rule for the FixedSizeList to FixedSizeList case.

Could @alamb and @Weijun-H also double-check this change? Many thanks 🙇

Contributor

@alamb alamb left a comment

Looks good to me -- I'll wait to see if @Weijun-H would also like to review

@Weijun-H
Member

ship it!

@alamb alamb merged commit 244ce5a into apache:main Sep 21, 2024
24 checks passed
@alamb
Contributor

alamb commented Sep 21, 2024

🚀

bgjackma pushed a commit to bgjackma/datafusion that referenced this pull request Sep 25, 2024
* support list type coercion

* add planning and sql tests

* clippy

* support to compare nested type for case-when expression

* simplify the macro rules

* fix the FixedSizeList type coercion and add tests

* add test for THEN-ELSE
Labels
optimizer (Optimizer rules), physical-expr (Physical Expressions), sqllogictest (SQL Logic Tests (.slt))
Development

Successfully merging this pull request may close these issues.

LargeList and List type coercion not working in CASE WHEN
4 participants