[SPARK-24636][SQL] Type coercion of arrays for array_join function #21620
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| SELECT array_join(array(true, false), ', '); | ||
| SELECT array_join(array(2Y, 1Y), ', '); | ||
| SELECT array_join(array(2S, 1S), ', '); | ||
| SELECT array_join(array(2, 1), ', '); | ||
| SELECT array_join(array(2L, 1L), ', '); | ||
| SELECT array_join(array(9223372036854775809, 9223372036854775808), ', '); | ||
| SELECT array_join(array(2.0D, 1.0D), ', '); | ||
| SELECT array_join(array(float(2.0), float(1.0)), ', '); | ||
| SELECT array_join(array(date '2016-03-14', date '2016-03-13'), ', '); | ||
| SELECT array_join(array(timestamp '2016-11-15 20:54:00.000', timestamp '2016-11-12 20:54:00.000'), ', '); | ||
| SELECT array_join(array('a', 'b'), ', '); | ||
| SELECT array_join(array(array('a', 'b'), array('c', 'd')), ', '); | ||
| SELECT array_join(array(struct('a', 1), struct('b', 2)), ', '); | ||
| SELECT array_join(array(map('a', 1), map('b', 2)), ', '); |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,114 @@ | ||
| -- Automatically generated by SQLQueryTestSuite | ||
| -- Number of queries: 14 | ||
|
|
||
|
|
||
| -- !query 0 | ||
| SELECT array_join(array(true, false), ', ') | ||
| -- !query 0 schema | ||
| struct<array_join(array(true, false), , ):string> | ||
| -- !query 0 output | ||
| true, false | ||
|
|
||
|
|
||
| -- !query 1 | ||
| SELECT array_join(array(2Y, 1Y), ', ') | ||
| -- !query 1 schema | ||
| struct<array_join(array(2, 1), , ):string> | ||
| -- !query 1 output | ||
| 2, 1 | ||
|
|
||
|
|
||
| -- !query 2 | ||
| SELECT array_join(array(2S, 1S), ', ') | ||
| -- !query 2 schema | ||
| struct<array_join(array(2, 1), , ):string> | ||
| -- !query 2 output | ||
| 2, 1 | ||
|
|
||
|
|
||
| -- !query 3 | ||
| SELECT array_join(array(2, 1), ', ') | ||
| -- !query 3 schema | ||
| struct<array_join(array(2, 1), , ):string> | ||
| -- !query 3 output | ||
| 2, 1 | ||
|
|
||
|
|
||
| -- !query 4 | ||
| SELECT array_join(array(2L, 1L), ', ') | ||
| -- !query 4 schema | ||
| struct<array_join(array(2, 1), , ):string> | ||
| -- !query 4 output | ||
| 2, 1 | ||
|
|
||
|
|
||
| -- !query 5 | ||
| SELECT array_join(array(9223372036854775809, 9223372036854775808), ', ') | ||
| -- !query 5 schema | ||
| struct<array_join(array(9223372036854775809, 9223372036854775808), , ):string> | ||
| -- !query 5 output | ||
| 9223372036854775809, 9223372036854775808 | ||
|
|
||
|
|
||
| -- !query 6 | ||
| SELECT array_join(array(2.0D, 1.0D), ', ') | ||
| -- !query 6 schema | ||
| struct<array_join(array(2.0, 1.0), , ):string> | ||
| -- !query 6 output | ||
| 2.0, 1.0 | ||
|
|
||
|
|
||
| -- !query 7 | ||
| SELECT array_join(array(float(2.0), float(1.0)), ', ') | ||
| -- !query 7 schema | ||
| struct<array_join(array(CAST(2.0 AS FLOAT), CAST(1.0 AS FLOAT)), , ):string> | ||
| -- !query 7 output | ||
| 2.0, 1.0 | ||
|
|
||
|
|
||
| -- !query 8 | ||
| SELECT array_join(array(date '2016-03-14', date '2016-03-13'), ', ') | ||
| -- !query 8 schema | ||
| struct<array_join(array(DATE '2016-03-14', DATE '2016-03-13'), , ):string> | ||
| -- !query 8 output | ||
| 2016-03-14, 2016-03-13 | ||
|
|
||
|
|
||
| -- !query 9 | ||
| SELECT array_join(array(timestamp '2016-11-15 20:54:00.000', timestamp '2016-11-12 20:54:00.000'), ', ') | ||
| -- !query 9 schema | ||
| struct<array_join(array(TIMESTAMP('2016-11-15 20:54:00.0'), TIMESTAMP('2016-11-12 20:54:00.0')), , ):string> | ||
|
Member
If the input array is very long, will the automatically generated column name also be very long?
Member
Hm, yes, it will be. In general, if an expression has |
||
| -- !query 9 output | ||
| 2016-11-15 20:54:00, 2016-11-12 20:54:00 | ||
|
|
||
|
|
||
| -- !query 10 | ||
| SELECT array_join(array('a', 'b'), ', ') | ||
| -- !query 10 schema | ||
| struct<array_join(array(a, b), , ):string> | ||
| -- !query 10 output | ||
| a, b | ||
|
|
||
|
|
||
| -- !query 11 | ||
| SELECT array_join(array(array('a', 'b'), array('c', 'd')), ', ') | ||
| -- !query 11 schema | ||
| struct<array_join(array(array(a, b), array(c, d)), , ):string> | ||
| -- !query 11 output | ||
| [a, b], [c, d] | ||
|
|
||
|
|
||
| -- !query 12 | ||
| SELECT array_join(array(struct('a', 1), struct('b', 2)), ', ') | ||
| -- !query 12 schema | ||
| struct<array_join(array(named_struct(col1, a, col2, 1), named_struct(col1, b, col2, 2)), , ):string> | ||
| -- !query 12 output | ||
| [a, 1], [b, 2] | ||
|
|
||
|
|
||
| -- !query 13 | ||
| SELECT array_join(array(map('a', 1), map('b', 2)), ', ') | ||
| -- !query 13 schema | ||
| struct<array_join(array(map(a, 1), map(b, 2)), , ):string> | ||
| -- !query 13 output | ||
| [a -> 1], [b -> 2] | ||

Not every type can be cast to `StringType`. What about using `ImplicitTypeCasts.implicitCast` to check whether we can cast it?

Hi @mgaido91,
to be honest, I considered this option before submitting this PR, but I'm glad that you mentioned this approach. At least we can discuss the pros and cons of the different solutions. Using `ImplicitTypeCasts.implicitCast` would enable conversion only from primitive types. I think it would be nice to support non-primitive types as well. WDYT?

Re: casting to `StringType`: according to the `Cast.canCast` method, it should be possible to cast any type to `StringType`:

line 42: `case (_, StringType) => true`

Or am I missing something? I hope the test cases in .../typeCoercion/native/arrayJoin.sql cover conversions to `StringType` from all Spark types.
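
For readers following the thread, the quoted rule can be illustrated with a small standalone sketch. The types and the `CastSketch` object below are hypothetical stand-ins rather than Spark's classes, and every case other than the quoted `(_, StringType)` line is a simplified placeholder.

```scala
object CastSketch {
  // Hypothetical, minimal stand-ins for a type hierarchy (not Spark's own classes).
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  case object BooleanType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType
  final case class MapType(keyType: DataType, valueType: DataType) extends DataType

  // Simplified castability check built around the rule quoted in the comment.
  def canCast(from: DataType, to: DataType): Boolean = (from, to) match {
    case (f, t) if f == t             => true            // identical types
    case (_, StringType)              => true            // the quoted rule: anything to string
    case (ArrayType(f), ArrayType(t)) => canCast(f, t)   // placeholder: element-wise check
    case _                            => false           // placeholder: everything else rejected
  }
}
```

Under this model, arrays and maps are castable to `StringType` simply because the second case matches any source type, which is the point the comment makes.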

I am not sure. I think it is arguable what the right result of `SELECT array_join(array(array('a', 'b'), array('c', 'd')), ';')` is, for instance. With this PR, the result is `[a, b];[c, d]`, but shouldn't it be `[a;b];[c;d]`? Moreover, Presto, which is the reference here, doesn't support nested arrays, for instance:

So, I'd avoid that.
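
The ambiguity can be made concrete with a standalone sketch in plain Scala collections. The object and function names are hypothetical; the first function imitates stringifying each inner array before joining, the second applies the delimiter recursively.

```scala
object NestedJoinSketch {
  // Each inner array is first rendered as a string with a fixed ", " separator,
  // then the outer array is joined with the user-supplied delimiter.
  def joinAfterStringCast(arr: Seq[Seq[String]], delimiter: String): String =
    arr.map(inner => inner.mkString("[", ", ", "]")).mkString(delimiter)

  // Alternative reading: the user-supplied delimiter is also used inside inner arrays.
  def joinRecursively(arr: Seq[Seq[String]], delimiter: String): String =
    arr.map(inner => inner.mkString("[", delimiter, "]")).mkString(delimiter)
}
```

For `Seq(Seq("a", "b"), Seq("c", "d"))` and delimiter `";"`, the first function returns `[a, b];[c, d]` and the second returns `[a;b];[c;d]`, the two candidate results discussed in the comment.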

Ok, no problem. Let's support just arrays of primitive types for now. Thanks!
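
A rough sketch of what "just arrays of primitive types" could mean as a coercion rule is shown below. It is a standalone model with hypothetical names and types, not the code merged in this PR: arrays of primitive elements are coerced to arrays of strings, while nested or non-array arguments are rejected.

```scala
object PrimitiveOnlySketch {
  // Hypothetical, minimal stand-ins for a type hierarchy (not Spark's own classes).
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  case object DateType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType
  final case class StructType(fields: Seq[DataType]) extends DataType

  // Returns the type the array argument would be coerced to, or None if rejected.
  def coerceArgument(arg: DataType): Option[DataType] = arg match {
    case ArrayType(StringType)                   => Some(arg)                   // already strings
    case ArrayType(_: ArrayType | _: StructType) => None                        // nested elements: unsupported
    case ArrayType(_)                            => Some(ArrayType(StringType)) // primitive elements: cast to string
    case _                                       => None                        // not an array
  }
}
```

In this model, `coerceArgument(ArrayType(IntegerType))` yields `Some(ArrayType(StringType))`, while an array of arrays yields `None`.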