[SPARK-34856][SQL] ANSI mode: Allow casting complex types as string type #31954
|
Test build #136470 has started for PR 31954 at commit |
|
ANSI explicit CAST is in 3.1 so this is a bug fix of ANSI mode for 3.1 (because |
Does this affect the coming year-month and day-time interval?
Yes
@cloud-fan, we usually take an explicit allow-list approach for types. Is this change okay?
If this PR targets complex types only, why don't we add them explicitly instead of allowing the cast this broadly?
Also, cc @MaxGekk
df.show needs to cast the column to string. I think we need to support casting from all data types here; otherwise df.show may still be broken in some cases.
So far, we don't support such casting. I opened the JIRAs for that: SPARK-34667 and SPARK-34668
Hi, @cloud-fan .
According to this test case, it's not a bug because it is designed like this.
ANSI explicit CAST is in 3.1 so this is a bug fix of ANSI mode for 3.1 (because df.show is not working with ANSI mode)?
Yea it is designed like this, but if df.show can't work, I think it's a bug in the design...
|
For me, this is an improvement for Apache Spark 3.2.0, @gengliangwang . |
|
@dongjoon-hyun I think putting it in branch-3.1 makes sense as well. It fixes the broken API |
|
New improvements always deliver something that didn't work before. According to the existing test case, this limitation was by design from the beginning of the Spark 3.1 implementation. For me, it's just a known limitation instead of a bug. |
|
New features do not work properly with every combination. As with SPARK-34827 (AQE + IO Encryption), this kind of effort should be done in new releases. |
|
For SPARK-34827, it does expose a bug and we fixed it in 3.1, see #31898 I don't think failing in |
|
@cloud-fan . It's not fixed. The new implementation is disabled. :) |
|
As we are doing in SPARK-33828 for AQE QA, I believe ANSI also needs more QA under an umbrella JIRA issue, @gengliangwang and @cloud-fan . |
|
I don't understand what the difference is here. Both AQE and IO Encryption are disabled by default in 3.1 and we still merged #31898 to 3.1. I'm OK if you have different ideas to make I agree we should have more QA, but that doesn't mean we should stop backporting bug fixes. We have fixed many AQE issues and backported them to 3.0/3.1, although AQE is not turned on by default there. If you don't think this PR is the correct fix, that's a different story and we can discuss more here. |
|
This is a documented behavior, https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html.
So, this is not an unexpected failure. For me, this is a well-documented limitation of Apache Spark 3.1.x with the explicit test coverage. |
|
I don't see any document saying that I agree that it's tricky to change a documented behavior, maybe it's arguable to change the ANSI explicit CAST behavior in 3.1. But |
|
+1 for that. I guess it will be a fix for
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
Branch force-pushed from 31ab2b6 to 1707885.
|
Talked to @cloud-fan offline. It's overkill to find a different fix (e.g. create a new cast expression) in the implementation of |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Thanks for the review. Merging to master. |
|
Thank you for the decision, @gengliangwang and @cloud-fan . |
|
Test build #136524 has finished for PR 31954 at commit
|
…FailureMessage

### What changes were proposed in this pull request?

After #31954, Array type is allowed to be cast as String type. So the customized conversion failure message branch from AnsiCast.typeCheckFailureMessage won't be reached anymore. This PR is to remove the dead code.

### Why are the changes needed?

Code clean up.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Just removing dead code.

Closes #32004 from gengliangwang/SPARK-34856-followup.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
### What changes were proposed in this pull request?

Allow casting complex types as string type in ANSI mode.

### Why are the changes needed?

Currently, complex types are not allowed to be cast as string type in ANSI mode. This breaks the DataFrame.show() API. E.g.

```
scala> sql("select array(1, 2, 2)").show(false)
org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(`array(1, 2, 2)` AS STRING)' due to data type mismatch: cannot cast array<int> to string with ANSI mode on.
```

We should allow the conversion as an extension of the ANSI SQL standard, so that DataFrame.show() still works in ANSI mode.

### Does this PR introduce _any_ user-facing change?

Yes, casting complex types as string type is now allowed in ANSI mode.

### How was this patch tested?

Unit tests.

Closes apache#31954 from gengliangwang/fixExplicitCast.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
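
The review thread weighs an explicit allow-list of castable source types against accepting every source type when the target is string. The following is a minimal toy sketch of that difference, not Spark's actual `Cast`/`AnsiCast` code: every type and function name in it (`AnsiCastSketch`, `canCastToStringAllowList`, `canCastToStringAny`, `render`) is hypothetical and exists only for illustration.

```scala
// Toy model only: these definitions are hypothetical and do NOT mirror Spark's internals.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
final case class ArrayType(element: DataType) extends DataType
final case class MapType(key: DataType, value: DataType) extends DataType
final case class StructType(fields: List[DataType]) extends DataType

object AnsiCastSketch {
  // Allow-list approach: only explicitly listed source types may be cast to string.
  def canCastToStringAllowList(from: DataType): Boolean = from match {
    case IntType | StringType => true
    case _                    => false // complex types are rejected
  }

  // Approach discussed in the thread: any source type may be cast to string.
  def canCastToStringAny(from: DataType): Boolean = true

  // A show()-like helper: it must stringify the column, so it fails when the check rejects the type.
  def render(from: DataType, check: DataType => Boolean): Either[String, String] =
    if (check(from)) Right(s"CAST($from AS STRING)")
    else Left(s"cannot cast $from to string with ANSI mode on")
}
```

Under the allow-list, `render(ArrayType(IntType), AnsiCastSketch.canCastToStringAllowList)` yields a `Left`, which models the broken `show()` call; with `canCastToStringAny` the same call yields a `Right`. The trade-off raised in review is that the permissive check also covers types added later, such as the interval types mentioned in the thread.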