[SPARK-32559][SQL][3.0] Fix the trim logic in UTF8String.toInt/toLong that didn't handle non-ASCII characters correctly #29393
…t handle non-ASCII characters correctly
The trim logic in the Cast expression, introduced in apache#26622, unexpectedly trims non-ASCII characters.
Before this patch
After this patch
This behavior doesn't make sense. It is also inconsistent with the behavior when casting a string to double/float, and with the behavior of Hive.
Does this PR introduce any user-facing change? Yes
How was this patch tested? Added more unit tests.
Closes apache#29375 from WangGuangxin/cast-bugfix.
Authored-by: wangguangxin.cn <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
ok to test
Thank you, @WangGuangxin.
Test build #127232 has finished for PR 29393 at commit
Retest this please.
Test build #127234 has finished for PR 29393 at commit
Retest this please.
Test build #127240 has finished for PR 29393 at commit
dongjoon-hyun
left a comment
+1, LGTM. Thank you, @WangGuangxin .
Merged to branch-3.0 for Apache Spark 3.0.1.
… didn't handle non-ASCII characters correctly
Closes #29393 from WangGuangxin/cast-bugfix-branch-3.0.
Authored-by: wangguangxin.cn <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This is a backport of #29375.
The trim logic in the Cast expression, introduced in #26622, unexpectedly trims non-ASCII characters.
Before this patch

After this patch

Why are the changes needed?
The behavior described above doesn't make sense. It is also inconsistent with the behavior when casting a string to double/float, and with the behavior of Hive.
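The description does not show the faulty code, so the following is only a sketch of one common way a byte-level trim loop can swallow non-ASCII characters in Java. It assumes the loop compares raw bytes against `' '`: because Java's `byte` is signed, every byte of a multi-byte UTF-8 sequence is negative and passes a `<= ' '` test. The class `TrimSketch` and both method names are hypothetical and are not taken from the Spark source.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not the Spark implementation.
final class TrimSketch {

    // Assumed faulty pattern: a signed comparison treats every byte in
    // 0x80..0xFF (negative as a Java byte) as if it were whitespace.
    static String trimSignedCompare(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int start = 0;
        while (start < bytes.length && bytes[start] <= ' ') start++;
        int end = bytes.length - 1;
        while (end > start && bytes[end] <= ' ') end--;
        int len = Math.max(0, end - start + 1);
        return new String(bytes, start, len, StandardCharsets.UTF_8);
    }

    // One possible guard: only bytes in the range 0x00..0x20 are trimmable,
    // so lead and continuation bytes of non-ASCII characters are kept.
    static boolean isAsciiBlank(byte b) {
        return b >= 0 && b <= ' ';
    }

    static String trimAsciiOnly(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int start = 0;
        while (start < bytes.length && isAsciiBlank(bytes[start])) start++;
        int end = bytes.length - 1;
        while (end > start && isAsciiBlank(bytes[end])) end--;
        int len = Math.max(0, end - start + 1);
        return new String(bytes, start, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+FF11 is FULLWIDTH DIGIT ONE, encoded as three bytes in UTF-8.
        String input = " \uFF1123 ";
        System.out.println("[" + trimSignedCompare(input) + "]");
        System.out.println("[" + trimAsciiOnly(input) + "]");
    }
}
```

Under this assumption, the signed comparison drops the fullwidth digit together with the leading space, so a later numeric parse would see only `23`. The guarded version keeps the character, and the parse can then reject the input as a non-numeric string.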
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Added more unit tests.