Skip to content

Conversation

@WangGuangxin
Copy link
Contributor

@WangGuangxin WangGuangxin commented Aug 9, 2020

What changes were proposed in this pull request?

This is a backport of #29375
The trim logic in Cast expression introduced in #26622 trim non-ASCII characters unexpectly.

Before this patch
image

After this patch
image

Why are the changes needed?

The behavior described above doesn't make sense, and also doesn't consistent with the behavior when cast a string to double/float, as well as doesn't consistent with the behavior of Hive

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Added more UT

…t handle non-ASCII characters correctly

The trim logic in Cast expression introduced in apache#26622 trim non-ASCII characters unexpectly.

Before this patch
![image](https://user-images.githubusercontent.com/1312321/89513154-caad9b80-d806-11ea-9ebe-17c9e7d1b5b3.png)

After this patch
![image](https://user-images.githubusercontent.com/1312321/89513196-d731f400-d806-11ea-959c-6a7dc29dcd49.png)

The behavior described above doesn't make sense, and also doesn't consistent with the behavior when cast a string to double/float, as well as doesn't consistent with the behavior of Hive

Yes

Added more UT

Closes apache#29375 from WangGuangxin/cast-bugfix.

Authored-by: wangguangxin.cn <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32559][SQL] Fix the trim logic in UTF8String.toInt/toLong did't handle non-ASCII characters correctly [SPARK-32559][SQL][3.0] Fix the trim logic in UTF8String.toInt/toLong did't handle non-ASCII characters correctly Aug 9, 2020
@dongjoon-hyun
Copy link
Member

ok to test

@dongjoon-hyun
Copy link
Member

Thank you, @WangGuangxin .

@SparkQA
Copy link

SparkQA commented Aug 9, 2020

Test build #127232 has finished for PR 29393 at commit 3fffc60.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Aug 9, 2020

Test build #127234 has finished for PR 29393 at commit 3fffc60.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Aug 9, 2020

Test build #127240 has finished for PR 29393 at commit 3fffc60.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @WangGuangxin .
Merged to branch-3.0 for Apache Spark 3.0.1.

dongjoon-hyun pushed a commit that referenced this pull request Aug 9, 2020
… did't handle non-ASCII characters correctly

### What changes were proposed in this pull request?

This is a backport of #29375
The trim logic in Cast expression introduced in #26622 trim non-ASCII characters unexpectly.

Before this patch
![image](https://user-images.githubusercontent.com/1312321/89513154-caad9b80-d806-11ea-9ebe-17c9e7d1b5b3.png)

After this patch
![image](https://user-images.githubusercontent.com/1312321/89513196-d731f400-d806-11ea-959c-6a7dc29dcd49.png)

### Why are the changes needed?
The behavior described above doesn't make sense, and also doesn't consistent with the behavior when cast a string to double/float, as well as doesn't consistent with the behavior of Hive

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Added more UT

Closes #29393 from WangGuangxin/cast-bugfix-branch-3.0.

Authored-by: wangguangxin.cn <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@WangGuangxin WangGuangxin deleted the cast-bugfix-branch-3.0 branch August 10, 2020 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants