@bingbai0912

What changes were proposed in this pull request?

The UDF 'Cast' returns NULL when the input string starts or ends with a special character such as a blank, but Hive does not.
For example, casting a string that starts with a blank to a date, or getting the hour from a string that ends with a blank:
hive:

hive> SELECT CAST(' 2018-08-13' AS DATE);//starts with a blank
OK 
2018-08-13
hive> SELECT HOUR('2018-08-13 17:20:07 ');//ends with a blank
OK
17

spark-sql:

spark-sql> SELECT CAST(' 2018-08-13' AS DATE);//starts with a blank
NULL
spark-sql> SELECT HOUR('2018-08-13 17:20:07 ');//ends with a blank
NULL

All of the following UDFs will be affected:

year
month
day
hour
minute
second
date_add
date_sub

How was this patch tested?

Add test cases

@AmplabJenkins

Can one of the admins verify this patch?

case StringType =>
- buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(utfs, timeZone).orNull)
+ buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(
+   UTF8String.fromString(utfs.toString.trim), timeZone).orNull)

why not utfs.trim()?

@wangyum
Member

wangyum commented Nov 3, 2018

ping @bingbai0912

c.set(Calendar.MILLISECOND, 0)
checkEvaluation(Cast(Literal("2015-03-18"), DateType), new Date(c.getTimeInMillis))
checkEvaluation(Cast(Literal("2015-03-18 "), DateType), new Date(c.getTimeInMillis))
checkEvaluation(Cast(Literal(" 2015-03-18"), DateType), new Date(c.getTimeInMillis))

SELECT CAST(' 22-OCT-1997' AS TIMESTAMP) FROM dual;

Oracle also trims the leading space.

@gatorsmile
Member

@wangyum Could you please take it over?

@wangyum
Member

wangyum commented Nov 5, 2018

Sure, @gatorsmile .

@asfgit asfgit closed this in 9e9fa2f Nov 7, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ringToDate

## What changes were proposed in this pull request?

**Hive** and **Oracle** trim the input string when casting it to a timestamp or a date. This PR adds the same behavior to `stringToTimestamp` and `stringToDate`:
![image](https://user-images.githubusercontent.com/5399861/47979721-793b1e80-e0ff-11e8-97c8-24b10950ee9e.png)
![image](https://user-images.githubusercontent.com/5399861/47979725-7dffd280-e0ff-11e8-87d4-5767a00ed46e.png)

## How was this patch tested?

unit tests

Closes apache#22089

Closes apache#22943 from wangyum/SPARK-25098.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>