[SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats #25097
Conversation
Test build #107456 has finished for PR 25097 at commit

It fails on tests that expect the wrong results; I will regenerate the expected results.
```
struct<DATE '1999-01-01':date>
struct<>
-- !query 21 output
1999-01-01
```
The result is wrong; it is better to throw an exception, because the format with spaces (`yyyy [m]m [d]d`) is not currently supported.
Oops, I missed this. Thank you for catching this.
cc @wangyum , too.
Test build #4819 has finished for PR 25097 at commit
```
Cannot parse the DATE value: 1999 Jan 08(line 1, pos 7)

== SQL ==
SELECT date '1999 Jan 08'
```
Note that this is only for the `date` prefix. With this PR, we will return NULL like Hive:
```
spark-sql> SELECT CAST('1999 08 01' AS DATE);
NULL
```
@MaxGekk, could you mention these two cases together in the PR title?
I updated the PR description and JIRA.
Test build #107470 has finished for PR 25097 at commit

Retest this please.
Hi, @gatorsmile and @cloud-fan. I'd like to have this in the old branches, but this fix can cause big behavior changes because sometimes
Test build #107473 has finished for PR 25097 at commit
Merged to master. We can discuss the backporting soon.
A late LGTM
Thank you, @cloud-fan. BTW, what do you think about backporting to
yea let's backport
Thanks!
… yyyy and yyyy-[m]m formats
Fix `stringToDate()` for the formats `yyyy` and `yyyy-[m]m`, which assumed there were no additional chars after the last components `yyyy` and `[m]m`. In this PR, I propose to check that the entire input was consumed for these formats.
After the fix, the input `1999 08 01` is invalid because it matches the pattern `yyyy` but the string contains the additional chars ` 08 01`.
In Spark 1.6.3 through 2.4.3, the behavior is the same:
```
spark-sql> SELECT CAST('1999 08 01' AS DATE);
1999-01-01
```
This PR makes it return NULL like Hive.
```
spark-sql> SELECT CAST('1999 08 01' AS DATE);
NULL
```
Added new checks to `DateTimeUtilsSuite` for the `1999 08 01` and `1999 08` inputs.
Closes #25097 from MaxGekk/spark-28015-invalid-date-format.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
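A minimal sketch of the entire-input check described above. This is not Spark's actual `stringToDate()` code; the class name, regex, and `Optional` return type are illustrative, standing in for Spark's internal representation:

```java
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringToDateSketch {
    // Accepts yyyy, yyyy-[m]m, or yyyy-[m]m-[d]d. The ^...$ anchors
    // reject any trailing characters after the last matched component,
    // which is the essence of the fix.
    private static final Pattern DATE =
            Pattern.compile("^(\\d{4})(?:-(\\d{1,2})(?:-(\\d{1,2}))?)?$");

    static Optional<LocalDate> stringToDate(String s) {
        Matcher m = DATE.matcher(s.trim());
        if (!m.matches()) {
            // "1999 08 01" matches the yyyy prefix but not the whole
            // input, so the cast yields NULL (empty here) instead of
            // silently producing 1999-01-01.
            return Optional.empty();
        }
        // A missing month or day defaults to 1, mirroring the lenient
        // behavior for the yyyy and yyyy-[m]m formats.
        int month = m.group(2) == null ? 1 : Integer.parseInt(m.group(2));
        int day = m.group(3) == null ? 1 : Integer.parseInt(m.group(3));
        try {
            return Optional.of(
                    LocalDate.of(Integer.parseInt(m.group(1)), month, day));
        } catch (java.time.DateTimeException e) {
            return Optional.empty();
        }
    }
}
```

With this sketch, `stringToDate("1999 08 01")` is empty, while `stringToDate("1999")` still resolves to 1999-01-01.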
This is cherry-picked to

I'll test and backport to
(cherry picked from commit 17974e2)
Signed-off-by: Dongjoon Hyun <[email protected]>
This is cherry-picked and tested in
@dongjoon-hyun @srowen @cloud-fan Thank you for your review.
What changes were proposed in this pull request?
Fix `stringToDate()` for the formats `yyyy` and `yyyy-[m]m`, which assumed there were no additional chars after the last components `yyyy` and `[m]m`. In this PR, I propose to check that the entire input was consumed for these formats. After the fix, the input `1999 08 01` is invalid because it matches the pattern `yyyy` but the string contains the additional chars ` 08 01`. In Spark 1.6.3 through 2.4.3, the behavior is the same.
This PR makes it return NULL like Hive.
How was this patch tested?
Added new checks to `DateTimeUtilsSuite` for the `1999 08 01` and `1999 08` inputs.