
Conversation

@MaxGekk
Member

@MaxGekk MaxGekk commented Jul 10, 2019

What changes were proposed in this pull request?

Fix `stringToDate()` for the formats `yyyy` and `yyyy-[m]m`, which assume there are no additional characters after the last components `yyyy` and `[m]m`. In this PR, I propose to check that the entire input has been consumed for these formats.

After the fix, the input `1999 08 01` will be invalid because it matches the pattern `yyyy` but the string contains the additional characters ` 08 01`.

From Spark 1.6.3 through 2.4.3, the behavior has been the same:

```
spark-sql> SELECT CAST('1999 08 01' AS DATE);
1999-01-01
```

This PR makes it return NULL, like Hive:

```
spark-sql> SELECT CAST('1999 08 01' AS DATE);
NULL
```
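The essence of the fix — accepting the `yyyy` and `yyyy-[m]m` fast paths only when they consume the entire input — can be sketched outside of Spark. This is a hypothetical simplification for illustration only; the function and patterns below are not Spark's actual `stringToDate()` implementation:

```python
import re

def string_to_date(s):
    """Hypothetical sketch of the fixed parsing logic: a pattern is
    accepted only if it consumes the ENTIRE input, so '1999 08 01'
    no longer slips through the yyyy branch as it did before the fix."""
    s = s.strip()
    # re.fullmatch (unlike re.match) requires the whole string to match.
    m = re.fullmatch(r"(\d{4})", s)
    if m:
        return f"{int(m.group(1)):04d}-01-01"
    m = re.fullmatch(r"(\d{4})-(\d{1,2})", s)
    if m and 1 <= int(m.group(2)) <= 12:
        return f"{int(m.group(1)):04d}-{int(m.group(2)):02d}-01"
    return None  # invalid input yields None, mirroring Hive's NULL

print(string_to_date("1999"))        # 1999-01-01
print(string_to_date("1999-8"))      # 1999-08-01
print(string_to_date("1999 08 01"))  # None
```

Before the fix, the buggy behavior corresponds to using a prefix match (`re.match`) instead of a full match, which is why `1999 08 01` was silently parsed as `1999-01-01`.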

How was this patch tested?

Added new checks to `DateTimeUtilsSuite` for the `1999 08 01` and `1999 08` inputs.

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107456 has finished for PR 25097 at commit bbe4dcd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Jul 10, 2019

It fails on tests that expect wrong results:

```
SELECT date '1999 Jan 08'
-- !query 28 schema
struct<DATE '1999-01-01':date>
-- !query 28 output
1999-01-01
```

I will regenerate the expected results.

```
struct<DATE '1999-01-01':date>
struct<>
-- !query 21 output
1999-01-01
```
Member Author

The result is wrong; it is better to throw an exception because the format with spaces (`yyyy [m]m [d]d`) is not currently supported.

Member

Oops, I missed this. Thank you for catching this.
cc @wangyum , too.

@SparkQA

SparkQA commented Jul 10, 2019

Test build #4819 has finished for PR 25097 at commit bbe4dcd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```
Cannot parse the DATE value: 1999 Jan 08(line 1, pos 7)

== SQL ==
SELECT date '1999 Jan 08'
```
Member

Note that this is only for the `date` prefix. With this PR, we will return NULL like Hive:

```
spark-sql> SELECT CAST('1999 08 01' AS DATE);
NULL
```

@MaxGekk, could you mention these two cases together in the PR title?

Member

I updated the PR description and JIRA.

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107470 has finished for PR 25097 at commit 2ee26df.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@dongjoon-hyun
Member

Hi, @gatorsmile and @cloud-fan.
This is a correctness issue. We must have this in master.

I'd like to have this in the old branches too, but this fix could cause big behavior changes because dates are sometimes used for directory partitioning. What do you think about backporting?

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107473 has finished for PR 25097 at commit 2ee26df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Merged to master. For the backporting, we can discuss soon.

@cloud-fan
Contributor

A late LGTM

@dongjoon-hyun
Member

Thank you, @cloud-fan. BTW, what do you think about backporting to branch-2.4?

@cloud-fan
Contributor

Yea, let's backport.

@dongjoon-hyun
Member

Thanks!

dongjoon-hyun pushed a commit that referenced this pull request Jul 11, 2019
… yyyy and yyyy-[m]m formats

Fix `stringToDate()` for the formats `yyyy` and `yyyy-[m]m`, which assume there are no additional characters after the last components `yyyy` and `[m]m`. In this PR, I propose to check that the entire input has been consumed for these formats.

After the fix, the input `1999 08 01` will be invalid because it matches the pattern `yyyy` but the string contains the additional characters ` 08 01`.

From Spark 1.6.3 through 2.4.3, the behavior has been the same:
```
spark-sql> SELECT CAST('1999 08 01' AS DATE);
1999-01-01
```

This PR makes it return NULL, like Hive:
```
spark-sql> SELECT CAST('1999 08 01' AS DATE);
NULL
```

Added new checks to `DateTimeUtilsSuite` for the `1999 08 01` and `1999 08` inputs.

Closes #25097 from MaxGekk/spark-28015-invalid-date-format.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member

This is cherry-picked to branch-2.4, and `DateTimeUtilsSuite` was tested locally.

@dongjoon-hyun
Member

I'll test and backport to branch-2.3, too.

dongjoon-hyun pushed a commit that referenced this pull request Jul 11, 2019
… yyyy and yyyy-[m]m formats

Closes #25097 from MaxGekk/spark-28015-invalid-date-format.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 17974e2)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member

This is cherry-picked and tested in branch-2.3, too.

@MaxGekk
Member Author

MaxGekk commented Jul 11, 2019

@dongjoon-hyun @srowen @cloud-fan Thank you for your review.

rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
… yyyy and yyyy-[m]m formats

Closes apache#25097 from MaxGekk/spark-28015-invalid-date-format.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@MaxGekk MaxGekk deleted the spark-28015-invalid-date-format branch September 18, 2019 15:58
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
… yyyy and yyyy-[m]m formats

Closes apache#25097 from MaxGekk/spark-28015-invalid-date-format.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>