
Conversation

@yaooqinn
Member

@yaooqinn commented Jun 9, 2020

### What changes were proposed in this pull request?

If a datetime pattern contains no year field, the day-of-year field should not be ignored if it exists.

e.g.

```
spark-sql> select to_timestamp('31', 'DD');
1970-01-01 00:00:00
spark-sql> select to_timestamp('31 30', 'DD dd');
1970-01-30 00:00:00

spark.sql.legacy.timeParserPolicy legacy
spark-sql> select to_timestamp('31', 'DD');
1970-01-31 00:00:00
spark-sql> select to_timestamp('31 30', 'DD dd');
NULL
```

This PR only fixes the corner cases where the 'D' (day-of-year) pattern is used to parse datetimes without a 'y' (year) pattern.
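For reference, a minimal standalone sketch (plain java.time, not Spark's internal code) of the resolution behavior the fix restores: the parsed `DAY_OF_YEAR` field survives in the intermediate `TemporalAccessor` and should be honored once the missing year defaults to 1970:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoField

// Parsing day-of-year only: no year is present, but java.time keeps
// DAY_OF_YEAR in the intermediate TemporalAccessor it returns.
val accessor = DateTimeFormatter.ofPattern("DD").parse("31")
assert(accessor.isSupported(ChronoField.DAY_OF_YEAR))

// With the missing year defaulted to 1970, honoring the day-of-year
// yields the same result as the legacy parser:
val resolved = LocalDate.ofYearDay(1970, accessor.get(ChronoField.DAY_OF_YEAR))
assert(resolved == LocalDate.of(1970, 1, 31)) // 1970-01-31
```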

### Why are the changes needed?

To fix the corner cases described above, where the day-of-year field was silently dropped.

### Does this PR introduce any user-facing change?

Yes. The day-of-year field will no longer be ignored when the year field is missing from the pattern.

### How was this patch tested?

Added unit tests.

@SparkQA

SparkQA commented Jun 9, 2020

Test build #123683 has finished for PR 28766 at commit 2d797e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

cc @cloud-fan @maropu @MaxGekk please review this PR, thanks


```scala
private def verifyLocalDate(
    accessor: TemporalAccessor, field: ChronoField, candidate: LocalDate): Unit = {
  if (accessor.isSupported(field) && candidate.isSupported(field)) {
```
Contributor

`candidate.isSupported(field)`: this is always true?

Member Author

For the time being, yes. I can remove this condition.
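For context, a quick standalone check (an illustration, not part of the patch) of why the guard is currently always true: `LocalDate` supports every date-based `ChronoField`, so `candidate.isSupported(field)` can only be false for time-based fields.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoField

val candidate = LocalDate.of(1970, 1, 1)
// Date-based fields are always supported by LocalDate:
assert(candidate.isSupported(ChronoField.DAY_OF_YEAR))
assert(candidate.isSupported(ChronoField.MONTH_OF_YEAR))
assert(candidate.isSupported(ChronoField.DAY_OF_MONTH))
// Only time-based fields are rejected:
assert(!candidate.isSupported(ChronoField.HOUR_OF_DAY))
```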

```scala
val f7 = TimestampFormatter("dd-DD", UTC, isParsing = true)
assert(f7.parse("28-59") === date(1970, 2, 28))
assertParsingError(f7.parse("27-59"))
}
```
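The arithmetic behind this test: 31 (Jan) + 28 (Feb) = 59, so day-of-year 59 in the non-leap default year 1970 is Feb 28. `28-59` is therefore self-consistent, while `27-59` carries a conflicting day-of-month. A standalone check:

```scala
import java.time.LocalDate

// Day-of-year 59 in 1970 resolves to Feb 28, matching dd = 28 but not dd = 27.
assert(LocalDate.ofYearDay(1970, 59) == LocalDate.of(1970, 2, 28))
```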
Contributor

can we also add tests in the .sql file?

@SparkQA

SparkQA commented Jun 10, 2020

Test build #123715 has finished for PR 28766 at commit a11a049.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn changed the title from [SPARK-31939][SQL] Fix Parsing day of year when year field pattern is missing to [SPARK-31939][SQL][test-java11] Fix Parsing day of year when year field pattern is missing on Jun 10, 2020
```
@@ -0,0 +1,16 @@
--- TESTS FOR DATETIME PARSING FUNCTIONS ---
```
Contributor

there are a lot more tests that should be added here later :)


```scala
private def assertError(pattern: String, datetimeStr: String, expectedMsg: String): Unit = {
  val exception = if (useDateFormatter) {
    intercept[SparkUpgradeException](dateFormatter(pattern).parse(datetimeStr))
```
Contributor

does this mean the legacy formatter won't fail?

Member Author

The legacy formatter only fails when `datetimeStr` contains non-digits, which we don't need to test here.
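For illustration, a `SimpleDateFormat`-based sketch of the leniency being described (Spark's actual legacy path goes through `FAST_DATE_FORMAT`, which the author notes only fails on non-digit input):

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

val legacyLike = new SimpleDateFormat("DD") // 'D' = day-in-year
legacyLike.setTimeZone(TimeZone.getTimeZone("UTC"))
// Digit-only input parses leniently: day 31 of the default year 1970.
println(legacyLike.parse("31")) // resolves to Jan 31, 1970 UTC
// Only non-digit input fails:
// legacyLike.parse("abc") throws java.text.ParseException
```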

@SparkQA

SparkQA commented Jun 10, 2020

Test build #123741 has finished for PR 28766 at commit cd55187.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 10, 2020

Test build #123753 has finished for PR 28766 at commit 6b8dec4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


```scala
private def dateFormatter(
    pattern: String,
    ldf: LegacyDateFormat = FAST_DATE_FORMAT): DateFormatter = {
```
Contributor

super nit: we can put the parameters in one line

```sql
select to_timestamp('2020-01-365', 'yyyy-dd-DDD');
select to_timestamp('2020-10-350', 'yyyy-MM-DDD');
select to_timestamp('2020-11-31-366', 'yyyy-MM-dd-DDD');
select from_csv('2018-366', 'date Date', map('dateFormat', 'yyyy-DDD'))
```
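These are the error cases; a standalone arithmetic check (not part of the PR) shows why each one is inconsistent: in leap-year 2020 day-of-year 365 is Dec 30, 350 is Dec 15, and 366 is Dec 31, so the explicit month/day fields all contradict the day-of-year, and 2018 has no day 366 at all.

```scala
import java.time.LocalDate

assert(LocalDate.ofYearDay(2020, 365) == LocalDate.of(2020, 12, 30)) // conflicts with dd=01
assert(LocalDate.ofYearDay(2020, 350) == LocalDate.of(2020, 12, 15)) // conflicts with MM=10
assert(LocalDate.ofYearDay(2020, 366) == LocalDate.of(2020, 12, 31)) // conflicts with MM=11, dd=31
// LocalDate.ofYearDay(2018, 366) throws DateTimeException: 2018 is not a leap year
```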
Contributor

let's add a comment to explain why we need to test CSV: because it's lenient and Spark should throw the upgrade exception.

```sql
select to_timestamp('2020-30-365', 'yyyy-dd-DDD');
select to_timestamp('2020-12-350', 'yyyy-MM-DDD');
select to_timestamp('2020-12-31-366', 'yyyy-MM-dd-DDD');
select from_csv('2018-365', 'date Date', map('dateFormat', 'yyyy-DDD'))
```
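These are the matching non-error cases; the same arithmetic confirms the day-of-year agrees with the explicit fields, and 2018-365 is a valid Dec 31 in a non-leap year:

```scala
import java.time.LocalDate

// 2020-365 = Dec 30 (dd=30), 2020-350 = Dec 15 (MM=12), 2020-366 = Dec 31 (MM=12, dd=31)
assert(LocalDate.ofYearDay(2018, 365) == LocalDate.of(2018, 12, 31)) // 2018 is not a leap year
```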
Contributor

For the non-error cases, CSV behaves no differently, so we don't need to test it.

Contributor

@cloud-fan left a comment

LGTM except a few comments for the test

@SparkQA

SparkQA commented Jun 10, 2020

Test build #123747 has finished for PR 28766 at commit b3cdccf.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 10, 2020

Test build #123743 has finished for PR 28766 at commit 20b585d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 10, 2020

Test build #123757 has finished for PR 28766 at commit 04ce7ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 10, 2020

Test build #123763 has finished for PR 28766 at commit 1899e81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 10, 2020

Test build #123768 has finished for PR 28766 at commit 78c676e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/3.0!

@cloud-fan closed this in 22dda6e Jun 11, 2020
cloud-fan pushed a commit that referenced this pull request Jun 11, 2020
[SPARK-31939][SQL][test-java11] Fix Parsing day of year when year field pattern is missing

### What changes were proposed in this pull request?

If a datetime pattern contains no year field, the day-of-year field should not be ignored if it exists.

e.g.

```
spark-sql> select to_timestamp('31', 'DD');
1970-01-01 00:00:00
spark-sql> select to_timestamp('31 30', 'DD dd');
1970-01-30 00:00:00

spark.sql.legacy.timeParserPolicy legacy
spark-sql> select to_timestamp('31', 'DD');
1970-01-31 00:00:00
spark-sql> select to_timestamp('31 30', 'DD dd');
NULL
```

This PR only fixes the corner cases where the 'D' (day-of-year) pattern is used to parse datetimes without a 'y' (year) pattern.

### Why are the changes needed?

To fix the corner cases described above, where the day-of-year field was silently dropped.

### Does this PR introduce _any_ user-facing change?

Yes. The day-of-year field will no longer be ignored when the year field is missing from the pattern.

### How was this patch tested?

Added unit tests.

Closes #28766 from yaooqinn/SPARK-31939.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 22dda6e)
Signed-off-by: Wenchen Fan <[email protected]>
@HyukjinKwon
Member

HyukjinKwon commented Jun 11, 2020

I reverted this from 3.0 because it causes a test failure. @yaooqinn, can you create a PR to backport this to branch-3.0?

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123839/testReport/

@yaooqinn
Member Author

Checking on that. Thanks, @HyukjinKwon!

@dongjoon-hyun
Member

Thank you, @HyukjinKwon !
