[SPARK-31939][SQL][test-java11] Fix Parsing day of year when year field pattern is missing #28766
Conversation
Test build #123683 has finished for PR 28766 at commit
cc @cloud-fan @maropu @MaxGekk please review this PR, thanks
```scala
private def verifyLocalDate(
    accessor: TemporalAccessor, field: ChronoField, candidate: LocalDate): Unit = {
  if (accessor.isSupported(field) && candidate.isSupported(field)) {
```
`candidate.isSupported(field)` — is this always true?
For the time being, yes. I can remove this condition.
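For context outside the diff, the guard being discussed can be sketched against plain java.time types. This is an illustrative Java analogue with hypothetical names, not Spark's actual helper:

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

public class VerifyLocalDate {
    // If the parsed accessor carries `field`, its value must agree with the
    // resolved candidate date. The helper is only ever called with date-based
    // fields, so candidate.isSupported(field) holds trivially -- which is the
    // reviewer's point above.
    static void verifyLocalDate(TemporalAccessor accessor, ChronoField field,
                                LocalDate candidate) {
        if (accessor.isSupported(field)) {
            int actual = accessor.get(field);
            int expected = candidate.get(field);
            if (actual != expected) {
                throw new DateTimeException("Conflict found: " + field + " "
                    + actual + " differs from resolved value " + expected);
            }
        }
    }
}
```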
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
```scala
val f7 = TimestampFormatter("dd-DD", UTC, isParsing = true)
assert(f7.parse("28-59") === date(1970, 2, 28))
assertParsingError(f7.parse("27-59"))
}
```
can we also add tests in the .sql file?
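The test above relies on cross-checking day-of-month against day-of-year in the default year 1970. A rough java.time sketch of that consistency check (illustrative only; Spark's real validation lives in its own formatter helpers):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

public class DayOfYearConsistency {
    // Parse a "dd-DD" input and check that the two fields agree, assuming the
    // default year 1970 that Spark uses when no 'y' pattern is present.
    static boolean isConsistent(String input) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("dd-DD");
        // With no year field the parse cannot resolve to a full date, so the
        // raw fields remain accessible on the returned TemporalAccessor.
        TemporalAccessor parsed = fmt.parse(input);
        LocalDate byDayOfYear =
            LocalDate.ofYearDay(1970, parsed.get(ChronoField.DAY_OF_YEAR));
        return byDayOfYear.get(ChronoField.DAY_OF_MONTH)
            == parsed.get(ChronoField.DAY_OF_MONTH);
    }

    public static void main(String[] args) {
        // In 1970 (not a leap year), day-of-year 59 falls on Feb 28.
        System.out.println(isConsistent("28-59"));
        System.out.println(isConsistent("27-59"));
    }
}
```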
Test build #123715 has finished for PR 28766 at commit
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DatetimeFormatterSuite.scala
```
@@ -0,0 +1,16 @@
--- TESTS FOR DATETIME PARSING FUNCTIONS ---
```
there are a lot more tests that should be added here later :)
```scala
private def assertError(pattern: String, datetimeStr: String, expectedMsg: String): Unit = {
  val exception = if (useDateFormatter) {
    intercept[SparkUpgradeException](dateFormatter(pattern).parse(datetimeStr))
```
does this mean the legacy formatter won't fail?
the legacy formatter only fails when datetimeStr contains non-digits, which we don't need to test here
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DatetimeFormatterSuite.scala
Test build #123741 has finished for PR 28766 at commit
Test build #123753 has finished for PR 28766 at commit
```scala
private def dateFormatter(
    pattern: String,
    ldf: LegacyDateFormat = FAST_DATE_FORMAT): DateFormatter = {
```
super nit: we can put the parameters on one line
```sql
select to_timestamp('2020-01-365', 'yyyy-dd-DDD');
select to_timestamp('2020-10-350', 'yyyy-MM-DDD');
select to_timestamp('2020-11-31-366', 'yyyy-MM-dd-DDD');
select from_csv('2018-366', 'date Date', map('dateFormat', 'yyyy-DDD'))
```
let's add a comment to explain why we need to test CSV: because it's lenient and Spark should throw the upgrade exception.
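For background on the leniency being discussed: the JDK's SimpleDateFormat, which is in the same family as the legacy FAST_DATE_FORMAT path, silently rolls an out-of-range day-of-year into the next year instead of failing. A minimal sketch of that behavior (not Spark code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class LenientLegacyParse {
    // SimpleDateFormat is lenient by default: day-of-year 366 in the
    // 365-day year 2018 rolls over into 2019 rather than raising an error.
    static String roundTrip(String s) {
        try {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-DDD");
            Date d = fmt.parse(s);
            return fmt.format(d);
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "2018-366" parses without error and lands in 2019.
        System.out.println(roundTrip("2018-366"));
    }
}
```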
```sql
select to_timestamp('2020-30-365', 'yyyy-dd-DDD');
select to_timestamp('2020-12-350', 'yyyy-MM-DDD');
select to_timestamp('2020-12-31-366', 'yyyy-MM-dd-DDD');
select from_csv('2018-365', 'date Date', map('dateFormat', 'yyyy-DDD'))
```
for the non-error case, CSV behaves no differently, so we don't need to test it.
cloud-fan left a comment
LGTM except a few comments for the test
Test build #123747 has finished for PR 28766 at commit
Test build #123743 has finished for PR 28766 at commit
Test build #123757 has finished for PR 28766 at commit
Test build #123763 has finished for PR 28766 at commit
Test build #123768 has finished for PR 28766 at commit
thanks, merging to master/3.0!
[SPARK-31939][SQL][test-java11] Fix Parsing day of year when year field pattern is missing
### What changes were proposed in this pull request?
If a datetime pattern contains no year field, the day-of-year field should not be ignored if it exists, e.g.
```
spark-sql> select to_timestamp('31', 'DD');
1970-01-01 00:00:00
spark-sql> select to_timestamp('31 30', 'DD dd');
1970-01-30 00:00:00
spark.sql.legacy.timeParserPolicy legacy
spark-sql> select to_timestamp('31', 'DD');
1970-01-31 00:00:00
spark-sql> select to_timestamp('31 30', 'DD dd');
NULL
```
This PR only fixes some corner cases where the 'D' pattern is used to parse datetimes without a 'y' field.
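A quick java.time sketch of why `to_timestamp('31', 'DD')` should resolve to 1970-01-31: with no 'y' in the pattern, the day-of-year is applied to the default year 1970. This is illustrative, not Spark's actual resolver code:

```java
import java.time.LocalDate;

public class DefaultYearDayOfYear {
    // When the pattern has no 'y' field, Spark falls back to the default
    // year 1970 (not a leap year), so day-of-year 31 should mean Jan 31.
    static LocalDate resolve(int dayOfYear) {
        return LocalDate.ofYearDay(1970, dayOfYear);
    }

    public static void main(String[] args) {
        // Prints 1970-01-31: day-of-year 31 in 1970 is January 31.
        System.out.println(resolve(31));
    }
}
```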
### Why are the changes needed?
Fix some corner cases of day-of-year parsing.
### Does this PR introduce _any_ user-facing change?
Yes, the day-of-year field will no longer be ignored when the year field pattern is missing.
### How was this patch tested?
Added unit tests.
Closes #28766 from yaooqinn/SPARK-31939.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 22dda6e)
Signed-off-by: Wenchen Fan <[email protected]>
I reverted this from 3.0 - it caused a test failure. @yaooqinn can you create a PR to backport this to branch-3.0? https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123839/testReport/
checking on that. @HyukjinKwon, thanks
Thank you, @HyukjinKwon !