Skip to content

Conversation

@javierivanov
Copy link
Contributor

@javierivanov javierivanov commented Feb 8, 2020

What changes were proposed in this pull request?

When executing the following query using a week based pattern the result is null.

scala> spark.sql("select unix_timestamp('2020-10', 'YYYY-ww')").show
+--------------------------------+
|unix_timestamp(2020-10, YYYY-ww)|
+--------------------------------+
|                            null|
+--------------------------------+

You can replicate this behavior with:

import java.time.temporal._
import java.time.format._
import java.util.Locale
import java.time.chrono.IsoChronology

val pattern = "YYYY-ww"
val locale = Locale.US
val s = "2020-10"
val formatter = new DateTimeFormatterBuilder().appendPattern(pattern)
    .parseDefaulting(ChronoField.ERA, 1)
    .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
    .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
    .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
    .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
    .toFormatter(locale)
    .withChronology(IsoChronology.INSTANCE)
    .withResolverStyle(ResolverStyle.STRICT)

val result = formatter.parse(s)
result.query(TemporalQueries.localDate)

The date is parsed, but not resolved.

This is caused by the default temporal fields that are conflicting with the week based values:

      .parseDefaulting(ChronoField.ERA, 1)
      .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
      .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)

To avoid this conflict, I propose to check if the pattern is a week/year based and change the defaults as follows:

      .parseDefaulting(ChronoField.ERA, 1)
      .parseDefaulting(ChronoField.DAY_OF_WEEK, 1)

Note:

Additonally in JDK8 there is know issue for patterns (YYYYww/YYww) https://bugs.openjdk.java.net/browse/JDK-8145633 which fails to parse contiguos patterns without separation, this has been fixed in version 9. To avoid parsing issues add a separator like this (YYYY-ww/YYYY ww)

Why are the changes needed?

Week/Year based dates should be supported since it is part of ISO8601. Also, as seen in SPARK-30688, this issue is failing silently.

Does this PR introduce any user-facing change?

  • No

How was this patch tested?

  • Passing current tests
  • Added test case for different timezones in DateExpressionsSuite.scala

@HyukjinKwon
Copy link
Member

cc @MaxGekk since you're touching the codes around here.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@javierivanov
Copy link
Contributor Author

ping @MaxGekk @HyukjinKwon :)

} else {
Seq(
(ChronoField.MONTH_OF_YEAR, 1),
(ChronoField.DAY_OF_MONTH, 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found these default values are very fragile to be set here, so I remove them all in #28576 . Can you check if it fixes your problem?

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Aug 28, 2020
@github-actions github-actions bot closed this Aug 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants