Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Mar 19, 2019

What changes were proposed in this pull request?

In the PR, I propose to use ZoneId instead of TimeZone in:

  • the apply and getFractionFormatter methods of the TimestampFormatter object,
  • and in implementations of the TimestampFormatter trait like FractionTimestampFormatter.

The reason of the changes is to avoid unnecessary conversion from TimeZone to ZoneId because ZoneId is used in TimestampFormatter implementations internally, and the conversion is performed via String which is not for free. Also taking into account that TimeZone instances are converted from String in some cases, the worse case looks like String -> TimeZone -> String -> ZoneId. The PR eliminates the unneeded conversions.

How was this patch tested?

It was tested by DateExpressionsSuite, DateTimeUtilsSuite and TimestampFormatterSuite.

@SparkQA
Copy link

SparkQA commented Mar 19, 2019

Test build #103667 has finished for PR 24141 at commit e11fea6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

pattern: String,
timeZone: TimeZone,
locale: Locale) extends TimestampFormatter with DateTimeFormatterHelper {
pattern: String,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanx. Just wondering why scalastyle didn't catch that.

@srowen
Copy link
Member

srowen commented Mar 19, 2019

Looks reasonable to me. Is there any behavior change? I don't think so, just checking.

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 19, 2019

Is there any behavior change?

Potentially we could observe a behavoir change in parsing time zone id strings but because we did it twice in TimeZone and ZoneId, and now only in ZoneId. This is not the case. Maybe only in corner cases when TimeZone ignores errors but ZoneId doesn't.

@SparkQA
Copy link

SparkQA commented Mar 19, 2019

Test build #103675 has started for PR 24141 at commit df58cfe.

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 19, 2019

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Mar 20, 2019

Test build #103693 has finished for PR 24141 at commit df58cfe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

cloud-fan pushed a commit that referenced this pull request Apr 12, 2019
… and FromUnixTime

## What changes were proposed in this pull request?

SPARK-27199 introduced the use of `ZoneId` instead of `TimeZone` in a few date/time expressions.
There were 3 occurrences of `ctx.addReferenceObj("zoneId", zoneId)` in that PR, which had a bug because while the `java.time.ZoneId` base type is public, the actual concrete implementation classes are not public, so using the 2-arg version of `CodegenContext.addReferenceObj` would incorrectly generate code that reference non-public types (`java.time.ZoneRegion`, to be specific). The 3-arg version should be used, with the class name of the referenced object explicitly specified to the public base type.

One of such occurrences was caught in testing in the main PR of SPARK-27199 (#24141), for `DateFormatClass`. But the other 2 occurrences slipped through because there were no test cases that covered them.

Example of this bug in the current Apache Spark master, in a Spark Shell:
```
scala> Seq(("2016-04-08", "yyyy-MM-dd")).toDF("s", "f").repartition(1).selectExpr("to_unix_timestamp(s, f)").show
...
java.lang.IllegalAccessError: tried to access class java.time.ZoneRegion from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1
```

This PR fixes the codegen issues and adds the corresponding unit tests.

## How was this patch tested?

Enhanced tests in `DateExpressionsSuite` for `to_unix_timestamp` and `from_unixtime`.

Closes #24352 from rednaxelafx/fix-spark-27199.

Authored-by: Kris Mok <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@MaxGekk MaxGekk deleted the zone-id branch September 18, 2019 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants