[SPARK-31771][SQL] Disable Narrow TextStyle for datetime pattern 'G/M/L/E/u/Q/q' #28592

yaooqinn · 2020-05-20T09:35:20Z

What changes were proposed in this pull request?

Since 3.0.0 we use java.time.DateTimeFormatterBuilder, in which five consecutive pattern characters of 'G/M/L/E/u/Q/q' mean Narrow-Text Style. This style outputs only the leading letter of the value, e.g. December becomes D. In Spark 2.4 the same patterns mean Full-Text Style.

In this PR, we explicitly disable Narrow-Text Style for these pattern characters.
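To illustrate the behavior described above, here is a minimal sketch using plain java.time outside Spark. The class and method names (NarrowStyleDemo, formatDecember) are hypothetical and exist only for this example; the US locale is assumed.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class NarrowStyleDemo {
    // Formats a fixed December date with the given pattern in the US locale.
    static String formatDecember(String pattern) {
        LocalDate date = LocalDate.of(2020, 12, 1);
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(date);
    }

    public static void main(String[] args) {
        // Four letters select the full text style.
        System.out.println(formatDecember("MMMM"));  // December
        // Five letters select the narrow text style in java.time.
        System.out.println(formatDecember("MMMMM")); // D
    }
}
```

A query migrated from Spark 2.4 with 'MMMMM' would therefore silently switch from the full month name to a single letter unless the pattern is rejected.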

Why are the changes needed?

Without this change, there will be a silent data change.

Does this PR introduce any user-facing change?

Yes. Queries with datetime operations that use datetime patterns such as G/M/L/E/u will fail if the pattern length is 5. Other patterns, e.g. 'k' and 'm', also accept only a certain number of letters.

1, datetime patterns that are supported by the legacy parser but not by the new one, e.g. "GGGGG", "MMMMM", "LLLLL", "EEEEE", "uuuuu", "aa", "aaa", will get a SparkUpgradeException. Two options are given to end users: one is to use the legacy mode, and the other is to follow the new online doc for correct datetime patterns.

2, datetime patterns that are supported by neither the new parser nor the legacy one, e.g. "QQQQQ", "qqqqq", will get an IllegalArgumentException, which is caught by Spark internally and results in NULL for end users.
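As a rough illustration of rejecting such patterns up front, the sketch below scans a pattern string for a run of five or more identical letters from G/M/L/E/u/Q/q outside quoted literals. It is not the code in this PR; the class name NarrowPatternCheck and the method hasNarrowStyle are assumptions made for the example.

```java
public class NarrowPatternCheck {
    // Pattern letters for which five repeats would select the narrow text style.
    private static final String RESTRICTED = "GMLEuQq";

    // Returns true if the pattern contains a run of five or more identical
    // restricted letters outside single-quoted literal text.
    static boolean hasNarrowStyle(String pattern) {
        boolean inQuote = false;
        int run = 0;
        char prev = 0;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\'') {
                // A quote toggles literal mode and ends the current run.
                inQuote = !inQuote;
                run = 0;
                prev = 0;
                continue;
            }
            if (inQuote) {
                continue; // literal text is never a pattern letter
            }
            run = (c == prev) ? run + 1 : 1;
            prev = c;
            if (run >= 5 && RESTRICTED.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could use such a check to raise an error before any row is formatted, instead of letting the narrow style change results silently.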

How was this patch tested?

add unit tests

yaooqinn · 2020-05-20T09:36:35Z

cc @cloud-fan thanks

sql/core/src/test/resources/sql-tests/inputs/datetime.sql

SparkQA · 2020-05-20T14:46:40Z

Test build #122884 has finished for PR 28592 at commit 78fb74a.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-20T19:04:22Z

Test build #122898 has finished for PR 28592 at commit e178a6b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-21T10:55:13Z

Test build #122926 has finished for PR 28592 at commit 1d31ed2.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala

cloud-fan · 2020-05-21T11:52:06Z

sql/core/src/test/resources/sql-tests/inputs/datetime-corrected.sql

@@ -0,0 +1,2 @@
+--SET spark.sql.legacy.timeParserPolicy=CORRECTED


do we have different test results with CORRECTED mode?

I've just come to understand what it means, I will rm this case.

yaooqinn · 2020-05-21T11:52:25Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala


  @transient
-  private lazy val formatter = getOrCreateFormatter(pattern, locale)
+  private lazy val formatter = {


Shall we remove this lazy to let it fail fast in the parse phase? @cloud-fan

Hmm, this one and the others are transient, so the lazy keyword is required.

SparkQA · 2020-05-21T12:45:17Z

Test build #122928 has finished for PR 28592 at commit 1144c03.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-05-21T16:13:41Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala

+  private lazy val formatter: DateTimeFormatter = {
+    try {
+      getOrCreateFormatter(pattern, locale)
+    } catch checkLegacyFormatter(pattern, legacyFormatter.format(0))


legacyFormatter.format(0) is hacky... let's add the initialize API

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala

SparkQA · 2020-05-21T18:16:01Z

Test build #122935 has finished for PR 28592 at commit 549a122.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-21T20:09:39Z

Test build #122937 has finished for PR 28592 at commit c877ac5.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2020-05-21T20:32:51Z

Test build #122938 has finished for PR 28592 at commit b2abeeb.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-05-22T06:44:51Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala

  def format(date: Date): String
  def format(localDate: LocalDate): String
+
+  def initialize(): Unit = {}


shall we force all the children to implement initialize?

And maybe a better name is validatePatternString. initialize sounds like it must be called.

and we should call it in TimestampFormatter.apply if the policy is not legacy, to fail earlier.

Got it. SGTM.

cloud-fan · 2020-05-22T06:47:00Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala

+   * IllegalArgumentException will be thrown.
+   *
+   * @param pattern the date time pattern
+   * @param block a func to capture exception, identically which forces a legacy datetime formatter


block is a bad name. How about tryLegacyFormatter?

SparkQA · 2020-05-22T07:05:01Z

Test build #122958 has finished for PR 28592 at commit 1503042.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-22T07:05:02Z

Test build #122963 has finished for PR 28592 at commit 052bfad.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-22T07:05:02Z

Test build #122964 has finished for PR 28592 at commit 8141ef9.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

yaooqinn · 2020-05-22T08:22:29Z

sql/core/src/test/resources/sql-tests/results/datetime.sql.out

+
+
+-- !query
+select from_unixtime(54321, 'QQQQQ')


Due to different exception handling for IllegalArgumentException at the call sites, the results are not the same https://github.com/apache/spark/pull/28592/files#diff-79dd276be45ede6f34e24ad7005b0a7cR801-R806
cc @cloud-fan

it's OK. it's already the case in 2.4

docs/sql-ref-datetime-pattern.md

SparkQA · 2020-05-22T10:07:00Z

Test build #122986 has finished for PR 28592 at commit 09b407f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class SparkListenerResourceProfileAdded(resourceProfile: ResourceProfile)

cloud-fan · 2020-05-22T10:51:50Z

docs/sql-ref-datetime-pattern.md

+|**E**|day-of-week|text|Tue; Tuesday|
+|**u**|localized day-of-week|number/text|2; 02; Tue; Tuesday|
+|**F**|week-of-month|number(1)|3|
+|**a**|am-pm-of-day|am/pm|PM|


nit: am-pm, as it's weird to have / in a name.

cloud-fan · 2020-05-22T10:52:36Z

docs/sql-ref-datetime-pattern.md

+- Text: The text style is determined based on the number of pattern letters used. Less than 4 pattern letters will use the short form. Exactly 4 pattern letters will use the full form. 5 or more letters will fail.

- Number: If the count of letters is one, then the value is output using the minimum number of digits and without padding. Otherwise, the count of digits is used as the width of the output field, with the value zero-padded as necessary. The following pattern letters have constraints on the count of letters. Only one letter 'F' can be specified. Up to two letters of 'd', 'H', 'h', 'K', 'k', 'm', and 's' can be specified. Up to three letters of 'D' can be specified.
+- Number(n): the n here represents the maximum count of letters this type of datetime pattern can be used. If the count of letters is one, then the value is output using the minimum number of digits and without padding. Otherwise, the count of digits is used as the width of the output field, with the value zero-padded as necessary.


the -> The

cloud-fan · 2020-05-22T10:53:17Z

docs/sql-ref-datetime-pattern.md

-    J
-    ```
+
+- AM/PM(a): This outputs the am-pm-of-day. Pattern letter count must be 1.


AM/PM(a) -> am-pm

cloud-fan · 2020-05-22T10:56:34Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala

+        case _: Throwable => throw e
+      }
+      throw new SparkUpgradeException("3.0", s"Fail to recognize '$pattern' pattern in the" +
+        s" new parser. 1) You can set ${SQLConf.LEGACY_TIME_PARSER_POLICY.key} to LEGACY to" +


new parser -> DateTimeFormatter

SparkQA · 2020-05-22T11:23:14Z

Test build #122993 has finished for PR 28592 at commit 0a76ba3.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-22T15:55:20Z

Test build #122994 has finished for PR 28592 at commit 5360d88.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-22T16:15:11Z

Test build #122990 has finished for PR 28592 at commit 75fdbcb.

This patch fails from timeout after a configured wait of 400m.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-22T17:13:42Z

Test build #122999 has finished for PR 28592 at commit ee1d62a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-22T21:25:13Z

Test build #123010 has finished for PR 28592 at commit 3047f88.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class SecondsToTimestamp(child: Expression)
case class MillisToTimestamp(child: Expression)
case class MicrosToTimestamp(child: Expression)

cloud-fan · 2020-05-25T15:07:38Z

thanks, merging to master!

cloud-fan · 2020-05-25T15:09:14Z

Hi @yaooqinn can you send a new PR for 3.0?

yaooqinn · 2020-05-25T15:12:49Z

OK, thanks for merging

MaxGekk · 2020-05-31T14:54:12Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala

  private val noRows = None

-  private val timestampFormatter = TimestampFormatter(
+  private lazy val timestampFormatter = TimestampFormatter(


What is the reason to make it lazy?

the formatter creation will validate the pattern string now, but json/csv has a fallback and shouldn't fail because of invalid pattern string.
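The effect of deferring creation can be sketched in plain Java. Scala's lazy val is emulated here with a memoizing getter, and the class name LazyFormatterHolder is hypothetical. The six-letter pattern "MMMMMM" is used because java.time rejects it with an IllegalArgumentException.

```java
import java.time.format.DateTimeFormatter;

public class LazyFormatterHolder {
    private final String pattern;
    private DateTimeFormatter formatter; // created on first use only

    LazyFormatterHolder(String pattern) {
        // No validation happens here, so construction never fails.
        this.pattern = pattern;
    }

    // Builds the formatter on first access; an invalid pattern fails here.
    synchronized DateTimeFormatter get() {
        if (formatter == null) {
            formatter = DateTimeFormatter.ofPattern(pattern);
        }
        return formatter;
    }

    // Returns true if the pattern is only rejected when the formatter is used.
    static boolean failsOnFirstUse(String pattern) {
        LazyFormatterHolder holder = new LazyFormatterHolder(pattern);
        try {
            holder.get();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

With this shape, a reader that never needs the formatter, or that falls back to another parsing path, is not broken by an invalid pattern string at construction time.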

[SPARK-31771][SQL] Disable Narrow TextStyle for datetime pattern 'G/M/L/E/u/Q/q'

78fb74a

probot-autolabeler bot added DOCS SQL labels May 20, 2020

cloud-fan reviewed May 20, 2020

sql/core/src/test/resources/sql-tests/inputs/datetime.sql

fail for parser

e178a6b

yaooqinn requested a review from cloud-fan May 21, 2020 02:45

yaooqinn added 2 commits May 21, 2020 18:27

address comments

d5b5a9c

fix test

1d31ed2

fix tests

1144c03

cloud-fan reviewed May 21, 2020

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala

cloud-fan reviewed May 21, 2020

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala (outdated)

cloud-fan reviewed May 21, 2020

yaooqinn commented May 21, 2020

refine

549a122

cloud-fan reviewed May 21, 2020

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala (outdated)

yaooqinn added 2 commits May 22, 2020 01:20

initialize api

c877ac5

Merge branch 'master' into SPARK-31771

b2abeeb

yaooqinn added 3 commits May 22, 2020 10:01

tests

1503042

more test cases

052bfad

add doc

8141ef9

cloud-fan reviewed May 22, 2020

yaooqinn added 2 commits May 22, 2020 15:54

update doc and adress comments

4491e79

Merge branch 'master' into SPARK-31771

09b407f

yaooqinn commented May 22, 2020

cloud-fan reviewed May 22, 2020

docs/sql-ref-datetime-pattern.md (outdated)

doc update

75fdbcb

cloud-fan reviewed May 22, 2020

fix doc and tests

0a76ba3

cloud-fan reviewed May 22, 2020

update tests

5360d88

lazy

ee1d62a

cloud-fan approved these changes May 22, 2020

Merge branch 'master' into SPARK-31771

3047f88

cloud-fan closed this in 695cb61 May 25, 2020

MaxGekk reviewed May 31, 2020