Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

If LLL/qqq is used in the datetime pattern string, and the current JDK in use has a bug for the stand-alone form (see https://bugs.openjdk.java.net/browse/JDK-8114833), throw an exception with a clear error message.

Why are the changes needed?

to keep backward compatibility with Spark 2.4

Does this PR introduce any user-facing change?

Yes

Spark 2.4

scala> sql("select date_format('1990-1-1', 'LLL')").show
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                          Jan|
+---------------------------------------------+

Spark 3.0 with Java 11

scala> sql("select date_format('1990-1-1', 'LLL')").show
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                          Jan|
+---------------------------------------------+

Spark 3.0 with Java 8

// before this PR
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                            1|
+---------------------------------------------+
// after this PR
scala> sql("select date_format('1990-1-1', 'LLL')").show
org.apache.spark.SparkUpgradeException

How was this patch tested?

manual test with java 8 and 11

@cloud-fan
Copy link
Contributor Author

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31827][SQL] better error message for the JDK bug of stand-alone form [SPARK-31827][SQL] better error message for the JDK8 bug of stand-alone form May 26, 2020

lazy val bugInStandAloneForm = {
val formatter = DateTimeFormatter.ofPattern("LLL", Locale.US)
// JDK 8 has a bug for stand-alone form. See https://bugs.openjdk.java.net/browse/JDK-8114833
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this pointer.

throw new IllegalArgumentException(s"Too many pattern letters: ${style.head}")
}
if (bugInStandAloneForm && (patternPart.contains("LLL") || patternPart.contains("qqq"))) {
throw new IllegalArgumentException("The current JDK has a bug to support stand-alone " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, JDK -> Java because there is JRE environment, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug in the time library of JDK, and is not related to JRE. I think JDK is more precise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by not related to JRE? DateTimeFormatter.ofPattern("LLL", Locale.US) is working correctly on JRE or DateTimeFormatter.ofPattern("LLL", Locale.US) is absent in JRE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we have different interpretation of JDK and JRE.

This is a bug in the java.time library, not in JVM. I'm not sure how to refer to it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 'Java' is more relevant term here since I guess that this error occurs on java command in both JDK and JRE. Both use Java class library (JCL).
Here are definitions of both terms at https://java.com/en/download/faq/techinfo.xml

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got your intention. I wrote the previous one before I read your message.

How about the term Java class library (JCL)?
https://en.wikipedia.org/wiki/Java_Class_Library

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JCL is not a widely used word. I picked Java to make it general and easy to understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. It is familiar to me (and Kris) :). Got general feeling. No objection.

}

final val bugInStandAloneForm = {
val formatter = DateTimeFormatter.ofPattern("LLL", Locale.US)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cause perf-regression for json/csv functions when the user-specified Locale is able to produce the right result with LLL? e.g.

select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLL', 'locale', 'ZH'))
一月

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tradeoff we need to make. If we take locale into consideration, then this can't be a static check. We may need to build a cache.

Since the default locale is US, and LLL/qqq is rarely used, the check seems fine to me even if it can give false negative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add some comments about this.

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123132 has finished for PR 28646 at commit 9305d3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123133 has finished for PR 28646 at commit 8cbff72.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

for (style <- unsupportedNarrowTextStyle if patternPart.contains(style)) {
throw new IllegalArgumentException(s"Too many pattern letters: ${style.head}")
}
if (bugInStandAloneForm && (patternPart.contains("LLL") || patternPart.contains("qqq"))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You checked only 'LLL' pattern in bugInStandAloneForm() but throws exception for 'qqq' as well. Are you sure they are directly related?

Copy link
Contributor Author

@cloud-fan cloud-fan May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the java doc:

Pattern letters 'L', 'c', and 'q' specify the stand-alone form of the text styles.

I think they are directly related. And I tested q locally as well. c is already forbidden in 3.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I have double checked the pattern 'qqq'. It has the same problem

JDK 11:

spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'RU'));
1-й кв.
spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'EN'));
Q1

JDK 8

spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'RU'));
1
spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'EN'));
1

toFormatter(builder, TimestampFormatter.defaultLocale)
}

final val bugInStandAloneForm = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no big deal but I would make this as a private.

// complicate the check here.
// TODO: remove it when we drop Java 8 support.
val formatter = DateTimeFormatter.ofPattern("LLL", Locale.US)
formatter.format(LocalDate.of(2000, 1, 1)) == "1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we are sure 'LLL' and 'qqq' have the same problems. Can you check 'qqq' here as well?

    val formatter = DateTimeFormatter.ofPattern("LLL qqq", Locale.US)
    formatter.format(LocalDate.of(2000, 1, 1)) == "1 Q1"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ideal case, we should check 'LLL' and 'qqq' separately, I do believe.

@SparkQA
Copy link

SparkQA commented May 27, 2020

Test build #123169 has finished for PR 28646 at commit fcb3e43.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 27, 2020

Test build #123187 has finished for PR 28646 at commit b7cbc8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan changed the title [SPARK-31827][SQL] better error message for the JDK8 bug of stand-alone form [SPARK-31827][SQL] fail datetime parsing/formatting if detect the Java 8 bug of stand-alone form May 27, 2020
@cloud-fan
Copy link
Contributor Author

thanks for the review, merging to master/3.0!

@cloud-fan cloud-fan closed this in 1528fbc May 27, 2020
cloud-fan added a commit that referenced this pull request May 27, 2020
…a 8 bug of stand-alone form

If `LLL`/`qqq` is used in the datetime pattern string, and the current JDK in use has a bug for the stand-alone form (see https://bugs.openjdk.java.net/browse/JDK-8114833), throw an exception with a clear error message.

to keep backward compatibility with Spark 2.4

Yes

Spark 2.4
```
scala> sql("select date_format('1990-1-1', 'LLL')").show
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                          Jan|
+---------------------------------------------+
```

Spark 3.0 with Java 11
```
scala> sql("select date_format('1990-1-1', 'LLL')").show
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                          Jan|
+---------------------------------------------+
```

Spark 3.0 with Java 8
```
// before this PR
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                            1|
+---------------------------------------------+
// after this PR
scala> sql("select date_format('1990-1-1', 'LLL')").show
org.apache.spark.SparkUpgradeException
```

manual test with java 8 and 11

Closes #28646 from cloud-fan/format.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@dongjoon-hyun
Copy link
Member

Thank you all!

@HyukjinKwon HyukjinKwon changed the title [SPARK-31827][SQL] fail datetime parsing/formatting if detect the Java 8 bug of stand-alone form [SPARK-31827][SQL] Fail datetime parsing/formatting if detect the Java 8 bug of stand-alone form Jun 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants