-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-31827][SQL] Fail datetime parsing/formatting if detect the Java 8 bug of stand-alone form #28646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| lazy val bugInStandAloneForm = { | ||
| val formatter = DateTimeFormatter.ofPattern("LLL", Locale.US) | ||
| // JDK 8 has a bug for stand-alone form. See https://bugs.openjdk.java.net/browse/JDK-8114833 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this pointer.
| throw new IllegalArgumentException(s"Too many pattern letters: ${style.head}") | ||
| } | ||
| if (bugInStandAloneForm && (patternPart.contains("LLL") || patternPart.contains("qqq"))) { | ||
| throw new IllegalArgumentException("The current JDK has a bug to support stand-alone " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, JDK -> Java because there is JRE environment, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug in the time library of JDK, and is not related to JRE. I think JDK is more precise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by not related to JRE? DateTimeFormatter.ofPattern("LLL", Locale.US) is working correctly on JRE or DateTimeFormatter.ofPattern("LLL", Locale.US) is absent in JRE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we have different interpretation of JDK and JRE.
This is a bug in the java.time library, not in JVM. I'm not sure how to refer to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 'Java' is more relevant term here since I guess that this error occurs on java command in both JDK and JRE. Both use Java class library (JCL).
Here are definitions of both terms at https://java.com/en/download/faq/techinfo.xml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got your intention. I wrote the previous one before I read your message.
How about the term Java class library (JCL)?
https://en.wikipedia.org/wiki/Java_Class_Library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JCL is not a widely used word. I picked Java to make it general and easy to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It is familiar to me (and Kris) :). Got general feeling. No objection.
| } | ||
|
|
||
| final val bugInStandAloneForm = { | ||
| val formatter = DateTimeFormatter.ofPattern("LLL", Locale.US) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this cause perf-regression for json/csv functions when the user-specified Locale is able to produce the right result with LLL? e.g.
select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLL', 'locale', 'ZH'))
一月
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tradeoff we need to make. If we take locale into consideration, then this can't be a static check. We may need to build a cache.
Since the default locale is US, and LLL/qqq is rarely used, the check seems fine to me even if it can give false negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add some comments about this.
|
Test build #123132 has finished for PR 28646 at commit
|
|
Test build #123133 has finished for PR 28646 at commit
|
| for (style <- unsupportedNarrowTextStyle if patternPart.contains(style)) { | ||
| throw new IllegalArgumentException(s"Too many pattern letters: ${style.head}") | ||
| } | ||
| if (bugInStandAloneForm && (patternPart.contains("LLL") || patternPart.contains("qqq"))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You checked only 'LLL' pattern in bugInStandAloneForm() but throws exception for 'qqq' as well. Are you sure they are directly related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the java doc:
Pattern letters 'L', 'c', and 'q' specify the stand-alone form of the text styles.
I think they are directly related. And I tested q locally as well. c is already forbidden in 3.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I have double checked the pattern 'qqq'. It has the same problem
JDK 11:
spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'RU'));
1-й кв.
spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'EN'));
Q1JDK 8
spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'RU'));
1
spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'EN'));
1| toFormatter(builder, TimestampFormatter.defaultLocale) | ||
| } | ||
|
|
||
| final val bugInStandAloneForm = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no big deal but I would make this as a private.
| // complicate the check here. | ||
| // TODO: remove it when we drop Java 8 support. | ||
| val formatter = DateTimeFormatter.ofPattern("LLL", Locale.US) | ||
| formatter.format(LocalDate.of(2000, 1, 1)) == "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we are sure 'LLL' and 'qqq' have the same problems. Can you check 'qqq' here as well?
val formatter = DateTimeFormatter.ofPattern("LLL qqq", Locale.US)
formatter.format(LocalDate.of(2000, 1, 1)) == "1 Q1"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ideal case, we should check 'LLL' and 'qqq' separately, I do believe.
|
Test build #123169 has finished for PR 28646 at commit
|
|
Test build #123187 has finished for PR 28646 at commit
|
|
thanks for the review, merging to master/3.0! |
…a 8 bug of stand-alone form If `LLL`/`qqq` is used in the datetime pattern string, and the current JDK in use has a bug for the stand-alone form (see https://bugs.openjdk.java.net/browse/JDK-8114833), throw an exception with a clear error message. to keep backward compatibility with Spark 2.4 Yes Spark 2.4 ``` scala> sql("select date_format('1990-1-1', 'LLL')").show +---------------------------------------------+ |date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)| +---------------------------------------------+ | Jan| +---------------------------------------------+ ``` Spark 3.0 with Java 11 ``` scala> sql("select date_format('1990-1-1', 'LLL')").show +---------------------------------------------+ |date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)| +---------------------------------------------+ | Jan| +---------------------------------------------+ ``` Spark 3.0 with Java 8 ``` // before this PR +---------------------------------------------+ |date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)| +---------------------------------------------+ | 1| +---------------------------------------------+ // after this PR scala> sql("select date_format('1990-1-1', 'LLL')").show org.apache.spark.SparkUpgradeException ``` manual test with java 8 and 11 Closes #28646 from cloud-fan/format. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
|
Thank you all! |
What changes were proposed in this pull request?
If
LLL/qqqis used in the datetime pattern string, and the current JDK in use has a bug for the stand-alone form (see https://bugs.openjdk.java.net/browse/JDK-8114833), throw an exception with a clear error message.Why are the changes needed?
to keep backward compatibility with Spark 2.4
Does this PR introduce any user-facing change?
Yes
Spark 2.4
Spark 3.0 with Java 11
Spark 3.0 with Java 8
How was this patch tested?
manual test with java 8 and 11