diff --git a/docs/sql-ref-datetime-pattern.md b/docs/sql-ref-datetime-pattern.md
index 4275f03335b33..48e85b450e6b2 100644
--- a/docs/sql-ref-datetime-pattern.md
+++ b/docs/sql-ref-datetime-pattern.md
@@ -76,7 +76,7 @@ The count of pattern letters determines the format.
 
 - Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present.
 
-- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form is depend on letters - 'M' denotes the 'standard' form, and 'L' is for 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for month in association with a day and year in a date. In Russian, 'Июль' is the stand-alone form of July, and 'Июля' is the standard form. Here are examples for all supported pattern letters (more than 4 letters is invalid):
+- Month: It follows the rule of Number/Text. The text form is depend on letters - 'M' denotes the 'standard' form, and 'L' is for 'stand-alone' form. These two forms are different only in some certain languages. For example, in Russian, 'Июль' is the stand-alone form of July, and 'Июля' is the standard form. Here are examples for all supported pattern letters:
   - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Month from 1 to 9 are printed without padding.
     ```sql
     spark-sql> select date_format(date '1970-01-01', "M");
@@ -107,8 +107,8 @@ The count of pattern letters determines the format.
     ```
   - `'MMMM'`: full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps.
     ```sql
-    spark-sql> select date_format(date '1970-01-01', "MMMM yyyy");
-    January 1970
+    spark-sql> select date_format(date '1970-01-01', "d MMMM");
+    1 January
     spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU'));
     1 января
     ```
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
index 0ea54c28cb285..353c074caa75e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
@@ -217,9 +217,18 @@ private object DateTimeFormatterHelper {
     toFormatter(builder, TimestampFormatter.defaultLocale)
   }
 
+  private final val bugInStandAloneForm = {
+    // Java 8 has a bug for stand-alone form. See https://bugs.openjdk.java.net/browse/JDK-8114833
+    // Note: we only check the US locale so that it's a static check. It can produce false-negative
+    // as some locales are not affected by the bug. Since `L`/`q` is rarely used, we choose to not
+    // complicate the check here.
+    // TODO: remove it when we drop Java 8 support.
+    val formatter = DateTimeFormatter.ofPattern("LLL qqq", Locale.US)
+    formatter.format(LocalDate.of(2000, 1, 1)) == "1 1"
+  }
   final val unsupportedLetters = Set('A', 'c', 'e', 'n', 'N', 'p')
   final val unsupportedNarrowTextStyle =
-    Set("GGGGG", "MMMMM", "LLLLL", "EEEEE", "uuuuu", "QQQQQ", "qqqqq", "uuuuu")
+    Seq("G", "M", "L", "E", "u", "Q", "q").map(_ * 5).toSet
 
   /**
    * In Spark 3.0, we switch to the Proleptic Gregorian calendar and use DateTimeFormatter for
@@ -244,6 +253,12 @@ private object DateTimeFormatterHelper {
           for (style <- unsupportedNarrowTextStyle if patternPart.contains(style)) {
             throw new IllegalArgumentException(s"Too many pattern letters: ${style.head}")
           }
+          if (bugInStandAloneForm && (patternPart.contains("LLL") || patternPart.contains("qqq"))) {
+            throw new IllegalArgumentException("Java 8 has a bug to support stand-alone " +
+              "form (3 or more 'L' or 'q' in the pattern string). Please use 'M' or 'Q' instead, " +
+              "or upgrade your Java version. For more details, please read " +
+              "https://bugs.openjdk.java.net/browse/JDK-8114833")
+          }
           // The meaning of 'u' was day number of week in SimpleDateFormat, it was changed to year
           // in DateTimeFormatter. Substitute 'u' to 'e' and use DateTimeFormatter to parse the
           // string. If parsable, return the result; otherwise, fall back to 'u', and then use the