6 changes: 3 additions & 3 deletions docs/sql-ref-datetime-pattern.md
@@ -76,7 +76,7 @@ The count of pattern letters determines the format.

- Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present.

-- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form is depend on letters - 'M' denotes the 'standard' form, and 'L' is for 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for month in association with a day and year in a date. In Russian, 'Июль' is the stand-alone form of July, and 'Июля' is the standard form. Here are examples for all supported pattern letters (more than 4 letters is invalid):
+- Month: It follows the rule of Number/Text. The text form is depend on letters - 'M' denotes the 'standard' form, and 'L' is for 'stand-alone' form. These two forms are different only in some certain languages. For example, in Russian, 'Июль' is the stand-alone form of July, and 'Июля' is the standard form. Here are examples for all supported pattern letters:
- `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Month from 1 to 9 are printed without padding.
```sql
spark-sql> select date_format(date '1970-01-01', "M");
@@ -107,8 +107,8 @@ The count of pattern letters determines the format.
```
- `'MMMM'`: full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps.
```sql
-spark-sql> select date_format(date '1970-01-01', "MMMM yyyy");
-January 1970
+spark-sql> select date_format(date '1970-01-01', "d MMMM");
+1 January
spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU'));
1 января
```
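
The difference between the standard ('M') and stand-alone ('L') text forms can also be observed outside Spark by calling `java.time` directly. The following is only a minimal sketch: the object name `MonthFormsSketch` and the helper `render` are made up for illustration, and the Russian and stand-alone outputs depend on the JDK version and its locale data, so no output is claimed for them here.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical helper object, not part of Spark.
object MonthFormsSketch {
  // Format a date with the given pattern and locale using java.time directly.
  def render(pattern: String, locale: Locale, date: LocalDate): String =
    DateTimeFormatter.ofPattern(pattern, locale).format(date)

  def main(args: Array[String]): Unit = {
    val date = LocalDate.of(1970, 1, 1)
    val ru = Locale.forLanguageTag("ru")
    println(render("d MMMM", Locale.US, date)) // standard form, English: "1 January"
    println(render("d MMMM", ru, date))        // standard form, Russian
    println(render("LLLL", ru, date))          // stand-alone form; result is JDK-dependent
  }
}
```

In English the two forms coincide, which is why the documentation example needs a non-English locale to show any difference.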
@@ -217,9 +217,18 @@ private object DateTimeFormatterHelper {
toFormatter(builder, TimestampFormatter.defaultLocale)
}

+private final val bugInStandAloneForm = {
+  // Java 8 has a bug for stand-alone form. See https://bugs.openjdk.java.net/browse/JDK-8114833
+  // Note: we only check the US locale so that it's a static check. It can produce false-negative
+  // as some locales are not affected by the bug. Since `L`/`q` is rarely used, we choose to not
+  // complicate the check here.
+  // TODO: remove it when we drop Java 8 support.
+  val formatter = DateTimeFormatter.ofPattern("LLL qqq", Locale.US)
+  formatter.format(LocalDate.of(2000, 1, 1)) == "1 1"
+}
final val unsupportedLetters = Set('A', 'c', 'e', 'n', 'N', 'p')
final val unsupportedNarrowTextStyle =
-Set("GGGGG", "MMMMM", "LLLLL", "EEEEE", "uuuuu", "QQQQQ", "qqqqq", "uuuuu")
+Seq("G", "M", "L", "E", "u", "Q", "q").map(_ * 5).toSet

/**
* In Spark 3.0, we switch to the Proleptic Gregorian calendar and use DateTimeFormatter for
@@ -244,6 +253,12 @@ private object DateTimeFormatterHelper {
for (style <- unsupportedNarrowTextStyle if patternPart.contains(style)) {
throw new IllegalArgumentException(s"Too many pattern letters: ${style.head}")
}
+if (bugInStandAloneForm && (patternPart.contains("LLL") || patternPart.contains("qqq"))) {

Member

You check only the 'LLL' pattern in bugInStandAloneForm but throw an exception for 'qqq' as well. Are you sure they are directly related?

Contributor Author

@cloud-fan, May 27, 2020

In the Javadoc:

> Pattern letters 'L', 'c', and 'q' specify the stand-alone form of the text styles.

I think they are directly related, and I tested 'q' locally as well. 'c' is already forbidden in 3.0.

Member

OK. I have double-checked the pattern 'qqq'. It has the same problem.

JDK 11:

spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'RU'));
1-й кв.
spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'EN'));
Q1

JDK 8:

spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'RU'));
1
spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'qqq', 'locale', 'EN'));
1
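
The behaviour compared above can be probed without Spark by calling `DateTimeFormatter` directly on the running JDK. This is a minimal sketch under assumptions: `StandAloneProbe`, `render`, and `printsNumberOnly` are hypothetical names, and treating "the output consists only of digits" as the symptom of JDK-8114833 is an approximation of the `== "1 1"` check in the diff, not the check itself.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical probe, not part of Spark.
object StandAloneProbe {
  private val sample = LocalDate.of(2000, 1, 1)

  // Render the sample date with a text pattern such as "LLL" or "qqq".
  def render(pattern: String, locale: Locale): String =
    DateTimeFormatter.ofPattern(pattern, locale).format(sample)

  // True when the formatter produced only digits where text was requested,
  // which is the symptom shown in the JDK 8 output above.
  def printsNumberOnly(pattern: String, locale: Locale): Boolean =
    render(pattern, locale).forall(_.isDigit)

  def main(args: Array[String]): Unit =
    for (p <- Seq("LLL", "qqq"); loc <- Seq(Locale.US, Locale.forLanguageTag("ru")))
      println(s"$p / $loc -> '${render(p, loc)}' numberOnly=${printsNumberOnly(p, loc)}")
}
```

Running `main` on JDK 8 and on a later JDK should show whether 'LLL' and 'qqq' behave the same way on each, which is the question raised in this thread.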

+  throw new IllegalArgumentException("Java 8 has a bug to support stand-alone " +
+    "form (3 or more 'L' or 'q' in the pattern string). Please use 'M' or 'Q' instead, " +
+    "or upgrade your Java version. For more details, please read " +
+    "https://bugs.openjdk.java.net/browse/JDK-8114833")
+}
// The meaning of 'u' was day number of week in SimpleDateFormat, it was changed to year
// in DateTimeFormatter. Substitute 'u' to 'e' and use DateTimeFormatter to parse the
// string. If parsable, return the result; otherwise, fall back to 'u', and then use the
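
The two checks touched by this change (the five-letter narrow text style and the stand-alone form guard) can be exercised in isolation. The sketch below is an assumption-laden illustration rather than the Spark implementation: `PatternChecksSketch` and `firstViolation` are hypothetical, the check returns an `Option` instead of throwing, the error text for the stand-alone case is shortened, and the JDK probe is passed in as a plain boolean.

```scala
// Hypothetical stand-in for the validation logic, not the Spark code.
object PatternChecksSketch {
  // "G" * 5 repeats the string five times ("GGGGG"), so each letter yields its narrow style.
  val narrowStyles: Set[String] = Seq("G", "M", "L", "E", "u", "Q", "q").map(_ * 5).toSet

  // Returns a message for the first violated rule, or None if the pattern part passes.
  def firstViolation(patternPart: String, standAloneBroken: Boolean): Option[String] = {
    val tooMany = narrowStyles.collectFirst {
      case style if patternPart.contains(style) => s"Too many pattern letters: ${style.head}"
    }
    tooMany.orElse {
      if (standAloneBroken && (patternPart.contains("LLL") || patternPart.contains("qqq")))
        Some("Stand-alone text form is not supported on this JDK")
      else None
    }
  }
}
```

Building the set with `map(_ * 5)` lists each letter once, which avoids the duplicated `"uuuuu"` entry in the literal it replaces.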