[Spark-20771][SQL] Make weekofyear more intuitive #18080
Come to think of it, it might actually be better to switch it around: have ISO8601 as function weekofyear, and make a separate function for gregorian, because ISO is the more commonly used term.
I don't think you can just change the behavior. It would possibly break apps and I presume no longer matches Hive. If it already implements a standard too, it sounds like it is correct. A second method seems like API clutter.
I agree that we shouldn't change the behavior, hence I suggested we could do it the other way around: make a new function for gregorian instead and leave weekofyear as is. To address the API clutter, I suppose we could define the function as follows: FUNC(date[, gregorian])
override def dataType: DataType = IntegerType

@transient private lazy val minimalDays = {
  if ("gregorian".equalsIgnoreCase(format.toString)) 1 else 4
How many formats do other DBs/systems allow? Could you do a search?
I did a bit of research, and there seem to be no other formats. However, some systems (such as MySQL and Java) allow the first day of the week to be defined as well. Some countries in the Middle East have weekends on Friday/Saturday, or even Thursday/Friday.
I will update the PR to allow users to override the first day of the week, as well as specify how the first week is defined: (1) ISO standard: the week with more than half of its days in the new year, i.e. the week containing the first Thursday in a Monday-Sunday week; (2) gregorian: the week containing the first day of the new year.
It will still default to the ISO standard with a Monday-Sunday week, of course, but users can now override it in any way they like.
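To make the two knobs from this thread concrete (first day of the week, and the minimal number of days the first week must have in the new year), here is a minimal Python sketch. It is an illustration only, not the PR's Scala code: the names `week_of_year`, `first_day` and `minimal_days` are hypothetical, and the semantics assumed are "week of week-based year", so days before week 1 count toward the last week of the previous year.

```python
from datetime import date, timedelta


def _week_one_start(year, first_day, minimal_days):
    """Start date of week 1 of `year` (hypothetical helper)."""
    jan1 = date(year, 1, 1)
    # Days between the start of the week containing 1 January and 1 January.
    offset = (jan1.weekday() - first_day) % 7
    week_start = jan1 - timedelta(days=offset)
    # Number of days of that week that fall inside `year`.
    days_in_year = 7 - offset
    if days_in_year < minimal_days:
        # Too few days in the new year: week 1 starts one week later.
        week_start += timedelta(days=7)
    return week_start


def week_of_year(d, first_day=0, minimal_days=4):
    """Week number of `d`.

    first_day: 0 = Monday ... 6 = Sunday (same convention as date.weekday()).
    minimal_days: 4 gives ISO 8601 behaviour, 1 gives the "gregorian" variant
    discussed in this thread (assumption: this mirrors the PR's minimalDays).
    """
    # Try next year's week 1 first, then this year's, then last year's.
    for year in (d.year + 1, d.year, d.year - 1):
        start = _week_one_start(year, first_day, minimal_days)
        if d >= start:
            return (d - start).days // 7 + 1
    raise AssertionError("unreachable for minimal_days in 1..7")


if __name__ == "__main__":
    d = date(2017, 1, 1)  # a Sunday
    print(week_of_year(d))                  # ISO default
    print(week_of_year(d, minimal_days=1))  # "gregorian" variant
```

With the defaults, the result agrees with `date.isocalendar()`; setting `minimal_days=1` makes the week containing 1 January count as week 1, and `first_day=6` would model a Sunday-based week.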
Is this variant available in any other DB? A lot of the goal of providing built-in functions is compatibility. Beyond that a lot of things are better handled with UDFs for special cases, not new built-ins.
This variant is available in other DBs, albeit with slightly different function and parameter naming. For example, MySQL allows it: you pass in an integer that specifies which permutation you want. Please note that if you look at the table, the 'Week 1 is the first week …' column is the difference between gregorian and iso. In contrast, Oracle allows you to switch between gregorian and iso. Do you reckon it would be better to change the name and align it more closely? Another option would be to simply implement yearweek as done in MySQL (https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_yearweek), which is less flexible but also much simpler.
Can one of the admins verify this patch?
I think that if Spark's behavior matches Hive's, that's what we want here. Other variations can be implemented in UDFs, which provide all the flexibility you'd want. These functions exist in all kinds of variations in SQL databases because UDFs are hard or unavailable.
Closes apache#21766 Closes apache#21679 Closes apache#21161 Closes apache#20846 Closes apache#19434 Closes apache#18080 Closes apache#17648 Closes apache#17169 Add: Closes apache#22813 Closes apache#21994 Closes apache#22005 Closes apache#22463 Add: Closes apache#15899 Add: Closes apache#22539 Closes apache#21868 Closes apache#21514 Closes apache#21402 Closes apache#21322 Closes apache#21257 Closes apache#20163 Closes apache#19691 Closes apache#18697 Closes apache#18636 Closes apache#17176 Closes apache#23001 from wangyum/CloseStalePRs. Authored-by: Yuming Wang <[email protected]> Signed-off-by: hyukjinkwon <[email protected]>
What changes were proposed in this pull request?
The current implementation of weekofyear implements ISO8601, which results in the following unintuitive behaviour:
weekofyear("2017-01-01") returns 52
In MySQL, WEEK() with its default mode (0) would return 1 (https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_week), although WEEKOFYEAR(), which is equivalent to WEEK(date, 3) and follows the ISO rule, returns 52 (https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_weekofyear).
I therefore think that, instead of only changing the behavior as specified in the JIRA, it would be better to support both. Hence I've added an additional function.
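The difference described above can be reproduced outside Spark. The following is a small Python sketch, not part of this patch: `iso_week` relies on the standard library's `date.isocalendar()`, and `jan1_week` is a hypothetical helper that assumes Monday-based weeks and treats the week containing 1 January as week 1.

```python
from datetime import date


def iso_week(d):
    """ISO 8601 week number, as the current weekofyear behaves."""
    return d.isocalendar()[1]


def jan1_week(d):
    """Week number where the week containing 1 January is week 1.

    Hypothetical helper: assumes Monday-based weeks and counts within
    the calendar year of `d` only.
    """
    jan1 = date(d.year, 1, 1)
    # Shift by the weekday of 1 January so that weeks roll over on Mondays.
    return ((d - jan1).days + jan1.weekday()) // 7 + 1


if __name__ == "__main__":
    d = date(2017, 1, 1)   # a Sunday
    print(iso_week(d))     # 52: belongs to the last ISO week of 2016
    print(jan1_week(d))    # 1: first week of 2017
```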
How was this patch tested?
Added some unit tests