@rubenljanssen
Contributor

What changes were proposed in this pull request?

The current implementation of weekofyear implements ISO8601, which results in the following unintuitive behaviour:

weekofyear("2017-01-01") returns 52

In MySQL, this would return 1 (https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_weekofyear), although it can return 52 if that behavior is explicitly requested (https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_week).

I therefore think instead of only changing the behavior as specified in the JIRA, it would be better to support both. Hence I've added an additional function.
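
To make the difference concrete, here is a minimal sketch that uses `java.time.temporal.WeekFields` rather than the code in this patch. The object and method names are hypothetical, and "gregorian" is assumed to mean that week 1 is whichever Monday-Sunday week contains January 1 (a minimum of 1 day in the first week, versus 4 for ISO).

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.WeekFields

object WeekNumbering {
  // ISO 8601: weeks start on Monday and week 1 must have at least 4 days in the new year.
  def isoWeek(date: String): Int =
    LocalDate.parse(date).get(WeekFields.ISO.weekOfWeekBasedYear())

  // Assumed "gregorian" rule: weeks start on Monday and week 1 is the week containing January 1.
  def gregorianWeek(date: String): Int =
    LocalDate.parse(date).get(WeekFields.of(DayOfWeek.MONDAY, 1).weekOfWeekBasedYear())
}
```

Under these assumptions, `WeekNumbering.isoWeek("2017-01-01")` gives 52 (January 1, 2017 is a Sunday, so it still belongs to the last ISO week of 2016), while `WeekNumbering.gregorianWeek("2017-01-01")` gives 1.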

How was this patch tested?

Added some unit tests

@rubenljanssen
Contributor Author

Come to think of it, it might actually be better to switch it around: keep ISO8601 as the function weekofyear, and add a separate function for gregorian, because ISO is the more commonly used term.

@srowen
Member

srowen commented May 24, 2017

I don't think you can just change the behavior. It would possibly break apps, and I presume it would no longer match Hive. If it already implements a standard too, it sounds like it is correct. A second method seems like API clutter.

@rubenljanssen
Contributor Author

rubenljanssen commented May 24, 2017

I agree that we shouldn't change the behavior, hence I suggested we could do it the other way around: make a new function for gregorian instead and leave weekofyear as is.

To address the API clutter, I suppose we could define the function as follows: FUNC(date[, gregorian])

override def dataType: DataType = IntegerType

@transient private lazy val minimalDays = {
  if ("gregorian".equalsIgnoreCase(format.toString)) 1 else 4
Member

How many formats do the other DBs/systems allow? Could you do a search?

Contributor Author

@rubenljanssen rubenljanssen May 27, 2017

I did a bit of research, and there seem to be no other formats. However, some systems (such as MySQL and Java) also allow the first day of the week to be defined. Some countries in the Middle East have weekends on Friday/Saturday, or even Thursday/Friday.
I will update the PR to allow users to override the first day of the week, as well as to specify how the first week is defined: (1) ISO standard: the first week is the one with more than half of its days in the new year, i.e. the week containing the first Thursday in a Monday-Sunday week; (2) gregorian: the first week is the one containing the first day of the new year.
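
As a rough sketch of what such an override could look like (not the code in this PR), both knobs map onto `java.time.temporal.WeekFields.of(firstDayOfWeek, minimalDaysInFirstWeek)`. The object name, method name, and parameter names below are hypothetical; the defaults are chosen to reproduce ISO behavior.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.WeekFields

object ConfigurableWeek {
  // firstDay: which weekday starts a week.
  // minimalDays: how many days of a week must fall in the new year for it to count as week 1
  //              (4 = ISO rule, 1 = "week containing January 1" rule).
  def weekOfYear(
      date: LocalDate,
      firstDay: DayOfWeek = DayOfWeek.MONDAY,
      minimalDays: Int = 4): Int =
    date.get(WeekFields.of(firstDay, minimalDays).weekOfWeekBasedYear())
}
```

For January 1, 2017 (a Sunday), the defaults yield 52, whereas a Sunday-first week with `minimalDays = 1` yields 1.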

Contributor Author

It will still default to the ISO standard with a Monday-Sunday week, of course, but users can now override it in any way they like.

@srowen
Member

srowen commented May 29, 2017

Is this variant available in any other DB? A lot of the goal of providing built-in functions is compatibility. Beyond that, a lot of things are better handled with UDFs for special cases, not new built-ins.

@rubenljanssen
Contributor Author

rubenljanssen commented May 29, 2017

This variant is available in other DBs, albeit with slightly different function and parameter naming. For example, MySQL allows it via the week() function: http://www.w3resource.com/mysql/date-and-time-functions/mysql-week-function.php

In this case, you pass in an integer that specifies which permutation you want. Please note that if you look at the table, the 'Week 1 is the first week …' column is the difference between gregorian and iso.

In contrast, Oracle allows you to switch between gregorian and iso as follows:
to_char(contact_date,'ww') vs to_char(contact_date,'iw')
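
For illustration only, the two Oracle format elements can be approximated on the JVM as below. This assumes the documented Oracle semantics: `IW` is the ISO week, while `WW` counts 7-day blocks starting on January 1 regardless of weekday, so `WW` is close to, but not identical to, a weekday-aligned gregorian week. The object and method names are hypothetical.

```scala
import java.time.LocalDate
import java.time.temporal.WeekFields

object OracleStyleWeeks {
  // Approximation of 'WW': week 1 is January 1 through January 7, week 2 the next 7 days, and so on.
  def ww(date: String): Int =
    (LocalDate.parse(date).getDayOfYear - 1) / 7 + 1

  // Approximation of 'IW': the ISO 8601 week number.
  def iw(date: String): Int =
    LocalDate.parse(date).get(WeekFields.ISO.weekOfWeekBasedYear())
}
```

With these definitions, 2017-01-01 is week 1 under `ww` and week 52 under `iw`, and 2017-01-08 is week 2 under `ww` but week 1 under `iw`.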

Do you reckon it would be better to change the name and align it more closely with those? Another option would be to simply implement yearweek as done in MySQL (https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_yearweek), which is less flexible but also much simpler.

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Nov 8, 2018

I think that if Spark's behavior matches Hive's, that's what we want here. Other variations can be implemented in UDFs, which provide all the flexibility you'd want. These functions exist in all kinds of variations in SQL databases because UDFs are hard or unavailable.
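
A minimal sketch of the UDF route, assuming the "week containing January 1" rule with Monday-first weeks; the object and method names are hypothetical and nothing here is part of Spark itself. The function is kept as plain Scala so it can be checked without a cluster, and it returns `None` for a null input.

```scala
import java.sql.Date
import java.time.DayOfWeek
import java.time.temporal.WeekFields

object GregorianWeekUdf {
  // Assumed rule: Monday-first weeks, and a single day in the new year is enough for week 1.
  private val fields = WeekFields.of(DayOfWeek.MONDAY, 1)

  // Null-safe: a null date maps to None instead of throwing.
  def weekOfYear(d: Date): Option[Int] =
    Option(d).map(_.toLocalDate.get(fields.weekOfWeekBasedYear()))
}
```

Assuming a `SparkSession` named `spark`, this could presumably be registered with something like `spark.udf.register("gregorian_weekofyear", GregorianWeekUdf.weekOfYear _)` (the SQL name is made up) and then called from SQL, leaving the built-in weekofyear untouched.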

@asfgit asfgit closed this in a3ba3a8 Nov 11, 2018
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#21766
Closes apache#21679
Closes apache#21161
Closes apache#20846
Closes apache#19434
Closes apache#18080
Closes apache#17648
Closes apache#17169

Add:
Closes apache#22813
Closes apache#21994
Closes apache#22005
Closes apache#22463

Add:
Closes apache#15899

Add:
Closes apache#22539
Closes apache#21868
Closes apache#21514
Closes apache#21402
Closes apache#21322
Closes apache#21257
Closes apache#20163
Closes apache#19691
Closes apache#18697
Closes apache#18636
Closes apache#17176

Closes apache#23001 from wangyum/CloseStalePRs.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>