@chongguang chongguang commented Dec 14, 2020

What changes were proposed in this pull request?

This pull request implements the proposal described in JIRA ticket https://issues.apache.org/jira/browse/SPARK-33769.

It extends the next_day function of the SQL component so that the dayOfWeek parameter can also be passed as a Column.

Why are the changes needed?

It makes next_day easier to use.
Currently, the signature of this function is:

def next_day(date: Column, dayOfWeek: String): Column.

It accepts the dayOfWeek parameter only as a String. However, in some cases the day of week is stored in a Column, with a different value for each row of the DataFrame.
The current workaround is to call the NextDay expression directly, like this:

NextDay(dateCol.expr, dayOfWeekCol.expr).

The proposal is to add an overload of this function:

def next_day(date: Column, dayOfWeek: Column): Column

Several other functions in this Scala object already follow this pattern, for example:

def date_sub(start: Column, days: Int): Column = date_sub(start, lit(days))
def date_sub(start: Column, days: Column): Column = withExpr { DateSub(start.expr, days.expr) }

or

def add_months(startDate: Column, numMonths: Int): Column = add_months(startDate, lit(numMonths))
def add_months(startDate: Column, numMonths: Column): Column = withExpr {
  AddMonths(startDate.expr, numMonths.expr)
}

This pull request applies the same idea to next_day.

Does this PR introduce any user-facing change?

Yes.
With this pull request, Spark users get a new overload of the function:

def next_day(date: Column, dayOfWeek: Column): Column

But the existing function signature should still work:

def next_day(date: Column, dayOfWeek: String): Column

So this change should be backward compatible.
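
As an illustration, here is a minimal sketch of how the two signatures could be used side by side. It assumes a Spark version that includes this change and a local SparkSession; the object name, the column names d and dow, and the sample rows are hypothetical and only serve the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, next_day, to_date}

object NextDayColumnExample {
  // Builds a small result set from hypothetical data in which each row
  // carries its own target day of week in the "dow" column.
  def nextDays(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq(("2015-07-23", "Mon"), ("2015-07-23", "Fri")).toDF("d", "dow")
    df.select(
      // Existing signature: the same day of week for every row.
      next_day(to_date(col("d")), "Sun").as("fixed_day"),
      // Proposed signature: the day of week is read from a Column, per row.
      next_day(to_date(col("d")), col("dow")).as("per_row_day")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("next-day-column-sketch")
      .getOrCreate()
    try nextDays(spark).show()
    finally spark.stop()
  }
}
```

Code that already calls the String signature compiles unchanged, since the new signature is an additional overload.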

How was this patch tested?

The unit tests of the next_day function have been extended.
They now cover the dayOfWeek parameter both as a String and as a Column.
I also added a test case for the existing signature where dayOfWeek is an invalid String; this should return null.
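
To make the tested behaviour concrete, the following is a plain-Scala model of the semantics described above, not Spark's implementation and not the actual test code. It assumes only the three-letter day abbreviations, matched case-insensitively, and uses None to stand in for the SQL null returned for an invalid dayOfWeek. The object and method names are hypothetical.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.TemporalAdjusters
import java.util.Locale

object NextDaySketch {
  // Assumption: only three-letter abbreviations are modelled here.
  private val days: Map[String, DayOfWeek] = Map(
    "MON" -> DayOfWeek.MONDAY,
    "TUE" -> DayOfWeek.TUESDAY,
    "WED" -> DayOfWeek.WEDNESDAY,
    "THU" -> DayOfWeek.THURSDAY,
    "FRI" -> DayOfWeek.FRIDAY,
    "SAT" -> DayOfWeek.SATURDAY,
    "SUN" -> DayOfWeek.SUNDAY
  )

  // Returns the first date strictly after `date` that falls on `dayOfWeek`.
  // None models the null result for an unrecognised day name.
  def nextDay(date: LocalDate, dayOfWeek: String): Option[LocalDate] =
    days
      .get(dayOfWeek.toUpperCase(Locale.ROOT))
      .map(d => date.`with`(TemporalAdjusters.next(d)))

  def main(args: Array[String]): Unit = {
    val thursday = LocalDate.parse("2015-07-23") // a Thursday
    println(nextDay(thursday, "Mon"))
    println(nextDay(thursday, "not-a-day"))
  }
}
```

In this model, evaluating the Column variant amounts to applying nextDay once per row with that row's own dayOfWeek value.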

correct unit test of the next-day function

add unit test of the next-day function
@chongguang chongguang marked this pull request as ready for review December 14, 2020 12:25
@chongguang chongguang changed the title from "[WIP][SPARK-33769][SQL]improve the next-day function of the sql component to deal with Column type" to "[SPARK-33769][SQL]improve the next-day function of the sql component to deal with Column type" on Dec 14, 2020
@github-actions github-actions bot added the SQL label Dec 14, 2020
@chongguang
Contributor Author

cc @HyukjinKwon

@HyukjinKwon HyukjinKwon changed the title from "[SPARK-33769][SQL]improve the next-day function of the sql component to deal with Column type" to "[SPARK-33769][SQL] Improve the next-day function of the sql component to deal with Column type" on Dec 15, 2020
* "Wed", "Thu", "Fri", "Sat", "Sun"
* @return A date, or null if `date` was a string that could not be cast to a date or if
* `dayOfWeek` was an invalid value
* @group datetime_funcs
Member

Can you add @since 3.2.0?

Contributor Author

done

@HyukjinKwon
Member

ok to test

SparkQA commented Dec 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37387/

SparkQA commented Dec 15, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37387/

SparkQA commented Dec 15, 2020

Test build #132785 has finished for PR 30761 at commit 52f4213.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37416/

SparkQA commented Dec 15, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37416/

@HyukjinKwon
Member

Merged to master.

SparkQA commented Dec 15, 2020

Test build #132815 has finished for PR 30761 at commit 7c32905.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
