[SPARK-28432][SQL] Add make_date function #25210
MaxGekk wants to merge 19 commits into apache:master
Conversation
Test build #107941 has finished for PR 25210 at commit
Thank you for working on this, @MaxGekk. cc @wangyum
The R failure seems to be the CRAN check failure again (which is unrelated to this PR), but let's rerun to confirm.
Retest this please.
I think it is relevant. Need to describe the parameters.
@felixcheung @HyukjinKwon Could you take a look at the PR, especially at the R-related changes, please?
Test build #107943 has finished for PR 25210 at commit
I will add a test, but R tests crash with a segmentation fault on my macOS laptop.
Test build #107945 has finished for PR 25210 at commit
@MaxGekk. Got it. Is the following fixed in the latest commit?
BTW, I'm wondering if we need to add this function to all APIs. For the hyperbolic math functions, we only added them in SQL because it's a PostgreSQL compatibility effort.
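For context, a SQL-only function in Spark is just an expression registered in FunctionRegistry, with no Scala/Python/R wrapper. A minimal sketch of such a registration, assuming the MakeDate expression from this PR:

// In FunctionRegistry's expressions map (sketch): registering the expression
// exposes it to SQL only; no DataFrame API wrapper is generated.
expression[MakeDate]("make_date")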
Test build #107946 has finished for PR 25210 at commit
I see. I will remove the function from the Scala, Python and R APIs.
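Note that even without language-specific wrappers, the function stays reachable from the DataFrame API through SQL expression parsing. A minimal sketch, assuming a running SparkSession named spark:

import org.apache.spark.sql.functions.expr

// expr() parses any registered SQL function, so DataFrame users can still
// call make_date even though no dedicated Scala wrapper exists:
val df = spark.range(1).select(expr("make_date(2019, 7, 22)").as("d"))
df.show()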
Test build #107950 has finished for PR 25210 at commit
All tests passed and only the CRAN check fails. The failure is unrelated to this PR.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
-- !query 48 schema
struct<make_date(-44, 3, 15):date>
-- !query 48 output
0045-03-15
The year -44 is out of the valid range according to the SQL standard. We are getting 45 instead of -44 while converting to java.sql.Date. If you switch to the Java 8 API for dates/timestamps:
scala> spark.conf.set("spark.sql.datetime.java8API.enabled", true)
scala> spark.sql("select make_date(-44, 3, 15)").collect
res7: Array[org.apache.spark.sql.Row] = Array([-0044-03-15])

the returned instance of java.time.LocalDate seems reasonable.
You need to file a JIRA issue for this difference in make_date input range checking. Also, please add the JIRA id to the date.sql file.
FYI, the following is PostgreSQL output.
Actually, the reason for the difference here is how Spark converts the internal type DateType to an external one: java.sql.Date (by default) or java.time.LocalDate (when spark.sql.datetime.java8API.enabled is set to true), and how the external type is converted to a string. For example, to get the same format as PostgreSQL, you need to provide the appropriate formatter.
For java.sql.Date:
spark.conf.set("spark.sql.datetime.java8API.enabled", false)
val date = spark.sql("select make_date(-44, 3, 15)").first.getAs[java.sql.Date](0)
val sdf = new java.text.SimpleDateFormat("MM-dd-yyyy G")
scala> sdf.format(date)
res18: String = 03-17-0045 BC

For Java 8 java.time.LocalDate:
spark.conf.set("spark.sql.datetime.java8API.enabled", true)
val formatter = DateTimeFormatter.ofPattern("MM-dd-yyyy G")
val localDate = spark.sql("select make_date(-44, 3, 15)").first.getAs[LocalDate](0)
localDate.format(formatter)
res16: String = 03-15-0045 BC

The difference in days is due to the different calendars (Julian in the first case, and Proleptic Gregorian in the second one).
I see Postgres formats the year -44 as 44 BC, which is wrong according to ISO 8601. See https://en.wikipedia.org/wiki/Year_zero , for example:
The "basic" format for year 0 ... year 1 BC. ... hence -0001 = 2 BC.
I don't think we should implement Postgres bugs.
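A minimal java.time sketch of that numbering (a year zero exists, so -0044 = 45 BC), consistent with the res16 output above; the era text assumes an English locale:

import java.time.LocalDate
import java.time.format.DateTimeFormatter

// java.time follows ISO 8601: 0000 = 1 BC, -0001 = 2 BC, ..., -0044 = 45 BC.
val fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy G")
LocalDate.of(-44, 3, 15).format(fmt)  // "03-15-0045 BC"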
No, what I mean is we need a JIRA report and a JIRA ID comment because we don't follow PostgreSQL like that.
Ideally, you can file a PostgreSQL bug and use that instead of a SPARK JIRA.
Either way, we should report the difference.
Here is the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-28471
Test build #107961 has finished for PR 25210 at commit
Test build #107967 has finished for PR 25210 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
cc @ueshin
@MaxGekk Could you also improve the PR description with the reasons why we need this new function?
Test build #107999 has finished for PR 25210 at commit
Test build #108003 has finished for PR 25210 at commit
dongjoon-hyun left a comment
+1, LGTM. Merged to master.
Thank you so much, @MaxGekk , @felixcheung , @HyukjinKwon , @gatorsmile , @ueshin !
What changes were proposed in this pull request?
New function make_date() takes 3 columns year, month and day, and makes a new column of the DATE type. If values in the input columns are null or out of valid ranges, the function returns null. Valid ranges are:
- year - [1, 9999]
- month - [1, 12]
- day - [1, 31]

Also, the constructed date must be valid, otherwise make_date returns null.

The function is implemented similarly to make_date in PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html) to maintain feature parity with it. Here is an example:
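A spark-shell sketch (the exact output rendering may differ):

scala> spark.sql("SELECT make_date(2019, 7, 22) AS d, make_date(2019, 2, 30) AS bad").show()
+----------+----+
|         d| bad|
+----------+----+
|2019-07-22|null|
+----------+----+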
How was this patch tested?
Added new tests to DateExpressionsSuite.
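For reference, a sketch of the kind of check added there (checkEvaluation comes from ExpressionEvalHelper, which the suite mixes in; the exact assertions in the PR may differ):

import java.sql.Date
import org.apache.spark.sql.catalyst.expressions.{Literal, MakeDate}

// A valid (year, month, day) triple evaluates to the expected date;
// out-of-range input evaluates to null.
checkEvaluation(MakeDate(Literal(2013), Literal(7), Literal(15)), Date.valueOf("2013-07-15"))
checkEvaluation(MakeDate(Literal(2019), Literal(13), Literal(1)), null)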