[SPARK-28687][SQL] Support epoch, isoyear, milliseconds and microseconds at extract()#25408
[SPARK-28687][SQL] Support epoch, isoyear, milliseconds and microseconds at extract()#25408MaxGekk wants to merge 21 commits intoapache:masterfrom
epoch, isoyear, milliseconds and microseconds at extract()#25408Conversation
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
Outdated
Show resolved
Hide resolved
|
Test build #108930 has finished for PR 25408 at commit
|
|
Test build #108936 has finished for PR 25408 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
Show resolved
Hide resolved
|
Test build #108954 has finished for PR 25408 at commit
|
There was a problem hiding this comment.
This PR's milliseconds implementation is inconsistent from PostgreSQL. It must be due to a simple misunderstanding on the contracts.
PostgreSQL
postgres=# select extract(milliseconds from timestamp '2011-05-06 07:08:09.1234567');
date_part
-----------
9123.457
(1 row)This PR
spark-sql> select extract(milliseconds from timestamp '2011-05-06 07:08:09.1234567');
9123Could you fix that and double-check all the other function implementations once again by comparing the full sub-second parts? Also, the document should be updated together to avoid this kind of confusions at user-sides.
In addition, PostgreSQL accepts more patterns like MSEC. Since this is for PostgreSQL Feature Parity, we had better support all patterns.
postgres=# SELECT EXTRACT(MSEC FROM TIMESTAMP '2011-05-06 07:08:09.1234567');
date_part
-----------
9123.457
(1 row)
|
Test build #108956 has finished for PR 25408 at commit
|
|
@dongjoon-hyun Just in case, you checked not only behavior of spark-sql> select extract(milliseconds from timestamp '2011-05-06 07:08:09.1234567');
9123.456 |
|
Thanks, @MaxGekk . BTW, please don't forget the other part like |
|
Test build #108990 has finished for PR 25408 at commit
|
|
Test build #109017 has finished for PR 25408 at commit
|
|
jenkins, retest this, please |
|
Test build #109024 has finished for PR 25408 at commit
|
|
jenkins, retest this, please |
|
Test build #109033 has finished for PR 25408 at commit
|
There was a problem hiding this comment.
Ur, @MaxGekk . Did you check PostgreSQL to find out all supported variants?
Please check the variants really instead of relying on my comments. 😅
There was a problem hiding this comment.
Did you check PostgreSQL to find out all supported variants?
I checked PostgreSQL documentation and JIRA ticket that mention milliseconds only. As far as I see timestamp.sql uses msec / msec as well.
at least, MS and MSECS are missing here.
Do we need to support them if they are not used in tests?
There was a problem hiding this comment.
Please check the variants really instead of relying on my comments. 😅
I relied on existing test
spark/sql/core/src/test/resources/sql-tests/inputs/pgSQL/timestamp.sql
Lines 198 to 199 in caa23e3
There was a problem hiding this comment.
I've checked the related code in pgsql and it seems to have many variants:
- millisecon
- ms
- msec
- mseconds
- msecs
- msecond
The implementation of extract is timestamp_part in pgsql: https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/timestamp.c#L4507
This function handles DTK_MILLISEC for milleseconds: https://github.com/postgres/postgres/blob/fded4773eb60541c6e7dbcf09c9bcb1cd36a063b/src/backend/utils/adt/timestamp.c#L4552
DTK_MILLISEC is expanded into many variants above:
https://github.com/postgres/postgres/blob/fded4773eb60541c6e7dbcf09c9bcb1cd36a063b/src/interfaces/ecpg/pgtypeslib/dt_common.c#L446
https://github.com/postgres/postgres/blob/fded4773eb60541c6e7dbcf09c9bcb1cd36a063b/src/include/utils/datetime.h#L49
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
|
@dongjoon-hyun It seems there are many synonyms for the same units in PostgreSQL: https://github.com/postgres/postgres/blob/64579be64a3811d08ea7ccf92b1a443d76b96412/src/backend/utils/adt/datetime.c#L171-L234 . Are you sure we must support all of them? |
|
Test build #109051 has finished for PR 25408 at commit
|
|
Could you rebase this to the master please, @MaxGekk ? |
|
Test build #109091 has finished for PR 25408 at commit
|
There was a problem hiding this comment.
+1, LGTM. Thank you, @MaxGekk , @maropu , @HyukjinKwon .
Merged to master!
What changes were proposed in this pull request?
In the PR, I propose new expressions
Epoch,IsoYear,MillisecondsandMicroseconds, and support additional parameters ofextract()for feature parity with PostgreSQL (https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT):epoch- the number of seconds since 1970-01-01 00:00:00 local time in microsecond precision.isoyear- the ISO 8601 week-numbering year that the date falls in. Each ISO 8601 week-numbering year begins with the Monday of the week containing the 4th of January.milliseconds- the seconds field including fractional parts multiplied by 1,000.microseconds- the seconds field including fractional parts multiplied by 1,000,000.Here are examples:
How was this patch tested?
Added new tests to
DateExpressionsSuite, and uncommented existing tests inextract.sqlandpgSQL/date.sql.