[SPARK-38195][SQL] Add the TIMESTAMPADD() function (#35502)
@@ -4510,4 +4510,15 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logging
  private def alterViewTypeMismatchHint: Option[String] = Some("Please use ALTER TABLE instead.")

  private def alterTableTypeMismatchHint: Option[String] = Some("Please use ALTER VIEW instead.")

  /**
   * Create a TimestampAdd expression.
   */
  override def visitTimestampadd(ctx: TimestampaddContext): Expression = withOrigin(ctx) {
    val arguments = Seq(
      Literal(ctx.unit.getText),
Contributor:
@MaxGekk I think this indicates the unit parameter must be foldable?
Member (Author):
This is not the single entry point for
Contributor:
Ah I see, then I think the current implementation is fine.
      expression(ctx.unitsAmount),
      expression(ctx.timestamp))
    UnresolvedFunction("timestampadd", arguments, isDistinct = false)
  }
}
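The visitor above forwards the parsed unit identifier as a plain string literal, so validation is deferred to execution time. A minimal sketch of that shape, using hypothetical stand-in types (`UnresolvedCall` and `parseTimestampAdd` are illustrative, not Spark's actual classes):

```scala
// Stand-in for Spark's UnresolvedFunction: just a name and untyped arguments.
final case class UnresolvedCall(name: String, args: Seq[Any])

// Stand-in for visitTimestampadd: the unit token text is kept verbatim as a
// string argument (mirroring Literal(ctx.unit.getText)); nothing checks here
// whether the unit is valid.
def parseTimestampAdd(unitToken: String, amount: Any, timestamp: Any): UnresolvedCall =
  UnresolvedCall("timestampadd", Seq(unitToken, amount, timestamp))

val call = parseTimestampAdd("MINUTE", 58, "timestamp'2022-02-14 01:02:03'")
println(call.name)       // timestampadd
println(call.args.head)  // MINUTE
```

Because the unit survives parsing as an ordinary string argument, both the identifier form (`MINUTE`) and the quoted form (`'MINUTE'`) reach the function the same way, which is what the foldability discussion above is about.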
@@ -1163,4 +1163,40 @@ object DateTimeUtils {
    val localStartTs = getLocalDateTime(startMicros, zoneId)
    ChronoUnit.MICROS.between(localStartTs, localEndTs)
  }

  /**
   * Adds the specified number of units to a timestamp.
   *
   * @param unit A keyword that specifies the interval units to add to the input timestamp.
   * @param quantity The amount of `unit`s to add. It can be positive or negative.
   * @param micros The input timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
   * @param zoneId The time zone ID at which the operation is performed.
   * @return A timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
   */
  def timestampAdd(unit: String, quantity: Int, micros: Long, zoneId: ZoneId): Long = {
    unit.toUpperCase(Locale.ROOT) match {
      case "MICROSECOND" =>
        timestampAddDayTime(micros, quantity, zoneId)
      case "MILLISECOND" =>
        timestampAddDayTime(micros, quantity * MICROS_PER_MILLIS, zoneId)
      case "SECOND" =>
        timestampAddDayTime(micros, quantity * MICROS_PER_SECOND, zoneId)
      case "MINUTE" =>
        timestampAddDayTime(micros, quantity * MICROS_PER_MINUTE, zoneId)
      case "HOUR" =>
        timestampAddDayTime(micros, quantity * MICROS_PER_HOUR, zoneId)
      case "DAY" | "DAYOFYEAR" =>
        timestampAddDayTime(micros, quantity * MICROS_PER_DAY, zoneId)
      case "WEEK" =>
        timestampAddDayTime(micros, quantity * MICROS_PER_DAY * DAYS_PER_WEEK, zoneId)
      case "MONTH" =>
        timestampAddMonths(micros, quantity, zoneId)
      case "QUARTER" =>
        timestampAddMonths(micros, quantity * 3, zoneId)
      case "YEAR" =>
        timestampAddMonths(micros, quantity * MONTHS_PER_YEAR, zoneId)
      case _ =>
        throw QueryExecutionErrors.invalidUnitInTimestampAdd(unit)
Contributor:
Shall we check the unit names in the parser? To fail earlier.
Member (Author):
We can, but I just wonder what kind of problem the earlier check would solve. The parser and compiler already do a lot of work; adding more unnecessary things should be motivated somehow, from my point of view.
Contributor:
Failing earlier is a pretty strong reason, right? It's a waste of resources if we submit a Spark job which fails with a wrong unit name.
Member (Author):
Not sure. Can you imagine a cluster of 1000 executors waiting for the driver that is still processing a query because we eagerly want to check everything, even when users' queries and data don't have any issues? That is a real waste of users' resources.
Member (Author):
Also, what you are talking about is actually a mistake in a query, like:

SELECT timestampadd(YEER, 1, timestampColumn);

Such mistakes are not permanent, and users usually fix them during the debug stage. There are not many reasons to double-check such mistakes both at parsing and at runtime (we must do that since
Member (Author):
As a summary, taking into account that we will optimize foldable
The expressions require one of their params (format, field, etc.) to be always foldable. In the case of select timestampadd(SECOND, tbl.quantity, tbl.ts1), and we want to bump the precision of
Contributor:
I'm totally OK with your decision if the unit parameter can be unfoldable, but it seems that's not the case? We even added a special parse rule for this function so that the unit parameter is an identifier.
Member (Author):
It can be an identifier or a string column, see:

spark-sql> create table tbl as select 'SECOND' as u, 1 as q, timestamp'now' as t;
spark-sql> select * from tbl;
SECOND	1	2022-02-18 18:33:34.939
spark-sql> select timestampadd(tbl.u, q, t) from tbl;
2022-02-18 18:33:35.939

or

spark-sql> select timestampadd('HOUR', 1, timestamp'now');
2022-02-18 19:38:54.817

Comment:
Technically speaking the first argument, unit, should be a datetime interval type, as in what's used with EXTRACT. It isn't meant to be a string, if that makes things any simpler.
Member (Author):
I didn't get your point. How could it be the interval type?
Just wonder why you linked
This PR achieves this goal, I believe. It makes the migration process to Spark SQL simpler, and gives additional benefits of using Spark SQL in real production (see my comment above).
    }
  }
}
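The unit dispatch in `DateTimeUtils.timestampAdd` above can be mirrored with plain java.time arithmetic. The following is a self-contained sketch (not Spark's actual code path, and without Spark's microsecond representation) showing the same split the implementation relies on: day-time units are fixed-length additions, while MONTH/QUARTER/YEAR go through calendar-aware field arithmetic:

```scala
import java.time.{Duration, ZoneId, ZonedDateTime}
import java.util.Locale

// Dispatch on the unit keyword, mirroring the shape of DateTimeUtils.timestampAdd.
def timestampAddSketch(unit: String, quantity: Long, ts: ZonedDateTime): ZonedDateTime =
  unit.toUpperCase(Locale.ROOT) match {
    // Day-time units: fixed-length amounts of time.
    case "MICROSECOND"       => ts.plus(Duration.ofNanos(quantity * 1000L))
    case "MILLISECOND"       => ts.plus(Duration.ofMillis(quantity))
    case "SECOND"            => ts.plusSeconds(quantity)
    case "MINUTE"            => ts.plusMinutes(quantity)
    case "HOUR"              => ts.plusHours(quantity)
    case "DAY" | "DAYOFYEAR" => ts.plusDays(quantity)
    case "WEEK"              => ts.plusWeeks(quantity)
    // Year-month units: calendar-aware, variable-length arithmetic.
    case "MONTH"             => ts.plusMonths(quantity)
    case "QUARTER"           => ts.plusMonths(3 * quantity)
    case "YEAR"              => ts.plusYears(quantity)
    case other               => throw new IllegalArgumentException(s"Invalid unit: $other")
  }

val ts = ZonedDateTime.of(2022, 2, 14, 1, 2, 3, 0, ZoneId.of("UTC"))
println(timestampAddSketch("MONTH", -1, ts))  // 2022-01-14T01:02:03Z[UTC]
println(timestampAddSketch("YEAR", 1, ts))    // 2023-02-14T01:02:03Z[UTC]
```

The split matters because a month or year is not a fixed number of microseconds: adding one month to January 31 must clamp to the end of February, which is exactly why the real code routes those units through `timestampAddMonths` rather than a microsecond delta.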
@@ -142,3 +142,9 @@ select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
select unix_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
-- Add a number of units to a timestamp or a date
select timestampadd('MONTH', -1, timestamp'2022-02-14 01:02:03');
select timestampadd(MINUTE, 58, timestamp'2022-02-14 01:02:03');
select timestampadd(YEAR, 1, date'2022-02-15');
select timestampadd('SECOND', -1, date'2022-02-15');
Contributor:
Can we have some negative tests? E.g. invalid unit name, overflow, etc.
Member (Author):
The test for invalid unit name is in this PR, see the test for the error class.
Member (Author):
Regarding overflow, I actually reused the methods for adding ANSI intervals to timestamps. I think we should test overflow both for ANSI intervals and for timestampadd(). I will open a JIRA for that.
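The overflow concern above can be illustrated in isolation: microsecond arithmetic on `Long` wraps around silently with plain `+`/`*`, whereas exact-arithmetic variants raise an error instead of producing a garbage timestamp. This sketch uses a local constant mirroring Spark's `MICROS_PER_HOUR`; it is not Spark's actual code path:

```scala
// Microseconds per hour, matching the value of Spark's MICROS_PER_HOUR constant.
val MicrosPerHour = 3600L * 1000000L

// Add `hours` to a microsecond timestamp, throwing ArithmeticException on
// Long overflow instead of silently wrapping around.
def addHoursExact(micros: Long, hours: Long): Long =
  Math.addExact(micros, Math.multiplyExact(hours, MicrosPerHour))

println(addHoursExact(0L, 1L))  // 3600000000

// Near Long.MaxValue the exact version surfaces the overflow:
try {
  addHoursExact(Long.MaxValue, 1L)
} catch {
  case e: ArithmeticException => println(s"overflow detected: ${e.getMessage}")
}
```

A negative test for `timestampadd()` overflow would assert that such an exception (wrapped in Spark's error framework) is raised, rather than checking for a wrapped-around timestamp value.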
Comment:
Do you want to add DATEADD as an alias for the same function in this PR?
Comment:
Let's discuss overloading of DATEADD in a separate JIRA. This is arguable, and we need to reach a consensus, from my point of view.