-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-29520][SS] Fix checks of negative intervals #26177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
a2d83b5
f880fd3
1f5d684
ab6036c
6afa006
084c8d5
2aa6480
1840a1d
4dab906
b212976
8cb55a7
8c2bcb1
edad25e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -29,8 +29,7 @@ object EventTimeWatermark { | |
|
|
||
| def getDelayMs(delay: CalendarInterval): Long = { | ||
| // We define month as `31 days` to simplify calculation. | ||
|
||
| val millisPerMonth = TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31 | ||
| delay.milliseconds + delay.months * millisPerMonth | ||
| delay.getDuration(31, TimeUnit.MILLISECONDS) | ||
| } | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -31,7 +31,7 @@ private object Triggers { | |
|
|
||
| def convert(interval: String): Long = { | ||
| val cal = CalendarInterval.fromCaseInsensitiveString(interval) | ||
| if (cal.months > 0) { | ||
| if (cal.months != 0) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems like another way of converting interval to duration: make sure the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can change def getDuration(
interval: CalendarInterval,
targetUnit: TimeUnit,
daysPerMonth: Option[Int] = Some(31)): Long = {
val monthsDuration = daysPerMonth
.map { days =>
Math.multiplyExact(days * DateTimeUtils.MICROS_PER_DAY, interval.months)
}.getOrElse {
if (interval.months == 0) {
0L
} else {
throw new IllegalArgumentException(s"Doesn't support month or year interval: $interval")
}
}
val result = Math.addExact(interval.microseconds, monthsDuration)
targetUnit.convert(result, TimeUnit.MICROSECONDS)
}and call
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. but I am not sure that this check should be inside of |
||
| throw new IllegalArgumentException(s"Doesn't support month or year interval: $interval") | ||
| } | ||
| TimeUnit.MICROSECONDS.toMillis(cal.microseconds) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to expose this vs just letting callers get the duration and compare?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it for code readability at caller side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it could be considered as a companion method for negate()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Do you think it is not worth it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm neutral on it. It makes some sense; I'm just always reluctant to add API methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can.
Probably, yes.
Because number of days per month is not constant. Structure Streaming follows conservative approach, and assumes 31 days per months (see #16304 (comment)). In another places, we can assume another numbers (see
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
Line 31 in 77fe8a8
For now, the methods are used by SS only. Apparently 31 can be hard coded inside.
The SQL standard allows negative intervals. If we look at other DMBS, PostgreSQL allows as well. If you subtract 2 timestamp columns
timestamp1 - timestamp2, what are you going to produce whentimestamp1 < timstamp2?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, yes makes sense about negative intervals as the necessary result of subtracting timestamps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
CalendarIntervalclass has many static methods that can be move out of it to a separate class, for instanceCalendarIntervalUtils. I would leave onlymilliseconds(),add(),subtract(),negate(),hashCode(),toString()inCalendarInterval, and putfromString(),fromCaseInsensitiveString(),fromYearMonthString()and etc. toCalendarIntervalUtils.... or any kind of refactoring is impossible in Spark already because of commit history tracking problem? @cloud-fan WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't bother with a big refactoring here. Whatever is consistent. But yeah you can make it static at least?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will move those 2 methods to
IntervalUtils