[SPARK-15022][SPARK-15023][SQL][Streaming] Add support for testing against the ProcessingTime(intervalMS > 0) trigger and ManualClock
#12797
Changes from all commits
90ed692
9d80b15
b63653a
bc89962
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -65,8 +65,13 @@ case class ProcessingTimeExecutor(processingTime: ProcessingTime, clock: Clock = | |
| s"${intervalMs} milliseconds, but spent ${realElapsedTimeMs} milliseconds") | ||
| } | ||
|
|
||
| - /** Return the next multiple of intervalMs */ | ||
| + /** | ||
| + * Returns the start time in milliseconds for the next batch interval, given the current time. | ||
| + * Note that a batch interval is inclusive with respect to its start time, and thus calling | ||
| + * `nextBatchTime` with the result of a previous call should return the next interval. (i.e. given | ||
| + * an interval of `100 ms`, `nextBatchTime(nextBatchTime(0)) = 200` rather than `0`). | ||
| + */ | ||
| def nextBatchTime(now: Long): Long = { | ||
| - (now - 1) / intervalMs * intervalMs + intervalMs | ||
| + now / intervalMs * intervalMs + intervalMs | ||
|
Member
When I wrote this method, I was trying to deal with one case: If a batch takes exactly [...] However, I forgot to handle the case that a batch takes 0ms. How about changing this line to:

    if (batchElapsedTimeMs == 0) {
      clock.waitTillTime(intervalMs)
    } else {
      clock.waitTillTime(nextBatchTime(batchEndTimeMs))
    }
Contributor
Author
@zsxwing thanks for clarifying on this! :-)

[1]

    if (batchElapsedTimeMs == 0 && batchEndTimeMs % intervalMs == 0) {
      clock.waitTillTime(batchEndTimeMs + intervalMs)
    } else {
      clock.waitTillTime(nextBatchTime(batchEndTimeMs))
    }

For me it seems a little hard to interpret...

[2] This is a good point! I've done some calculations based on your comments, and it seems we would still run the next batch at once when the last job takes exactly [...] prior to this patch: [...] after this patch, it's still the same: [...]

@zsxwing given the above [1] and [2], maybe we should simply change [...]
Member
I see. I think your approach is better. Thanks for clarifying.
||
| } | ||
| } | ||
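To make the boundary behaviour discussed in this thread concrete, here is a minimal standalone sketch that compares the two formulas from the diff. The object name `NextBatchTimeSketch`, the function names `oldNextBatchTime` and `newNextBatchTime`, and passing `intervalMs` as a parameter are illustrative assumptions; in the patch itself the method is `nextBatchTime(now)` on `ProcessingTimeExecutor`.

```scala
// Illustrative sketch only; names are hypothetical and not part of the patch.
object NextBatchTimeSketch {
  // Formula before the patch: a time that already sits on a non-zero
  // interval boundary is mapped back to itself.
  def oldNextBatchTime(now: Long, intervalMs: Long): Long =
    (now - 1) / intervalMs * intervalMs + intervalMs

  // Formula after the patch: the interval start is inclusive, so for
  // non-negative `now` the result is strictly greater than `now`.
  def newNextBatchTime(now: Long, intervalMs: Long): Long =
    now / intervalMs * intervalMs + intervalMs

  def main(args: Array[String]): Unit = {
    val intervalMs = 100L
    for (now <- Seq(0L, 1L, 99L, 100L, 101L, 200L)) {
      val oldT = oldNextBatchTime(now, intervalMs)
      val newT = newNextBatchTime(now, intervalMs)
      println(s"now=$now old=$oldT new=$newT")
    }
  }
}
```

With a 100 ms interval, the two formulas agree everywhere except on non-zero boundaries: the old one maps 100 to 100, while the new one maps 100 to 200, which matches the `nextBatchTime(nextBatchTime(0)) = 200` example in the new doc comment.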
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,10 +19,10 @@ package org.apache.spark.sql.streaming | |
|
|
||
| import org.apache.spark.sql._ | ||
| import org.apache.spark.sql.execution.streaming._ | ||
| import org.apache.spark.sql.functions._ | ||
| import org.apache.spark.sql.sources.StreamSourceProvider | ||
| import org.apache.spark.sql.test.SharedSQLContext | ||
| import org.apache.spark.sql.types.{IntegerType, StructField, StructType} | ||
| import org.apache.spark.util.ManualClock | ||
|
|
||
| class StreamSuite extends StreamTest with SharedSQLContext { | ||
|
|
||
|
|
@@ -34,11 +34,11 @@ class StreamSuite extends StreamTest with SharedSQLContext { | |
|
|
||
| testStream(mapped)( | ||
| AddData(inputData, 1, 2, 3), | ||
| - StartStream, | ||
| + StartStream(), | ||
| CheckAnswer(2, 3, 4), | ||
| StopStream, | ||
| AddData(inputData, 4, 5, 6), | ||
| - StartStream, | ||
| + StartStream(), | ||
| CheckAnswer(2, 3, 4, 5, 6, 7)) | ||
| } | ||
|
|
||
|
|
@@ -70,7 +70,7 @@ class StreamSuite extends StreamTest with SharedSQLContext { | |
| CheckAnswer(1, 2, 3, 4, 5, 6), | ||
| StopStream, | ||
| AddData(inputData1, 7), | ||
| - StartStream, | ||
| + StartStream(), | ||
| AddData(inputData2, 8), | ||
| CheckAnswer(1, 2, 3, 4, 5, 6, 7, 8)) | ||
| } | ||
|
|
@@ -136,6 +136,22 @@ class StreamSuite extends StreamTest with SharedSQLContext { | |
| testStream(ds)() | ||
| } | ||
| } | ||
|
|
||
| // This would fail for now -- error is "Timed out waiting for stream" | ||
| // Root cause is that data generated in batch 0 may not get processed in batch 1 | ||
| // Let's enable this after SPARK-14942: Reduce delay between batch construction and execution | ||
| ignore("minimize delay between batch construction and execution") { | ||
| val inputData = MemoryStream[Int] | ||
| testStream(inputData.toDS())( | ||
| StartStream(ProcessingTime("10 seconds"), new ManualClock), | ||
| /* -- batch 0 ----------------------- */ | ||
| AddData(inputData, 1), | ||
| AddData(inputData, 2), | ||
| AddData(inputData, 3), | ||
| AdvanceManualClock(10 * 1000), // 10 seconds | ||
| /* -- batch 1 ----------------------- */ | ||
| CheckAnswer(1, 2, 3)) | ||
| } | ||
|
Contributor
Author
The above test takes advantage of the new [...]
||
| } | ||
|
|
||
| /** | ||
|
|
||
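One possible reading of why the `nextBatchTime` change matters for tests driven by a manual clock is sketched below. On a manually advanced clock a batch can take 0 ms, so if the computed wake-up time equals the current time, the trigger loop never blocks. Everything here is a hypothetical stand-in: `FakeClock` is not Spark's `ManualClock`, `wouldReturnImmediately` models the assumption that a blocking `waitTillTime(target)` returns as soon as the current time is at or past `target`, and `backToBackBatches` is an invented helper that is not part of the patch.

```scala
// Hypothetical stand-in for a manually driven clock; not Spark's ManualClock.
final class FakeClock(private var nowMs: Long = 0L) {
  def getTimeMillis: Long = nowMs
  def advance(deltaMs: Long): Unit = nowMs += deltaMs
  // Assumption: a blocking waitTillTime(target) would return at once
  // whenever the current time is already at or past the target.
  def wouldReturnImmediately(targetMs: Long): Boolean = nowMs >= targetMs
}

object TriggerLoopSketch {
  // The two formulas from the diff, with the interval passed explicitly.
  def oldNext(now: Long, intervalMs: Long): Long =
    (now - 1) / intervalMs * intervalMs + intervalMs
  def newNext(now: Long, intervalMs: Long): Long =
    now / intervalMs * intervalMs + intervalMs

  // Counts how many 0 ms batches run back to back without the clock
  // being advanced, stopping at `cap` to avoid looping forever.
  def backToBackBatches(clock: FakeClock, next: Long => Long, cap: Int): Int = {
    var ran = 0
    var keepGoing = true
    while (keepGoing && ran < cap) {
      ran += 1 // one batch runs and finishes without the clock moving
      val batchEndTimeMs = clock.getTimeMillis
      keepGoing = clock.wouldReturnImmediately(next(batchEndTimeMs))
    }
    ran
  }

  def main(args: Array[String]): Unit = {
    val intervalMs = 10 * 1000L
    val clock = new FakeClock()
    clock.advance(intervalMs) // the clock now sits exactly on a boundary
    println(backToBackBatches(clock, now => oldNext(now, intervalMs), cap = 5))
    println(backToBackBatches(clock, now => newNext(now, intervalMs), cap = 5))
  }
}
```

Under these assumptions, the old formula keeps running batches until the cap is hit once the clock sits on a boundary, while the new formula runs a single batch and then waits for the next advance.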
nit: `nextBatchTime(nextBatchTime(0)) = 200` -> `nextBatchTime(nextBatchTime(0)) = 100`

`nextBatchTime(0) = 100`, so `nextBatchTime(nextBatchTime(0)) = 200`?

Oh, right. Sorry for the mistake.