[SPARK-15022][SPARK-15023][SQL][Streaming] Add support for testing against the ProcessingTime(intervalMS > 0) trigger and ManualClock
#12797
```scala
  testNextBatchTimeAgainstClock(new SystemClock)
}

test("nextBatchTime against ManualClock") {
```
Note that, because of the `ProcessingTimeExecutor` issue, this test fails without this patch but passes with it.
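A manually advanced clock is useful here because it can land exactly on interval boundaries (0, 100, 200, ... ms), which is where an exclusive "next multiple" computation would hand back `now` itself and let the trigger fire again within the same interval. The following is a minimal sketch of that kind of check, not the test in this patch: `FakeManualClock` is a hypothetical stand-in for Spark's `ManualClock`, `inclusiveNextBatchTime` assumes the formula `now / intervalMs * intervalMs + intervalMs`, and `exclusiveNextBatchTime` is a hypothetical variant included only for contrast.

```scala
// Hypothetical stand-in for a manually advanced clock (not Spark's ManualClock API).
final class FakeManualClock(private var timeMs: Long = 0L) {
  def getTimeMillis(): Long = timeMs
  def advance(deltaMs: Long): Unit = timeMs += deltaMs
}

object NextBatchTimeCheck {
  // Inclusive start (assumed formula): a time exactly on a boundary belongs to the
  // interval starting there, so the next batch starts one full interval later.
  def inclusiveNextBatchTime(now: Long, intervalMs: Long): Long =
    now / intervalMs * intervalMs + intervalMs

  // Hypothetical exclusive variant: returns `now` itself when `now` is already a
  // multiple of the interval, which would let a trigger loop re-run the same interval.
  def exclusiveNextBatchTime(now: Long, intervalMs: Long): Long =
    (now + intervalMs - 1) / intervalMs * intervalMs

  // Steps a manual clock one millisecond at a time and checks that the next batch time
  // is a multiple of the interval, strictly in the future, and at most one interval away.
  def holdsOnManualClock(next: (Long, Long) => Long, intervalMs: Long, steps: Int): Boolean = {
    val clock = new FakeManualClock()
    (0 until steps).forall { _ =>
      val now = clock.getTimeMillis()
      val t = next(now, intervalMs)
      clock.advance(1)
      t > now && t - now <= intervalMs && t % intervalMs == 0
    }
  }
}
```

Here `holdsOnManualClock(inclusiveNextBatchTime, 100, 1000)` is true, while the exclusive variant already fails at `now = 0`, where it returns `0` instead of `100`.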
Test build #57391 has finished for PR 12797 at commit
```diff
 }

-/** Return the next multiple of intervalMs */
+/** Return the next multiple of intervalMs
```
Nit: the comment style is off; we use Javadoc style.
I have some minor comments about code readability, but overall this looks good. Thanks for working on this!
```diff
 /** Starts the stream, resuming if data has already been processed. It must not be running. */
-case object StartStream extends StreamAction
+case class StartStream(trigger: Trigger = null, triggerClock: Clock = null) extends StreamAction
```
You can use the same default values as `StreamExecution`. Then you don't need to handle the `null` case.
This layer of `null`s was intended to delegate to the default values of `StreamExecution`, so that we wouldn't have to repeat the same defaults in many places and keep them consistent. But since it seems very unlikely that we would change the defaults, I've removed the `null` layer and followed your suggestion.
Thanks!
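Following the suggestion above, `StartStream` can carry the same defaults that the PR description attributes to the existing behavior (a `ProcessingTime` trigger with a 0 ms interval and a `SystemClock`), so test code never has to handle `null`. The sketch below uses simplified stand-ins for `Trigger`, `ProcessingTime`, `Clock`, `SystemClock`, and `ManualClock`; these definitions are assumptions for illustration, not the Spark classes themselves.

```scala
// Simplified stand-ins for the Spark types (assumptions, not Spark's definitions).
sealed trait Trigger
final case class ProcessingTime(intervalMs: Long) extends Trigger

trait Clock { def getTimeMillis(): Long }

final class SystemClock extends Clock {
  def getTimeMillis(): Long = System.currentTimeMillis()
}

// A test-controlled clock that only moves when advanced explicitly.
final class ManualClock(private var now: Long = 0L) extends Clock {
  def getTimeMillis(): Long = now
  def advance(deltaMs: Long): Unit = now += deltaMs
}

sealed trait StreamAction

/** Starts the stream, resuming if data has already been processed. It must not be running. */
final case class StartStream(
    trigger: Trigger = ProcessingTime(0L),   // concrete default instead of null
    triggerClock: Clock = new SystemClock)   // concrete default instead of null
  extends StreamAction
```

With these defaults, `StartStream()` behaves like the old `case object`, and a test opts into a non-zero interval with something like `StartStream(ProcessingTime(10L), new ManualClock)`.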
Looks pretty good. @lw-lin, could you address the comments and resolve the conflicts?
# Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
```scala
 * Returns the start time in milliseconds for the next batch interval, given the current time.
 * Note that a batch interval is inclusive with respect to its start time, and thus calling
 * `nextBatchTime` with the result of a previous call should return the next interval. (i.e. given
 * an interval of `100 ms`, `nextBatchTime(nextBatchTime(0)) = 200` rather than `0`).
```
nit: nextBatchTime(nextBatchTime(0)) = 200 -> nextBatchTime(nextBatchTime(0)) = 100
nextBatchTime(0) = 100, so nextBatchTime(nextBatchTime(0)) = 200?
Oh, right. Sorry for the mistake.
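The thread above settles on `200`. As a quick check, assume `nextBatchTime` rounds `now` down to the start of its interval and adds one interval; this formula is an assumption consistent with the doc comment, not quoted from the patch, and `NextBatchTimeExample` is a hypothetical wrapper with a fixed 100 ms interval.

```scala
object NextBatchTimeExample {
  val intervalMs: Long = 100L

  // Round `now` down to the start of its (inclusive) interval, then add one interval.
  def nextBatchTime(now: Long): Long = now / intervalMs * intervalMs + intervalMs
}
```

Then `nextBatchTime(0) = 0 / 100 * 100 + 100 = 100` and `nextBatchTime(100) = 100 / 100 * 100 + 100 = 200`, so `nextBatchTime(nextBatchTime(0)) = 200`; any `now` in `[100, 200)` also maps to `200`.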
LGTM pending tests.
Test build #57691 has finished for PR 12797 at commit
Merging to master / 2.0. Thanks again, @lw-lin!
…ainst the `ProcessingTime(intervalMS > 0)` trigger and `ManualClock`

## What changes were proposed in this pull request?

Currently, `StreamTest` has a `StartStream` action that starts a streaming query with the `ProcessingTime(intervalMS = 0)` trigger and `SystemClock`.

We also need to test cases against `ProcessingTime(intervalMS > 0)`, which often require `ManualClock`.

This patch:
- fixes an issue in `ProcessingTimeExecutor`, where `batchRunner` should run only once per batch but might run multiple times under certain conditions;
- adds support for testing against the `ProcessingTime(intervalMS > 0)` trigger and `ManualClock`, by specifying them as fields of `StartStream` and by adding an `AdvanceManualClock` action;
- adds a test, which takes advantage of the new `StartStream` and `AdvanceManualClock`, to test against [PR#[SPARK-14942] Reduce delay between batch construction and execution](#12725).

## How was this patch tested?

N/A

Author: Liwei Lin <[email protected]>

Closes #12797 from lw-lin/add-trigger-test-support.

(cherry picked from commit e597ec6)
Signed-off-by: Shixiong Zhu <[email protected]>
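The description above amounts to letting a test drive the trigger deterministically: start the query with a non-zero `ProcessingTime` interval, then advance a manual clock and observe at most one batch per interval. The sketch below simulates that flow in a single thread. It is not the `StreamTest` API: the `timeToAdd` field of `AdvanceManualClock`, the `SimulatedStream` runner, the simplified `ProcessingTime` and `StartStream` shapes, and the internal simulated clock are all hypothetical simplifications.

```scala
// Hypothetical single-threaded simulation of a ManualClock-driven trigger
// (not Spark's StreamTest, ManualClock, or ProcessingTimeExecutor).
final case class ProcessingTime(intervalMs: Long)

sealed trait StreamAction
final case class StartStream(trigger: ProcessingTime) extends StreamAction
final case class AdvanceManualClock(timeToAdd: Long) extends StreamAction

final class SimulatedStream {
  private var clockMs = 0L       // the simulated manual clock, starting at 0
  private var intervalMs = 0L
  private var nextBatchAtMs = 0L
  private var started = false
  private var batches = 0

  def batchesRun: Int = batches

  // Inclusive batch start: a time exactly on a boundary schedules the following interval.
  private def nextBatchTime(now: Long): Long = now / intervalMs * intervalMs + intervalMs

  // Runs one batch if its scheduled start has been reached, then schedules the next start
  // strictly after the current clock reading, so each interval triggers at most one batch.
  private def maybeRunBatch(): Unit =
    if (started && clockMs >= nextBatchAtMs) {
      batches += 1
      nextBatchAtMs = nextBatchTime(clockMs)
    }

  def apply(action: StreamAction): Unit = action match {
    case StartStream(ProcessingTime(ms)) =>
      require(ms > 0, "this sketch only models intervals > 0")
      intervalMs = ms
      started = true
      nextBatchAtMs = clockMs // the first batch runs as soon as the stream starts
      maybeRunBatch()
    case AdvanceManualClock(delta) =>
      clockMs += delta
      maybeRunBatch()
  }
}
```

For example, `StartStream(ProcessingTime(10))` followed by `AdvanceManualClock(10)` yields two batches. Advancing by less than one interval runs nothing, and a jump across several intervals runs a single catch-up batch, because the next start is always computed from the current clock reading.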