[SPARK-22956][SS] Bug fix for 2 streams union failover scenario #20150
Conversation
Test build #85675 has finished for PR 20150 at commit

Test build #85678 has finished for PR 20150 at commit

cc @zsxwing

@xuanyuanking could you post the full stack trace about this issue?

Hi Shixiong, thanks a lot for your reply.
zsxwing left a comment:
Thanks for fixing this. Looks good to me. Just some nits.
```scala
  batches.slice(sliceStart, sliceEnd)
}
```

```scala
if (newBlocks.isEmpty) {
```
nit: could you add an `assert(sliceStart <= sliceEnd, s"sliceStart: $sliceStart sliceEnd: $sliceEnd")` above `batches.slice(sliceStart, sliceEnd)` to make sure `getBatch` will not be called with wrong offsets.
DONE
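For illustration, here is a minimal standalone sketch of the suggested guard (the `SliceGuard` object and `sliceBatches` name are hypothetical, not Spark APIs; only the assert line mirrors the review suggestion):

```scala
// Hypothetical standalone sketch of the suggested guard. SliceGuard and
// sliceBatches are illustrative names, not Spark internals.
object SliceGuard {
  def sliceBatches[T](batches: Vector[T], sliceStart: Int, sliceEnd: Int): Vector[T] = {
    // Fail fast with a descriptive message if the caller computed bad offsets.
    assert(sliceStart <= sliceEnd, s"sliceStart: $sliceStart sliceEnd: $sliceEnd")
    batches.slice(sliceStart, sliceEnd)
  }
}
```

With assertions enabled, an inverted range now fails immediately with the offending offsets in the message instead of silently returning an empty slice.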
```scala
  )
}
```

```scala
test("union bug in failover") {
```
nit: `test("SPARK-22956: currentPartitionOffsets should be set when no new data comes in")`
DONE
```scala
StartStream(ProcessingTime(100), clock),
waitUntilBatchProcessed,
// 5 from smaller topic, 5 from bigger one
CheckAnswer(0, 1, 2, 3, 4, 100, 101, 102, 103, 104),
```
You can clean this code up a bit using the following snippet:
```scala
testStream(kafka)(
  StartStream(ProcessingTime(100), clock),
  waitUntilBatchProcessed,
  // 5 from smaller topic, 5 from bigger one
  CheckLastBatch((0 to 4) ++ (100 to 104): _*),
  AdvanceManualClock(100),
  waitUntilBatchProcessed,
  // 5 from smaller topic, 5 from bigger one
  CheckLastBatch((5 to 9) ++ (105 to 109): _*),
  AdvanceManualClock(100),
  waitUntilBatchProcessed,
  // smaller topic empty, 5 from bigger one
  CheckLastBatch(110 to 114: _*),
  StopStream,
  StartStream(ProcessingTime(100), clock),
  waitUntilBatchProcessed,
  // smallest now empty, 5 from bigger one
  CheckLastBatch(115 to 119: _*),
  AdvanceManualClock(100),
  waitUntilBatchProcessed,
  // smallest now empty, 5 from bigger one
  CheckLastBatch(120 to 124: _*)
)
```
Cool, this made the code cleaner.
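As an aside, the `CheckLastBatch((0 to 4) ++ (100 to 104): _*)` calls in the suggested snippet rely on Scala's sequence-to-varargs expansion. A tiny standalone sketch (the `checkLastBatch` helper below is hypothetical, standing in for the real test action):

```scala
object VarargsDemo {
  // Hypothetical stand-in for a varargs test action like CheckLastBatch(data: Int*).
  def checkLastBatch(data: Int*): Vector[Int] = data.toVector

  // ": _*" expands the concatenated ranges into individual varargs arguments,
  // so this is equivalent to checkLastBatch(0, 1, 2, 3, 4, 100, ..., 104).
  val expected: Vector[Int] = checkLastBatch((0 to 4) ++ (100 to 104): _*)
}
```

This is why the ranges can replace the long literal argument lists in the original test without changing what is checked.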
Test build #86128 has finished for PR 20150 at commit

Thanks! Merging to master and 2.3.

Thanks for your review! Shixiong
## What changes were proposed in this pull request?

This problem was reported by @yanlin-Lynn, @ivoson, and @LiangchangZ. Thanks!

When we union 2 streams from Kafka or other sources, and one of them has no continuous data coming in while a task restarts at the same time, this causes an `IllegalStateException`. The root cause is the code in [MicroBatchExecution](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L190): when one stream has no continuous data, its committedOffset equals its availableOffset during `populateStartOffsets`, and `currentPartitionOffsets` is not properly handled in KafkaSource. We should perhaps also consider this scenario in other Sources.

## How was this patch tested?

Add a UT in KafkaSourceSuite.scala.

Author: Yuanjian Li <[email protected]>
Closes #20150 from xuanyuanking/SPARK-22956.
(cherry picked from commit 07ae39d)
Signed-off-by: Shixiong Zhu <[email protected]>
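To make the failure mode concrete, here is a heavily simplified, hypothetical model of the issue (class and member names are illustrative, not Spark's actual internals): on restart with no new data on one side of the union, `getBatch` is called with start equal to end, and the cached partition offsets must still be set so later processing does not fail.

```scala
// Hypothetical toy model of the KafkaSource issue; names are illustrative
// and do not match Spark's real implementation.
class ToyKafkaSource {
  // Cached "offsets of the most recently fetched batch", initially unset.
  private var currentPartitionOffsets: Option[Map[Int, Long]] = None

  // start == end models a failover where committedOffset == availableOffset
  // because one side of the union saw no new data before the restart.
  def getBatch(start: Option[Map[Int, Long]], end: Map[Int, Long]): Seq[Long] = {
    // The fix, in spirit: record the end offsets even when the batch is
    // empty, instead of only setting them on the non-empty path.
    if (currentPartitionOffsets.isEmpty) {
      currentPartitionOffsets = Some(end)
    }
    end.toSeq.flatMap { case (partition, untilOffset) =>
      val fromOffset = start.flatMap(_.get(partition)).getOrElse(0L)
      fromOffset until untilOffset
    }
  }

  def lastOffsets: Option[Map[Int, Long]] = currentPartitionOffsets
}
```

In this toy model, an empty replayed batch still initializes `currentPartitionOffsets`, which is the property the new test name above ("currentPartitionOffsets should be set when no new data comes in") describes.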