[SPARK-15714][CORE] Fix flaky o.a.s.scheduler.BlacklistIntegrationSuite #13454
…re we got a failure exception
…er; cleanup runningTasks
Test build #59778 has finished for PR 13454 at commit
val newTasks = taskScheduler.resourceOffers(offers).flatten
val newTaskDescriptions = taskScheduler.resourceOffers(offers).flatten
// get the task now, since that requires a lock on TaskSchedulerImpl, to prevent individual
// tests for introducing a race if they need it
s/for/from
thanks for the feedback @vanzin. I updated the comments slightly, which I hope addresses your concerns.
Test build #59933 has finished for PR 13454 at commit
Test build #59935 has finished for PR 13454 at commit
OK, as this seems to be failing quite commonly in some envs, I'm going to merge this since it fixes the issue, but I'm open to more feedback.
merged to master
What changes were proposed in this pull request?
BlacklistIntegrationSuite (introduced by SPARK-10372) is a bit flaky because of some race conditions:
(1) has failed a handful of Jenkins builds recently. I don't think I've seen (2) in Jenkins, but I've run into it with some uncommitted tests I'm working on where there are lots more tasks.
While I was in there, I also made an unrelated fix to runningTasks in the test framework -- there was a pointless O(n) operation to remove completed tasks, could be O(1).
How was this patch tested?
I modified the o.a.s.scheduler.BlacklistIntegrationSuite to have it run the tests 1k times on my laptop. It failed 11 times before this change, and none with it. (Pretty sure all the failures were problem (1), though I didn't check all of them).
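A rough sketch of how such a repeat-run check could look, written as plain Scala rather than against the actual suite. The names RepeatRunSketch and countFailures are hypothetical and not part of Spark or ScalaTest; the body passed in stands for one run of a possibly flaky test.

```scala
import scala.util.Try

// Hypothetical helper, not part of Spark or ScalaTest.
object RepeatRunSketch {
  // Runs `body` once per iteration index and returns how many runs threw
  // a non-fatal exception (test assertion failures included).
  def countFailures(times: Int)(body: Int => Unit): Int =
    (0 until times).count(i => Try(body(i)).isFailure)
}
```

Calling RepeatRunSketch.countFailures(1000) { _ => runOneIteration() }, where runOneIteration is whatever executes a single test run, would give a failure count comparable to the "11 out of 1k" figure above.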
Also the full suite of tests via jenkins.
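For the runningTasks cleanup mentioned in the description, the page does not show the actual change, so the following is only an illustration of the general idea: removing a completed task from a buffer needs a linear scan, while removing it from a hash set takes expected constant time. TaskHandle and RunningTasksSketch are hypothetical names, not Spark classes.

```scala
import scala.collection.mutable

// Hypothetical stand-in for whatever the test framework tracks per task.
final case class TaskHandle(stageId: Int, partition: Int)

object RunningTasksSketch {
  // O(n): the buffer has to be scanned to locate the completed task.
  def removeFromBuffer(running: mutable.ArrayBuffer[TaskHandle], done: TaskHandle): Unit = {
    val idx = running.indexOf(done) // linear scan
    if (idx >= 0) running.remove(idx)
  }

  // Expected O(1): the hash set locates the element by its hash code.
  def removeFromSet(running: mutable.HashSet[TaskHandle], done: TaskHandle): Unit = {
    running -= done
  }
}
```

The set-based version assumes each running task is distinct and that the order of running tasks does not matter to the tests.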