Skip to content

Conversation

@squito
Copy link
Contributor

@squito squito commented Jun 1, 2016

What changes were proposed in this pull request?

BlacklistIntegrationSuite (introduced by SPARK-10372) is a bit flaky because of some race conditions:

  1. Failed jobs might have non-empty results, because the resultHandler will be invoked for successful tasks (if there are task successes before failures)
  2. taskScheduler.taskIdToTaskSetManager must be protected by a lock on taskScheduler

(1) has failed a handful of jenkins builds recently. I don't think I've seen (2) in jenkins, but I've run into with some uncommitted tests I'm working on where there are lots more tasks.

While I was in there, I also made an unrelated fix to runningTasksin the test framework -- there was a pointless O(n) operation to remove completed tasks, could be O(1).

How was this patch tested?

I modified the o.a.s.scheduler.BlacklistIntegrationSuite to have it run the tests 1k times on my laptop. It failed 11 times before this change, and none with it. (Pretty sure all the failures were problem (1), though I didn't check all of them).

Also the full suite of tests via jenkins.

@squito squito changed the title [SPARK-15714] Fix flaky o.a.s.scheduler.BlacklistIntegrationSuite [SPARK-15714][CORE] Fix flaky o.a.s.scheduler.BlacklistIntegrationSuite Jun 1, 2016
@SparkQA
Copy link

SparkQA commented Jun 2, 2016

Test build #59778 has finished for PR 13454 at commit ccb6c20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val newTasks = taskScheduler.resourceOffers(offers).flatten
val newTaskDescriptions = taskScheduler.resourceOffers(offers).flatten
// get the task now, since that requires a lock on TaskSchedulerImpl, to prevent individual
// tests for introducing a race if they need it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/for/from

@squito
Copy link
Contributor Author

squito commented Jun 3, 2016

thanks for the feedback @vanzin . I updated the comments slightly, which I hope addresses your concerns.

@SparkQA
Copy link

SparkQA commented Jun 3, 2016

Test build #59933 has finished for PR 13454 at commit 6ef06c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 3, 2016

Test build #59935 has finished for PR 13454 at commit 08ab5d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Copy link
Contributor Author

squito commented Jun 3, 2016

ok as this seems to be failing quite commonly in some envs I'm going to merge this since it fixes the issue, but open to more feedback.

@squito
Copy link
Contributor Author

squito commented Jun 3, 2016

merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants