Skip to content

Deflake and speed up parts of ExecutorStepTest #224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 11, 2022

Conversation

jtnord
Copy link
Member

@jtnord jtnord commented May 11, 2022

calling setLabels on an agent will not persist the node.

in older versions of Jenkins the tests would be less flaky as adding any
Node would cause all labels to be re-evaluated, so when creating a few
agents and adding labels in a loop the last one created would at least
deterministically ensure that all previous agents labels where correct.

However since 2.332 (jenkinsci/jenkins#5882)
only labels part of a node added or removed would be updated, and when
creating the agents they where created without labels, which where added
later.

This caused tests to be flaky depending on when the periodic
trimLabels was called (or at least on other timing related things)

THis was discovered by enabling a loggerrule for hudson.model.queue and
observing that the builds would timeout as not all the agents would have
the expected nodes.

e.g.

  12.023 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[ #1] rejected part of demo #1: ?Jenkins? doesn?t have label ?foo?
  12.023 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave0 #0] is a potential candidate for task part of demo #1
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave2 #0] rejected part of demo #1: ?slave2? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave1 #0] rejected part of demo #1: ?slave1? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[ #0] rejected part of demo #1: ?Jenkins? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave3 #0] rejected part of demo #1: ?slave3? doesn?t have label ?foo?

from reuseNodesWithSameLabelsInParallelStages

Additionally creating agents and waiting for them to come oneline is
slow. A pipeline will start and then wait for the node to be available,
so we can do other things whilst the agent is connecting.

For the case where we need a number of agents connected before we start
to run the pipeline, we now create iall the agents before waiting for them to connect.

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

calling setLabels on an agent will not persist the node.

in older versions of Jenkins the tests would be less flaky as adding any
Node would cause all labels to be re-evaluated, so when creating a few
agents and adding labels in a loop the last one created would at least
deterministically ensure that all previous agents labels where correct.

However since 2.332 (jenkinsci/jenkins#5882)
only labels part of a node added or removed would be updated, and when
creating the agents they where created without labels, which where added
later.

This caused tests to be flaky depending on when the periodic
`trimLabels` was called (or at least on other timing related things)

THis was discovered by enabling a loggerrule for hudson.model.queue and
observing that the builds would timeout as not all the agents would have
the expected nodes.

e.g.

```
  12.023 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[ jenkinsci#1] rejected part of demo jenkinsci#1: ?Jenkins? doesn?t have label ?foo?
  12.023 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave0 #0] is a potential candidate for task part of demo jenkinsci#1
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave2 #0] rejected part of demo jenkinsci#1: ?slave2? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave1 #0] rejected part of demo jenkinsci#1: ?slave1? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[ #0] rejected part of demo jenkinsci#1: ?Jenkins? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave3 #0] rejected part of demo jenkinsci#1: ?slave3? doesn?t have label ?foo?
```

from `reuseNodesWithSameLabelsInParallelStages`

Additionally creating agents and waiting for them to come oneline is
slow.  A pipeline will start and then wait for the node to be available,
so we can do other things whilst the agent is connecting.

For the case where we need a number of agents connected before we start
to run the pipeline, we now create iall the agents before waiting for them to connect.
@jtnord jtnord force-pushed the deflake-ExecutorStepTest branch from 5f726ea to 47e6c43 Compare May 11, 2022 13:09
@jglick jglick added the tests label May 11, 2022
@jglick
Copy link
Member

jglick commented May 11, 2022

(could switch label to developer if these flakes are holding you up somewhere and you would like an immediate release)

@jtnord
Copy link
Member Author

jtnord commented May 11, 2022

(could switch label to developer if these flakes are holding you up somewhere and you would like an immediate release)

These are currently the most flaky of all, so developer would be great (I currently am not a maintainer so can change the label)

Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One apparent mistake, otherwise looks good.

@@ -139,10 +140,8 @@ public class ExecutorStepTest {
*/
@Test public void buildShellScriptOnSlave() throws Throwable {
sessions.then(r -> {
DumbSlave s = r.createOnlineSlave();
s.setLabelString("remote quick");
DumbSlave s = r.createSlave("remote quick", null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 771 to 773
// Note: for Jenkins versions > 2.65, the number of agents must be increased to 5.
// This is due to changes in the Load Balancer (See JENKINS-60563).
int totalAgents = Jenkins.getVersion().isNewerThan(new VersionNumber("2.265")) ? 5 : 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW the comment does not match the version number string, and both are older than the baseline…could be cleaned up (but best left to another PR I suppose).

@jglick jglick added developer and removed tests labels May 11, 2022
@jglick jglick changed the title deflake and speedup ExecutorStepTest tests Deflake and speed up parts of ExecutorStepTest May 11, 2022
@jglick jglick enabled auto-merge (squash) May 11, 2022 13:36
@jglick jglick merged commit 252ae12 into jenkinsci:master May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants