[JENKINS-62014] Refiling PR #137 to investigate test failures #139

dwnusbaum · 2020-08-13T14:42:29Z

See #137. This PR updates many dependencies using the plugin BOM as a speculative fix for the test failures seen in the CI build of that PR. This PR also stops building against Windows on Java 11 since it is largely redundant with the other branches.

dwnusbaum · 2020-08-13T15:03:19Z

Well, the previous test failures went away, but new ones took their place:

ShellStepTest.envVarFilters needs to be changed to either be skipped on Windows or use bat on Windows
ShellStepTest.abort failed while cleaning up after the test because something still has the build's log file open; I'm not sure whether this is a timing issue or something else
ExecutorStepTest.buildShellScriptAcrossDisconnect seems flaky; I saw it fail locally once this morning
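
For the envVarFilters item, one way to "use bat on Windows" is to build the test Pipeline from a platform check. The following is only a sketch: ShellStepChooser, isWindows, and echoEnvStep are hypothetical names, not code from the plugin's tests, and the variable name FOO is made up for illustration.

```java
import java.util.Locale;

/** Hypothetical helper (not from the plugin): picks the Pipeline step matching the OS. */
final class ShellStepChooser {

    /** Returns true when the given os.name value denotes a Windows platform. */
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Builds a one-line Pipeline snippet that echoes an environment variable. */
    static String echoEnvStep(String osName, String variable) {
        return isWindows(osName)
                ? "bat 'echo %" + variable + "%'"   // cmd.exe variable syntax
                : "sh 'echo $" + variable + "'";    // POSIX shell variable syntax
    }

    public static void main(String[] args) {
        // Print the step that would be used on the machine running this sketch.
        System.out.println(echoEnvStep(System.getProperty("os.name"), "FOO"));
    }
}
```

The other option mentioned, skipping the test on Windows, would typically be done with a JUnit assumption at the top of the test method rather than by changing the Pipeline script.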

dwnusbaum · 2020-08-13T17:00:28Z

The ExecutorStepTest.buildShellScriptAcrossDisconnect and ExecutorStepTest.contextualizeFreshFilePathAfterAgentReconnection failures seem to be related to launching the process for the sh step. The process itself actually starts executing on the agent, and since the tests watch for file system activity, they assume the step is running and continue with things like shutting down the agent. On the master, however, the call to RemoteLauncher.launch has not yet completed when the tests start disconnecting the agents, and things break.

I saw one of these fail locally once, but have not seen either of them fail since then.

Maybe it's just a timing issue? I will try adding some sleep calls to check.
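
As an alternative to fixed sleeps, a test can poll for an observable signal, such as the step's output appearing in the build log, before it disconnects the agent. The sketch below assumes a hypothetical waitUntil helper and uses a StringBuffer as a stand-in for the build log; none of these names come from the plugin's tests.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper (not from the plugin's test code). */
final class WaitFor {

    /**
     * Polls the condition until it is true or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuffer log = new StringBuffer(); // stand-in for the build log
        Thread step = new Thread(() -> {
            try {
                Thread.sleep(50); // simulated delay before the step produces output
            } catch (InterruptedException e) {
                return;
            }
            log.append("+ echo started\n");
        });
        step.start();

        // Only proceed to the disruptive part of the test once output has been seen.
        boolean seen = waitUntil(() -> log.indexOf("started") >= 0, 5_000, 10);
        System.out.println(seen ? "safe to disconnect the agent" : "timed out waiting for output");
        step.join();
    }
}
```

Compared with a fixed sleep, this waits only as long as needed on fast machines and fails with a clear timeout on slow ones.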

dwnusbaum · 2020-08-13T17:55:43Z

> Maybe it's just a timing issue? I will try adding some sleep calls to check.

That didn't help. I guess the next thing would be to check a fresh build of master to make sure that it passes, and if it does, then perhaps some core change between 2.176.4 and 2.248 has changed the behavior of RemoteLauncher or JNLP agents in a way that affects the tests.

I'm still not sure exactly what is going on with ShellStepTest.abort on Windows either.

dwnusbaum · 2020-08-13T18:14:51Z

Interesting, the ExecutorStepTest.buildShellScriptAcrossDisconnect and ExecutorStepTest.contextualizeFreshFilePathAfterAgentReconnection failures are definitely flaky because they just passed on the most recent build. ShellStepTest.abort failing on Windows seems to be consistent though.

basil · 2020-08-13T18:31:09Z

> ShellStepTest.abort failing on Windows seems to be consistent though.

#118 may fix this issue.

dwnusbaum · 2020-08-13T18:45:39Z

> #118 may fix this issue.

I'm not sure, but going by the description of the failures in this comment in JENKINS-59152 I think #118 was attempting to fix different issues with the test that were addressed by jenkinsci/jenkins#4225.

I will try adding a sleep at the end of the test though to see if it's just a timing issue.

basil · 2020-08-13T19:56:03Z

> ShellStepTest.abort failed while cleaning up after the test because something still has the build's log file open

This could also be fallout from jenkinsci/jenkins-test-harness#166.

dwnusbaum · 2020-08-13T20:00:09Z

> This could also be fallout from jenkinsci/jenkins-test-harness#166.

Yeah, that is definitely the proximate cause of why the test is failing, but the behavior of the scenario being tested should be the same with or without that PR. So I am trying to figure out whether the bat step really is holding log files open on Windows even if the step is interrupted, or whether there is just something wrong with the test itself.

dwnusbaum · 2020-08-13T21:17:08Z

Well, there are new flaky tests, but as far as ShellStepTest.abort goes, it looks like the bat step doesn't even get stopped and the build doesn't complete, or at least there is no [Pipeline] End of Pipeline or Finished: ABORTED line in the logs.

I wonder if AssertionErrors thrown by tests are masked by any errors thrown in JenkinsRule.after, so that the test is actually failing in the call to ensureForWhile and the reported error is just misleading. I will add some calls to println to check.

dwnusbaum · 2020-08-13T21:41:40Z

Ok, perfect, that failure tells us that the build never actually completed and that the test actually failed because of an AssertionError. The way that JenkinsRule works today, though, the AssertionError is silently dropped because of the error that occurs during cleanup, which in turn happens because the build never finished. That is very confusing. I will file a PR to jenkins-test-harness so that exceptions thrown during the test itself take priority over those thrown while cleaning up the JenkinsRule.
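
The masking described here is ordinary Java behavior: an exception thrown from a finally block replaces the one thrown from the try block. The sketch below contrasts that with a variant that keeps the test failure as the primary exception and attaches the cleanup failure via Throwable.addSuppressed. CleanupMasking, Body, runMasking, and runPreserving are hypothetical names for illustration and are not the actual JenkinsRule code.

```java
import java.io.IOException;

/** Hypothetical illustration (not JenkinsRule itself) of cleanup errors masking test errors. */
final class CleanupMasking {

    /** A block of code that may throw anything, like a test body or a cleanup step. */
    interface Body {
        void run() throws Throwable;
    }

    /** Naive version: a failure in cleanup replaces the failure from the test body. */
    static void runMasking(Body test, Body cleanup) throws Throwable {
        try {
            test.run();
        } finally {
            cleanup.run(); // if this throws, the exception from test.run() is lost
        }
    }

    /** Test failure takes priority; a cleanup failure is attached as a suppressed exception. */
    static void runPreserving(Body test, Body cleanup) throws Throwable {
        Throwable primary = null;
        try {
            test.run();
        } catch (Throwable t) {
            primary = t;
        }
        try {
            cleanup.run();
        } catch (Throwable t) {
            if (primary == null) {
                primary = t;              // cleanup was the only failure
            } else {
                primary.addSuppressed(t); // keep it, but do not hide the test failure
            }
        }
        if (primary != null) {
            throw primary;
        }
    }

    public static void main(String[] args) {
        Body failingTest = () -> { throw new AssertionError("build never completed"); };
        Body failingCleanup = () -> { throw new IOException("unable to delete temp dir"); };

        try {
            runMasking(failingTest, failingCleanup);
        } catch (Throwable t) {
            System.out.println("masking:    " + t);
        }
        try {
            runPreserving(failingTest, failingCleanup);
        } catch (Throwable t) {
            System.out.println("preserving: " + t + ", suppressed: " + t.getSuppressed().length);
        }
    }
}
```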

dwnusbaum · 2020-08-13T21:53:38Z

jenkinsci/jenkins-test-harness#236 should help make this kind of failure much easier to diagnose.

dwnusbaum · 2020-08-14T15:07:08Z

Perfect, now we see the real problem with ShellStepTest.abort:

java.lang.AssertionError: org.jenkinsci.plugins.workflow.steps.durable_task.ShellStepTest$$Lambda$236/653172344@6d7a2ac7
	at org.jenkinsci.plugins.workflow.steps.durable_task.ShellStepTest.ensureForWhile(ShellStepTest.java:771)
	at org.jenkinsci.plugins.workflow.steps.durable_task.ShellStepTest.abort(ShellStepTest.java:204)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.rules.Verifier$1.evaluate(Verifier.java:35)
	at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:599)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: jenkins.util.io.CompositeIOException: Unable to delete 'C:\Users\jenkins\Work\workspace\_durable-task-step-plugin_PR-139\target\tmp\j h508208712461192053'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
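
The trace shows the assertion firing inside ensureForWhile, whose implementation is not shown in this thread. As a rough idea of what a helper of that kind does, the sketch below checks that a condition keeps holding for a given duration and fails as soon as it stops. EnsureHolds and its parameters are assumptions for illustration, not the actual ShellStepTest code; the description argument is included so that the failure message says what was being checked.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not the real ShellStepTest.ensureForWhile). */
final class EnsureHolds {

    /**
     * Repeatedly checks the condition for roughly durationMillis and throws
     * an AssertionError the first time it does not hold.
     */
    static void ensureForWhile(String description, BooleanSupplier condition,
                               long durationMillis, long pollMillis) throws InterruptedException {
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        do {
            if (!condition.getAsBoolean()) {
                throw new AssertionError("condition did not hold: " + description);
            }
            Thread.sleep(pollMillis);
        } while (System.nanoTime() - deadline < 0);
    }

    public static void main(String[] args) throws Exception {
        // Passes: the condition holds for the whole window.
        ensureForWhile("process stays stopped", () -> true, 100, 10);
        System.out.println("condition held for the whole window");
    }
}
```

A check like this only makes sense once the thing being observed has settled, which is consistent with the later fix of waiting for the build to complete before checking that the process is completed.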

Looks like there were also some new flakes in ExecutorStepTest: reuseNodesWithSameLabelsInDifferentReorderedStages, reuseNodesWithSameLabelsInParallelStages, reuseNodesWithSameLabelsInStagesWrappedInsideParallelStages, reuseNodeInSameRun, and reuseNodeFromPreviousRun.

dwnusbaum · 2020-08-14T19:17:12Z

Wow, it passed! I think that's all of the consistently failing tests. There were some other random ones I saw, but I'm going to leave them for now. I'm going to create a new PR to update #137 without all of the back-and-forth investigative commits here and then I'll close this PR.

dwnusbaum · 2020-08-14T19:29:12Z

Fixes have all been pushed into #137.

daniel-beck and others added 10 commits July 27, 2020 10:45

[JENKINS-62014] Add support for global build step env var filters

a44d227

Fix SpotBugs issues

3337702

Fix dependencies

9731ae7

[JENKINS-62014] Add smoke test for env var filters

4877c2c

Update to generic-environment-filters 1.2 for the test

bea22a5

Ensure the correct types are being listed

8549da1

Update dependency versions to be compatible

a557f2a

Add JUnit to test dependencies, required by matrix-project

cc10978

Use plugin BOM to update dependencies

cc3d063

Don't run a CI build on Windows on Java 11

8bca029

dwnusbaum mentioned this pull request Aug 13, 2020

[JENKINS-62014] Add support for global build step env var filters #137

Merged

dwnusbaum added 2 commits August 13, 2020 11:05

Fix ShellStepTest.envVarFilters on Windows

90a4290

Fix syntax error in test Pipeline in ShellStepTest.envVarFilters

d0ae1e8

Add sleeps to give sh steps in flaky tests time to suspend before age…

8cbe08d

…nts are disconnected

Revert "Add sleeps to give sh steps in flaky tests time to suspend be…

d8829d5

…fore agents are disconnected" This reverts commit 8cbe08d.

dwnusbaum added 3 commits August 13, 2020 14:47

Add sleep to the end of ShellStepTest.abort on Windows

f24fffa

Try to get better diagnostic info from ShellStepTest.abort

1870855

ci.jenkins.io Windows agents are too old for timeout?

19514f5

Ignore flaky tests while trying to diagnose ShellStepTest.abort

e1e3c60

Add some calls to println to see where test is really failing

0ccf328

dwnusbaum mentioned this pull request Aug 13, 2020

Always report exceptions thrown by tests even if JenkinsRule cleanup fails as well jenkinsci/jenkins-test-harness#236

Merged

dwnusbaum added 2 commits August 14, 2020 10:25

Revert diagnostic changes and pull in jenkins-test-harness update to …

e93416f

…see real failure

Revert additional changes

a18e18b

dwnusbaum added 3 commits August 14, 2020 11:21

Wait for build to complete before checking that process is completed …

183d623

…in ShellStepTest.abort

Unignore flaky tests to try to diagnose the problem

8e76fc8

Wait to kill agents in disconnection-related tests until after log ou…

823399a

…tput is received from the sh step

dwnusbaum closed this Aug 14, 2020

dwnusbaum deleted the JENKINS-62014 branch August 14, 2020 19:29

This was referenced Aug 14, 2020

Use non-deprecated lazily evaluated iterateEnclosingBlocks #133

Merged

[JENKINS-26097] Adjust label validation and auto-completion #136

Merged
