Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-62014] Refiling PR #137 to investigate test failures #139

Closed
wants to merge 24 commits into from

Conversation

dwnusbaum
Copy link
Member

See #137. This PR updates many dependencies using the plugin BOM as a speculative fix for the test failures seen in the CI build of that PR. This PR also stops building against Windows on Java 11 since it is largely redundant with the other branches.

@dwnusbaum
Copy link
Member Author

dwnusbaum commented Aug 13, 2020

Well, the previous test failures went away, but new ones took their place:

  • ShellStepTest.envVarFilters needs to be changed to either be skipped on Windows or use bat on Windows
  • ShellStepTest.abort failed while cleaning up after the test because something still has the build's log file open, not sure if this is a timing issue or something else
  • ExecutorStepTest.buildShellScriptAcrossDisconnect seems flaky, I saw this fail locally once this morning

@dwnusbaum
Copy link
Member Author

The ExecutorStepTest.buildShellScriptAcrossDisconnect and ExecutorStepTest.contextualizeFreshFilePathAfterAgentReconnection failures seem to be related to launching the process for the sh step. The process itself actually starts executing on the agents, and since the tests watch for file system activity they continue assuming the step is running and do things like shut down the agent, but on the master something is wrong and the call to RemoteLauncher.launch hasn't completed yet when the tests start disconnecting the agents and things break.

I saw one of these fail locally once, but have not seen either of them fail since then.

Maybe it's just a timing issue? I will try adding some sleep calls to check.

@dwnusbaum
Copy link
Member Author

Maybe it's just a timing issue? I will try adding some sleep calls to check.

That didn't help. I guess the next thing would be to check a fresh build of master to make sure that it passes, and if it does, then I perhaps some core change between 2.176.4 and 2.248 has changed the behavior of RemoteLauncher or JNLP agents in a way that affects the tests.

I'm still not sure exactly what is going on with ShellStepTest.abort on Windows either.

…fore agents are disconnected"

This reverts commit 8cbe08d.
@dwnusbaum
Copy link
Member Author

dwnusbaum commented Aug 13, 2020

Interesting, the ExecutorStepTest.buildShellScriptAcrossDisconnect and ExecutorStepTest.contextualizeFreshFilePathAfterAgentReconnection failures are definitely flaky because they just passed on the most recent build. ShellStepTest.abort failing on Windows seems to be consistent though.

@basil
Copy link
Member

basil commented Aug 13, 2020

ShellStepTest.abort failing on Windows seems to be consistent though.

#118 may fix this issue.

@dwnusbaum
Copy link
Member Author

#118 may fix this issue.

I'm not sure, but going by the description of the failures in this comment in JENKINS-59152 I think #118 was attempting to fix different issues with the test that were addressed by jenkinsci/jenkins#4225.

I will try adding a sleep at the end of the test though to see if it's just a timing issue.

@basil
Copy link
Member

basil commented Aug 13, 2020

ShellStepTest.abort failed while cleaning up after the test because something still has the build's log file open

This could also be fallout from jenkinsci/jenkins-test-harness#166.

@dwnusbaum
Copy link
Member Author

This could also be fallout from jenkinsci/jenkins-test-harness#166.

Yeah that is definitely the proximate cause of why the test is failing, but the behavior of the scenario being tested should be the same with or without that PR, so I am trying to figure out if the bat step really is holding log files open on Windows even if the step is interrupted or if there is just something wrong with the test itself.

@dwnusbaum
Copy link
Member Author

Well, there are new flaky tests, but as far as ShellStepTest.abort it looks like the bat step doesn't even get stopped and the build doesn't complete, or at least there is no [Pipeline] End of Pipeline or Finished: ABORTED line in the logs.

I wonder if AssertionErrors thrown by tests are masked by any errors thrown in JenkinsRule.after, and the test is actually failing in the call to ensureForWhile, the reported error is just misleading. I will add some calls to println to check.

@dwnusbaum
Copy link
Member Author

Ok, perfect, that failure tells us that the build never actually completed, and the test actually failed because of an AssertionError, but the way that JenkinsRule works today, the AssertionError is silently dropped because of the error that occurs during cleanup since the build never finished, which is very confusing. I will file a PR to jenkins-test-harness so that exceptions thrown during the test itself take priority over those thrown while cleaning up the JenkinsRule.

@dwnusbaum
Copy link
Member Author

jenkinsci/jenkins-test-harness#236 should help make this kind of failure much easier to diagnose.

@dwnusbaum
Copy link
Member Author

dwnusbaum commented Aug 14, 2020

Perfect, now we see the real problem with ShellStepTest.abort:

java.lang.AssertionError: org.jenkinsci.plugins.workflow.steps.durable_task.ShellStepTest$$Lambda$236/653172344@6d7a2ac7
	at org.jenkinsci.plugins.workflow.steps.durable_task.ShellStepTest.ensureForWhile(ShellStepTest.java:771)
	at org.jenkinsci.plugins.workflow.steps.durable_task.ShellStepTest.abort(ShellStepTest.java:204)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.rules.Verifier$1.evaluate(Verifier.java:35)
	at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:599)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: jenkins.util.io.CompositeIOException: Unable to delete 'C:\Users\jenkins\Work\workspace\_durable-task-step-plugin_PR-139\target\tmp\j h508208712461192053'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.

Looks like there were also some new flakes in ExecutorStepTest: reuseNodesWithSameLabelsInDifferentReorderedStages, reuseNodesWithSameLabelsInParallelStages, reuseNodesWithSameLabelsInStagesWrappedInsideParallelStages, reuseNodeInSameRun, and reuseNodeFromPreviousRun.

@dwnusbaum
Copy link
Member Author

Wow, it passed! I think that's all of the consistently failing tests. There were some other random ones I saw, but I'm going to leave them for now. I'm going to create a new PR to update #137 without all of the back-and-forth investigative commits here and then I'll close this PR.

@dwnusbaum
Copy link
Member Author

Fixes have all been pushed into #137.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants