Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: fail on CI if leftover processes #11269

Closed
wants to merge 1 commit into from

Conversation

Trott
Copy link
Member

@Trott Trott commented Feb 9, 2017

If any tests leave processes running after testing results are complete, fail the test run.

Dependent on #11246

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

build test

@Trott Trott added build Issues and PRs related to build files or the CI. test Issues and PRs related to the tests. labels Feb 9, 2017
@nodejs-github-bot nodejs-github-bot added the build Issues and PRs related to build files or the CI. label Feb 9, 2017
Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rubber-stamp LGTM... tho someone with better Makefile kung-fu should also review

Makefile Outdated
endif
clear-stalled:
ps awwx | grep Release/node | grep -v grep | cat
ps awwx | grep Release/node | grep -v grep | awk '{print $$1}' | $(XARGS) kill
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no node processes won't this result in calling kill with no argument which will then print out usage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhdawson If it did, the CI run for this would have failed hard!

On AIX and OS X (which have BSD-based xargs), empty stdin is a no-op so kill is not called. On all others in CI (which have GNU-based xargs), we add -r (a few lines above) which causes the same behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Whoops, CI hasn't run for this. I was thinking of #11246 which contains these identical lines.)

Copy link
Member

@mhdawson mhdawson Feb 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks for the clarification. I think I ran the command on linux but without the -r

Makefile Outdated
$(PYTHON) tools/test.py $(PARALLEL_ARGS) -p tap --logfile test.tap \
--mode=release --flaky-tests=$(FLAKY_TESTS) \
$(TEST_CI_ARGS) $(CI_JS_SUITES)
! ( ps awwx | grep Release/node | grep -v grep )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to print out the info on what's still running here and then cleanup. This will leave them running until the next ci run right ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhdawson Yes, that's probably a better approach. I'll try to learn enough about make to figure out how to do that. :-D

@gibfahn
Copy link
Member

gibfahn commented Feb 9, 2017

Took me a while to work out that this "Depends on #11246" because it's a one line change on top of it (of the two commits, the first one is #11246).

Is there a reason to do this as a separate PR?

I still think that if this was done in tools/test.py it'd be more useful, not least for users manually running tests, or to get it included in the tap/junit output.

@Trott
Copy link
Member Author

Trott commented Feb 10, 2017

Is there a reason to do this as a separate PR?

Mostly so the other PR containing minimum required functionality can land as soon as possible and prevent spurious CI failures at the earliest possible time.

Trying to put this post-run stuff into that other PR would mean the stuff in that other PR (that is IMO ready-to-go, at least as a stopgap measure) would have to wait until we resolve whether/how this post-run stuff should terminate processes it find, feasibility of putting it in test.py instead, and whatever else comes up.

@Trott
Copy link
Member Author

Trott commented Feb 10, 2017

OK, the other PR landed, so this one now has a cleaner and shorter diff and I am ready to discuss colors for the bike shed!!!! :-D

@Trott
Copy link
Member Author

Trott commented Feb 10, 2017

On the suggestion that in addition to failing if there are leftover processes, terminate those processes... seems appealing, but as I look at doing it, I'm starting to feel pretty YAGNI about it:

  • As currently set up, I don't think this buys us anything, since we clear stalled jobs at the start of CI runs now.
  • But it has a downside which is that it means the processes can't be inspected after the fact to try to figure out what happened.
  • Every way I'm finding to do it makes the Makefile considerably more complicated or requires an additional external script. Not the end of the world, but just sayin'.

I'm inclined to leave this reporting/failing bit as is and save the terminate-the-processes for a future enhancement if we find we really need it. (Like I said, we lose the ability to inspect the processes if we go that route, so maybe we don't want to do that after all?)

Copy link
Member

@mhdawson mhdawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this change going in and termination being handled in a follow on if we agree we want to do it.

@gibfahn
Copy link
Member

gibfahn commented Feb 14, 2017

@Trott

As currently set up, I don't think this buys us anything, since we clear stalled jobs at the start of CI runs now.

We run a bunch of other tests on the machines (e.g. citgm tests, node-report tests, manually running individual tests). Clearing up at the start is a good way to make sure our test runs pass properly, but clearing up at the end is IMHO the right thing to do as users of a shared CI.

Every way I'm finding to do it makes the Makefile considerably more complicated or requires an additional external script. Not the end of the world, but just sayin'.

Did you try doing it in tools/test.py? That way we'd be able to be more specific about it. You could also then error out when users were running make test, as I assume most users would rather an error than something that passed (but left processes lying around) and then failed on CI.

But it has a downside which is that it means the processes can't be inspected after the fact to try to figure out what happened.

I've never done anything with a process other than look at it with ps -ef, so I'm not clear about what else people actually do.

However, if you implemented this in tools/test.py, I assume you'd have it on by default, and provide a --ps-check=ignore|failkill (or equivalent) to turn off the process killing. Then you could run with it turned off to get processes.

Makefile Outdated
@@ -223,13 +223,15 @@ test-ci-js: | clear-stalled
$(PYTHON) tools/test.py $(PARALLEL_ARGS) -p tap --logfile test.tap \
--mode=release --flaky-tests=$(FLAKY_TESTS) \
$(TEST_CI_ARGS) $(CI_JS_SUITES)
! ( ps awwx | grep Release/node | grep -v grep )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question. Are all these flags portable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure they're all used in #11246, so if they aren't we should find out pretty quickly 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they're all portable, or at least sufficiently portable that they work on all the non-Windows CI machines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool then.

If any tests leave processes running after testing results are complete,
fail the test run.
@Trott
Copy link
Member Author

Trott commented Feb 21, 2017

This will now both fail if there are leftover processes and try to clean them up. PTAL

(If anyone wants to move it into test.py, be my guest! I'd be happy with that as a subsequent PR, though.)

@Trott
Copy link
Member Author

Trott commented Feb 22, 2017

@Trott
Copy link
Member Author

Trott commented Feb 22, 2017

@nodejs/build

@Trott
Copy link
Member Author

Trott commented Feb 23, 2017

Landed in 189b49a

@Trott Trott closed this Feb 23, 2017
Trott added a commit to Trott/io.js that referenced this pull request Feb 23, 2017
If any tests leave processes running after testing results are complete,
fail the test run.

PR-URL: nodejs#11269
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
addaleax pushed a commit that referenced this pull request Feb 24, 2017
If any tests leave processes running after testing results are complete,
fail the test run.

PR-URL: #11269
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
@italoacasas italoacasas mentioned this pull request Feb 25, 2017
jasnell pushed a commit that referenced this pull request Mar 7, 2017
If any tests leave processes running after testing results are complete,
fail the test run.

PR-URL: #11269
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
jasnell pushed a commit that referenced this pull request Mar 7, 2017
If any tests leave processes running after testing results are complete,
fail the test run.

PR-URL: #11269
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
MylesBorins pushed a commit that referenced this pull request Mar 9, 2017
If any tests leave processes running after testing results are complete,
fail the test run.

PR-URL: #11269
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Mar 9, 2017
MylesBorins pushed a commit that referenced this pull request Mar 9, 2017
If any tests leave processes running after testing results are complete,
fail the test run.

PR-URL: #11269
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Johan Bergström <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Mar 9, 2017
@Trott Trott deleted the fail-if-stalled branch January 13, 2022 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Issues and PRs related to build files or the CI. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants