tools: include exit code in test failures #19855

Closed
wants to merge 2 commits into from

Conversation

Trott
Member

@Trott Trott commented Apr 6, 2018

Include the exit code in test failures. This will give us more
information during the currently-puzzling failures that provide no
information in CI such as:

03:10:10 not ok 563 parallel/test-fs-truncate
03:10:10   ---
03:10:10   duration_ms: 1.119
03:10:10   severity: fail
03:10:10   stack: |-
03:10:10   ...
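
For illustration, here is a minimal sketch of how a TAP reporter could add the exit code to the YAML diagnostic block of a failing test. The helper name `format_tap_failure` and the `exitcode` key are assumptions made for this sketch; it is not the actual `tools/test.py` code.

```python
def format_tap_failure(index, name, duration_s, exit_code, stack=''):
    """Return the TAP lines for a failed test, including its exit code.

    Hypothetical helper: names and the 'exitcode' key are assumptions.
    """
    lines = [
        'not ok %d %s' % (index, name),
        '  ---',
        '  duration_ms: %.3f' % duration_s,  # seconds, with a millisecond fraction
        '  severity: fail',
        '  exitcode: %d' % exit_code,        # assumed field name
        '  stack: |-',
    ]
    # Indent captured output so it stays inside the YAML block scalar.
    for out_line in stack.splitlines():
        lines.append('    ' + out_line)
    lines.append('  ...')
    return lines


for line in format_tap_failure(563, 'parallel/test-fs-truncate', 1.119, 1):
    print(line)
```

With an empty `stack`, the exit code is the only diagnostic that distinguishes one silent failure from another.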

@BridgeAR @addaleax @nodejs/build @nodejs/testing

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@Trott Trott added test Issues and PRs related to the tests. tools Issues and PRs related to the tools directory. flaky-test Issues and PRs related to the tests with unstable failures on the CI. python PRs and issues that require attention from people who are familiar with Python. fast-track PRs that do not need to wait for 48 hours to land. labels Apr 6, 2018
@nodejs-github-bot nodejs-github-bot added test Issues and PRs related to the tests. tools Issues and PRs related to the tools directory. labels Apr 6, 2018
@refack
Contributor

refack commented Apr 6, 2018

Tad "hacky", I'll see if I can make something more TAPish.

@refack
Contributor

refack commented Apr 6, 2018

Just scribbling some tweaks

https://ci.nodejs.org/job/node-test-pull-request-lite/443/

tools/test.py Outdated
@@ -325,12 +329,10 @@ def HasRun(self, output):
# duration_ms is measured in seconds and is read as such by TAP parsers.
# It should read as "duration including ms" rather than "duration in ms"
logger.info(' ---')
-logger.info(' duration_ms: %d.%d' %
+logger.info(' duration (s): %d.%d' %
Member

I know this is an annoyance for many folks but I'd advise against changing this and instead learn to read it as "duration including milliseconds". From memory there's some tooling in Jenkins that depends on this. It's become a TAP quirk, not something we've introduced.
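
To make the naming quirk concrete, the sketch below formats a duration the way the `duration_ms` field is meant to be read: seconds with a millisecond fraction. It assumes the duration is available as a `datetime.timedelta`; the helper name `format_duration` is hypothetical, and it uses `%.3f` rather than the `%d.%d` pattern shown in the diff.

```python
from datetime import timedelta


def format_duration(duration):
    """Format a timedelta as seconds including milliseconds (hypothetical helper)."""
    seconds = duration.total_seconds()
    # The key stays 'duration_ms' so existing TAP tooling keeps working.
    return 'duration_ms: %.3f' % seconds


print(format_duration(timedelta(seconds=1, milliseconds=119)))
```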

Contributor

Ack. I'm just smoke testing the CI TAP tooling

tools/test.py Outdated
@@ -299,6 +302,7 @@ def HasRun(self, output):

if output.HasTimedOut():
self.severity = 'fail'
self.traceback += '\nTimed Out'
Member

Not a big deal, but I'd prefer we leave existing strings alone in this output where possible. It's parsable and we don't know what might depend on it (likely nothing, but why risk it). I assume that in the majority of cases there's not going to be a traceback to append to, so perhaps check whether its length is >0 and, if so, append \n, then append 'timeout' regardless. That way, in the majority of cases we just get 'timeout' and nothing has changed.
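
The suggestion above can be sketched as follows. This is only an illustration of the proposed logic, assuming the traceback is a plain string; the function name `append_timeout` is hypothetical and not part of `tools/test.py`.

```python
def append_timeout(traceback):
    """Append 'timeout' to captured output without discarding that output."""
    if len(traceback) > 0:
        # Only add a separator when there is something to separate from.
        traceback += '\n'
    return traceback + 'timeout'


print(repr(append_timeout('')))
print(repr(append_timeout('some stderr output')))
```

When nothing was captured, the result is exactly the existing `'timeout'` string, so anything that parses the current output sees no change.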

Contributor

My motivation was that the previous code was self.traceback = 'timeout', so whenever a test produced output we were losing it if the test timed out.

As for the parsability of the output, I'm on a deep dive into the CI parsing code, trying to grok what's actually being consumed and what just passes through (I have the same concern about the new exitcode field).

Contributor

P.P.S. I'm looking into finishing the work started by @jbergstroem (tap2junit) as the final goal.

Contributor

The rest of the changes look okay, but would it be okay to use self.severity to signal that the test has timed out? I think it's better not to alter the traceback. Extreme corner case: if a test prints Timed Out and exits, we cannot be sure whether the test exceeded the time limit or not.

Contributor

Just for context the old code did:

self.traceback = 'timeout'

And even the current code does:

self.traceback = output.output.stdout + output.output.stderr

and
self.traceback = "oh no!\nexit code: " + PrintCrashed(exit_code)

So we are already being naughty with traceback


As for severity, it is the only really semantic field (the TAP plugin uses it to categorize the results), so I would advise against changing it. A different option is to add a message field or even a dedicated timed out field.
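
A rough sketch of the "dedicated timed out field" option: the flag is set by the runner, so neither severity nor the traceback has to carry the timeout signal. The class, the function, and the `timed_out` key are all hypothetical names chosen for this example.

```python
class FailedTest(object):
    """Hypothetical record for a failed test; all names are illustrative."""

    def __init__(self, stdout, stderr, timed_out):
        self.severity = 'fail'            # unchanged, so TAP consumers still categorize it
        self.traceback = stdout + stderr  # the test's own output, kept verbatim
        self.timed_out = timed_out        # set by the runner, never parsed from output


def diagnostic_lines(result):
    """Build YAML diagnostic lines; 'timed_out' is an assumed key."""
    lines = ['  severity: %s' % result.severity]
    if result.timed_out:
        lines.append('  timed_out: true')
    return lines


# A test that merely prints "Timed Out" is not mistaken for a real timeout.
printed = FailedTest('Timed Out\n', '', timed_out=False)
real = FailedTest('', '', timed_out=True)
print(diagnostic_lines(printed))
print(diagnostic_lines(real))
```

This also covers the corner case raised earlier in the thread, since the flag does not depend on what the test wrote to stdout or stderr.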

@Trott
Member Author

Trott commented Apr 6, 2018

Because these changes are getting mildly more involved, I'll remove the fast-track label. Feel free to re-add it. (And thanks, @refack! I appreciate you making my quick hack more thoughtful!)

@Trott Trott removed the fast-track PRs that do not need to wait for 48 hours to land. label Apr 6, 2018
@refack
Contributor

refack commented Apr 6, 2018

> (And thanks, @refack! I appreciate you making my quick hack more thoughtful!)

Ohh boy I got rusty with the git 👴, wanted to draft something with the GitHub GUI (always a mistake) and ended up pushing to your branch...

Pulled my changes out of your branch, sorry.

@Trott
Member Author

Trott commented Apr 6, 2018

> Pulled my changes out of your branch, sorry.

It's cool. You can work in my branch if you want. Or work in your own branch. Whatevs.

@refack
Contributor

refack commented Apr 6, 2018

> It's cool. You can work in my branch if you want. Or work in your own branch. Whatevs.

Always the gracious host 🎩
Please forgive any faux pas I make while I re-get my footing.

@Trott
Member Author

Trott commented Apr 6, 2018

> Please forgive any faux pas I make while I re-get my footing.

@refack Delighted you're back!

@Trott
Member Author

Trott commented Apr 9, 2018

@refack: As we continue to see severity: fail with no stdout or stderr in CI, I'm kind of eager to land even just my initial hacky change just to see what the exit codes are. Are your improvements likely to be ready to go soon? If so, AWESOME! If not, do you mind if I move forward with my one-line change and you can make it less hacky whenever you're done?

@refack refack added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 9, 2018
@refack
Contributor

refack commented Apr 9, 2018

> I'm kind of eager to land

AFAICT this is ready to land as is. I've tested it in multiple scenarios. We just need a +1 from someone.

@refack
Contributor

refack commented Apr 9, 2018

Screenshot from a synthetic timeout:
[image]

@Trott
Member Author

Trott commented Apr 9, 2018

@nodejs/python

@Trott
Member Author

Trott commented Apr 9, 2018

@Trott
Member Author

Trott commented Apr 9, 2018

Linux failure is build/infra-related. Unfortunately, I'm unable to fix it. (Tried wiping the workspace in Jenkins.)

@refack
Contributor

refack commented Apr 9, 2018

@BridgeAR
Member

BridgeAR commented Apr 9, 2018

@refack please only add the author-ready label in case a PR has at least one LG as described in the collaborator guide.

@BridgeAR BridgeAR removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 9, 2018
@BridgeAR
Member

BridgeAR commented Apr 9, 2018

@nodejs/python @nodejs/build PTAL. This needs some LGs.

@Trott
Member Author

Trott commented Apr 10, 2018

Re-running Windows (the only CI task that failed): https://ci.nodejs.org/job/node-test-commit-windows-fanned/17087/

@rvagg
Member

rvagg commented Apr 10, 2018

I'm still not cool with the change of the string timeout to \nTimed Out. It's a separate issue to there being a stack or not; most likely there will not be a stack so it'll just be a straight timeout -> \nTimed Out. Unnecessarily messing with a parseable format isn't a great idea.

@refack
Contributor

refack commented Apr 10, 2018

> I'm still not cool with the change of the string timeout to \nTimed Out. It's a separate issue to there being a stack or not; most likely there will not be a stack so it'll just be a straight timeout -> \nTimed Out. Unnecessarily messing with a parseable format isn't a great idea.

I rolled back that specific change, but I do want to point out:

self.traceback = output.output.stdout + output.output.stderr

so the current code swallows not only stacks, but all output to stderr & stdout.

https://ci.nodejs.org/job/node-test-pull-request/14184/

@Trott
Member Author

Trott commented Apr 11, 2018

LGTM but I guess I can't approve my own PR even if I didn't write any of the code. :-P

@Trott
Member Author

Trott commented Apr 11, 2018

Would be nice to get at least one more approval. @nodejs/build @nodejs/python @jbergstroem @bnoordhuis @Fishrock123 @mscdex @rvagg @nodejs/testing

@Trott Trott added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 11, 2018
@Trott
Member Author

Trott commented Apr 11, 2018

@Trott
Member Author

Trott commented Apr 11, 2018

@Trott
Member Author

Trott commented Apr 12, 2018

These test-http-readable-data-event failures are getting really old...

Linux yet again: https://ci.nodejs.org/job/node-test-commit-linux/17884/

Trott added a commit to Trott/io.js that referenced this pull request Apr 12, 2018
Include the exit code in test failures. This will give us more
information during the currently-puzzling failures that provide no
information in CI such as:

```
03:10:10 not ok 563 parallel/test-fs-truncate
03:10:10   ---
03:10:10   duration_ms: 1.119
03:10:10   severity: fail
03:10:10   stack: |-
03:10:10   ...
```

PR-URL: nodejs#19855
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Rod Vagg <[email protected]>
Trott pushed a commit to Trott/io.js that referenced this pull request Apr 12, 2018
PR-URL: nodejs#19855
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Rod Vagg <[email protected]>
@Trott
Member Author

Trott commented Apr 12, 2018

Landed in f3f1298...a3db1cc

@Trott Trott closed this Apr 12, 2018
targos pushed a commit that referenced this pull request Apr 12, 2018
targos pushed a commit that referenced this pull request Apr 12, 2018
jasnell pushed a commit that referenced this pull request Apr 16, 2018
jasnell pushed a commit that referenced this pull request Apr 16, 2018
MylesBorins pushed a commit that referenced this pull request Aug 17, 2018
MylesBorins pushed a commit that referenced this pull request Aug 17, 2018
@MylesBorins MylesBorins mentioned this pull request Aug 17, 2018
@Trott Trott deleted the add-error-code branch January 13, 2022 22:49
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. flaky-test Issues and PRs related to the tests with unstable failures on the CI. python PRs and issues that require attention from people who are familiar with Python. test Issues and PRs related to the tests. tools Issues and PRs related to the tools directory.
8 participants