This is a workaround to failures we occasionally see on self-hosted #25613
Typo in the PR title and the commit message. Note that I think I can also observe something similar on Linux when running with HARNESS_JOBS set to
For the tests that do use the network, I wonder whether it would help to spin up a user network namespace or a FreeBSD jail, as it seemed that multiple tests were talking to the wrong client/server, or getting cleaned up in contention.
I'd still really like to see the tests in question made idempotent.
I understand, but the glitch seems to come from the Perl test framework. Output lines seem to get mixed up, and the TAP parser interprets that as a failure. I still have to find a way to get better insight into what is happening with the test logs for individual tests. The only machine where the bug is triggered quite often is an E2 instance on Google Cloud. I was not able to find suitable settings for other virtualization platforms (Proxmox, vmd).
Is it possible to see why it goes wrong with something like strace? Is there something that runs in parallel and mixes output from different writers? Does it need some kind of lock on stdout?
That's what I'm trying to figure out. There is DTrace on FreeBSD, so tools do exist, but I need to better understand what's going on in Perl first. I would like to see what things look like in the log parsed by TAP::Parser. I don't know how to obtain a copy of the log as seen by the TAP parser, so that's where I am now.
FYI, I'm just testing a change which might be considered a better workaround. The diff reads as follows:
Yesterday I found a way to inspect the data seen by
If a test fails with
See the line which begins with The
As you can see The lines which interfere with the output expected by
What's worth noting in the snippet above is that the output should go to
I suspect that if the file descriptor for
This avoids false positive failures on FreeBSD-CI, which suffers most from this issue. Fixes openssl#23992
I've just force-pushed a change which reverts the changes to the GitHub workflows and adds a one-liner tweak to
Nice work @Sashan!
Woo hoo! Well done!
This comment adds some debugging notes worth remembering.
All tests are green now. Looks like The easiest option is to let Perl on Windows use Also note I'm using
We don't want to mix stderr with stdout, so as not to confuse
I think this is good to go now.
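The point about keeping stderr apart from stdout can be illustrated with a minimal sketch. It is written in Python rather than the Perl used by the test suite, and it is not the change made in this PR. The child command and the helper name `run_child` are hypothetical; the sketch only shows that a reader of the stdout pipe sees the diagnostic line once the two streams share one descriptor.

```python
import subprocess
import sys

# Hypothetical child: one TAP-like result on stdout, one diagnostic on stderr.
CHILD = 'import sys; print("ok 1 - handshake"); print("debug: accepted", file=sys.stderr)'

def run_child(merge_stderr):
    """Run the child and return the lines a reader of its stdout pipe would see."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        # Either send stderr into the same pipe as stdout, or capture it separately.
        stderr=subprocess.STDOUT if merge_stderr else subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()

separate = run_child(merge_stderr=False)
merged = run_child(merge_stderr=True)
print(separate)        # only the result line
print(sorted(merged))  # result line plus the diagnostic; arrival order is not guaranteed
```

With the streams merged, the relative order of the two lines depends on buffering in the child, which is why the sketch sorts them before printing.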
This pull request is ready to merge
Good job, @Sashan
Nice, detailed analysis and fix!
This should be merged to all the active branches.
This avoids false positive failures on FreeBSD-CI which suffers most from this issue. Fixes #23992 Reviewed-by: Matt Caswell Reviewed-by: Tomas Mraz Reviewed-by: Bernd Edlinger (Merged from #25613)
This avoids false positive failures on FreeBSD-CI which suffers most from this issue. Fixes #23992 Reviewed-by: Matt Caswell Reviewed-by: Tomas Mraz Reviewed-by: Bernd Edlinger (Merged from #25613) (cherry picked from commit 3d3bb26)
This avoids false positive failures on FreeBSD-CI, which suffers most from this issue. There was a conflict when merging openssl#25613 from master to openssl-3.2 and earlier. The openssl-3.2 branch (and earlier branches) are missing 4439ed1, which introduces `open2()` to create the s_server process. Fixes openssl#23992
It got merged to master, 3.4 and 3.3. The merge to 3.2 and earlier ended up with a conflict, which I've resolved in the PR here:
Closing as this has been merged to
This is a workaround for failures we occasionally see on the self-hosted runner when running tests on FreeBSD. These failures are caused by some kind of race condition in the Perl code which controls the test suite run. The typical symptom is:
The race condition is time dependent. The slower the machine, the more likely we are to trigger it. Our current self-hosted runners run on Google Cloud as E2 instances. If I run a simple shell script here:
The script seems to always stop after 13 to 45 iterations due to a 'No plan found in TAP output' error. Another error I saw was 'tests run out of sequence'. Both errors are seen occasionally.
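To make the two symptoms concrete, here is a toy TAP checker in Python. It is only an illustration: it is not TAP::Parser and not the shell script mentioned above. The function `check_tap`, its messages, and the sample lines are all hypothetical. It shows how text from a second writer landing in the stream can hide the plan line and one result line, which a strict consumer then reports as a missing plan and as tests out of sequence.

```python
import re

def check_tap(lines):
    """Return a list of problems found in a TAP stream (toy checker, not TAP::Parser)."""
    plan = None
    expected = 1
    problems = []
    for line in lines:
        # A plan line must consist of exactly "1..N".
        m = re.fullmatch(r"1\.\.(\d+)", line.strip())
        if m:
            plan = int(m.group(1))
            continue
        # A result line must start with "ok N" or "not ok N".
        m = re.match(r"(?:not )?ok (\d+)", line)
        if m:
            number = int(m.group(1))
            if number != expected:
                problems.append(f"tests out of sequence: expected {expected}, got {number}")
            expected = number + 1
    if plan is None:
        problems.append("no plan found")
    return problems

clean = ["1..3", "ok 1 - a", "ok 2 - b", "ok 3 - c"]
# Hypothetical diagnostics from another writer glued onto the plan line
# and onto the second result line.
mixed = ["server: listening1..3", "ok 1 - a", "server: accepted ok 2 - b", "ok 3 - c"]

print(check_tap(clean))  # []
print(check_tap(mixed))  # ['tests out of sequence: expected 2, got 3', 'no plan found']
```

The clean stream passes, while the mixed one triggers both kinds of problem even though every test in it actually succeeded.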
If I upgrade the Google instance from E2 to N2, the shell script above continues to run overnight until I stop it.
Another option is to set the number of harness jobs to 1. This also keeps the shell script running with no issues.
Using 1 job instead of 4 makes the tests run for 11 minutes instead of 8, so the run is roughly 40% slower.
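The repeat-until-failure experiment with different job counts can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the shell script used in the experiment. `run_until_failure` and `STAND_IN` are hypothetical names, and the stand-in command fails deterministically, whereas the real failure is intermittent. The only detail taken from the discussion is the `HARNESS_JOBS` environment variable.

```python
import os
import subprocess
import sys

def run_until_failure(command, jobs, max_runs):
    """Re-run command with HARNESS_JOBS=jobs; return the 1-based failing run, or None."""
    env = dict(os.environ, HARNESS_JOBS=str(jobs))
    for attempt in range(1, max_runs + 1):
        if subprocess.run(command, env=env).returncode != 0:
            return attempt
    return None

# Stand-in for the real test command (an assumption for this sketch): it fails
# whenever HARNESS_JOBS is not 1, purely so the loop has something to stop on.
STAND_IN = [sys.executable, "-c",
            "import os, sys; sys.exit(0 if os.environ['HARNESS_JOBS'] == '1' else 1)"]

print(run_until_failure(STAND_IN, jobs=4, max_runs=50))  # stops on the first run
print(run_until_failure(STAND_IN, jobs=1, max_runs=3))   # never fails, returns None
```

In a real build tree the command would presumably be something like `["make", "test"]`, with `max_runs` set high enough to cover an overnight run.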