libsecret: fix non-deterministic fail of test-collection #215584

Merged
7c6f434c merged 1 commit into NixOS:staging from superherointj:fix-libsecret-non-deterministic-fail
Feb 10, 2023

@superherointj
Contributor

@superherointj superherointj commented Feb 9, 2023

libsecret: fix non-deterministic fail of test-collection

  • Problem usually happens when all CPUs are busy.

Error logs: https://gist.github.com/superherointj/08d68a9674f695e73bbabcf8c9a1e535

Upstream issue: https://gitlab.gnome.org/GNOME/libsecret/-/issues/80

Update: It is unclear whether this actually solves the problem. It likely doesn't, because in the logs the timeout is 0! (They should list some timeout.)

@superherointj superherointj changed the base branch from master to staging February 9, 2023 22:33
@ofborg ofborg bot added the labels 10.rebuild-darwin: 11-100, 10.rebuild-linux: 501+, and 10.rebuild-linux: 501-1000 on Feb 9, 2023
@7c6f434c 7c6f434c merged commit bc1660f into NixOS:staging Feb 10, 2023
@superherointj superherointj deleted the fix-libsecret-non-deterministic-fail branch February 10, 2023 15:16
@SuperSandro2000
Member

Thanks! I have had this problem for a long time.

@misuzu
Contributor

misuzu commented Mar 18, 2023

Did not fix the issue for me, had to restart libsecret build after a failure (on 518d5f0): https://hydra.armv7l.xyz/build/5230/nixlog/1

@superherointj
Contributor Author

superherointj commented Mar 18, 2023

Did not fix the issue for me, had to restart libsecret build after a failure (on 518d5f0): https://hydra.armv7l.xyz/build/5230/nixlog/1

I haven't seen this issue on x86_64-linux since this PR landed; before that, I was seeing it often in builds.

This error happens when the CPU is busy for too long and tests time out.

The solution here was to increase the timeout of the tests. The flag is a misnomer because actual value ends up being self.test.timeout, as you can see at:

Using a value avoids the case where a test binary freezes and the test runner waits forever.
In our case, though, I suppose Nix would time out instead.

It is still possible to disable timeout in general:

To disable timeout in test cases, add timeout: 0 or a negative value to allow infinite duration for the test case to complete.

So this other flag would be necessary, and timeouts would then happen at the Nix level.

As you seem to be building armv7l, which is usually very slow, being an extreme case of unresponsiveness, it could make sense in such case.

As I'm not reproducing that level of slowness here, you would have to test it yourself and see whether the proposed solution works as intended.

Reference: https://mesonbuild.com/Unit-tests.html

@misuzu
Contributor

misuzu commented Mar 18, 2023

The flag is a misnomer because actual value ends up being self.test.timeout, as you can see at

Is it though? That branch is only executed if timeout_multiplier is None, but we set it to zero, which should disable timeout entirely.
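To make the branch logic under discussion concrete, here is a minimal Python sketch of how a test runner could derive the effective timeout from a per-test timeout and a multiplier. It is an illustration only, not Meson's source code: the function name `effective_timeout` and its parameters are hypothetical, and the rule that a multiplier of zero or less disables the timeout is taken from the claim in this comment rather than verified against the runner.

```python
from typing import Optional


def effective_timeout(test_timeout: Optional[float],
                      timeout_multiplier: Optional[float]) -> Optional[float]:
    """Return the timeout in seconds, or None when no timeout applies.

    Hypothetical sketch of the behaviour described in this thread;
    it is not copied from Meson.
    """
    # A per-test timeout of None, 0, or a negative value means "no timeout".
    if test_timeout is None or test_timeout <= 0:
        return None
    if timeout_multiplier is None:
        # No multiplier given: the per-test value is used unchanged.
        return test_timeout
    if timeout_multiplier <= 0:
        # Assumption from the thread: a multiplier of zero disables the timeout.
        return None
    return test_timeout * timeout_multiplier
```

Under these assumptions, a 30 second per-test timeout with a multiplier of 0.1 gives roughly 3 seconds, while a multiplier of 0 gives no timeout at all.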

As you seem to be building armv7l, which is usually very slow, being an extreme case of unresponsiveness, it could make sense in such case.

I'm using Oracle Cloud aarch64 VM to build for armv7l and it's pretty fast.

@superherointj
Contributor Author

superherointj commented Mar 18, 2023

Is it though? That branch is only executed if timeout_multiplier is None, but we set it to zero, which should disable timeout entirely.

I don't know. Feel free to check.

@misuzu
Contributor

misuzu commented Mar 18, 2023

The default test timeout is 30 seconds, but in my case test-collection fails after 12.77 seconds. The comment in the linked issue has an example of it failing after only 2.74 seconds, so the issue is most likely unrelated to timeouts.
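The argument above is a simple comparison of failure duration against the timeout, which can be sketched in Python. The helper name `could_be_timeout` and the half-second tolerance are assumptions made for illustration; the durations and the 30 second default come from this comment.

```python
from typing import Optional


def could_be_timeout(duration_s: float,
                     timeout_s: Optional[float],
                     tolerance_s: float = 0.5) -> bool:
    """Return True if a failure's duration is consistent with hitting the timeout.

    Hypothetical helper: a test killed by a timeout should have run for
    about as long as the timeout itself. timeout_s=None means the timeout
    is disabled, so no failure can be a timeout.
    """
    if timeout_s is None:
        return False
    return duration_s >= timeout_s - tolerance_s


# Durations quoted in the thread, checked against the 30 s default.
print(could_be_timeout(12.77, 30.0))  # failure reported in this comment
print(could_be_timeout(2.74, 30.0))   # failure from the linked upstream issue
```

Both calls return False under these assumptions, which matches the conclusion that the failures happen well before any timeout could fire.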

@misuzu
Contributor

misuzu commented Mar 18, 2023

This is what timeout failures look like (--timeout-multiplier 0.1):

Summary of Failures:

 9/24 libsecret:libsecret / test-service          TIMEOUT        3.01s   killed by signal 15 SIGTERM
12/24 libsecret:libsecret / test-methods          TIMEOUT        3.01s   killed by signal 15 SIGTERM
24/24 libsecret:libsecret / test-collection       TIMEOUT        3.01s   killed by signal 15 SIGTERM
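For anyone triaging such logs, the summary lines above can be separated into timeouts and other failures with a short script. The following Python sketch assumes the line layout shown in this comment (index/total, test name, status, duration in seconds, optional detail); the names `SummaryLine` and `parse_summary` are hypothetical, and the pattern may need adjusting for other log formats.

```python
import re
from dataclasses import dataclass
from typing import List

# Sample copied from the failure summary quoted in this comment.
SAMPLE = """\
Summary of Failures:

 9/24 libsecret:libsecret / test-service          TIMEOUT        3.01s   killed by signal 15 SIGTERM
12/24 libsecret:libsecret / test-methods          TIMEOUT        3.01s   killed by signal 15 SIGTERM
24/24 libsecret:libsecret / test-collection       TIMEOUT        3.01s   killed by signal 15 SIGTERM
"""

# Assumed layout: "<index>/<total> <name>  <STATUS>  <seconds>s  <detail>".
LINE_RE = re.compile(
    r"^[ \t]*(?P<index>\d+)/(?P<total>\d+)[ \t]+"
    r"(?P<name>.+?)[ \t]{2,}"
    r"(?P<status>[A-Z]+)[ \t]+"
    r"(?P<duration>\d+(?:\.\d+)?)s\b[ \t]*(?P<detail>.*)$",
    re.MULTILINE,
)


@dataclass
class SummaryLine:
    index: int
    total: int
    name: str
    status: str
    duration_s: float
    detail: str


def parse_summary(text: str) -> List[SummaryLine]:
    """Extract one SummaryLine per result line; other lines are ignored."""
    return [
        SummaryLine(
            index=int(m["index"]),
            total=int(m["total"]),
            name=m["name"].strip(),
            status=m["status"],
            duration_s=float(m["duration"]),
            detail=m["detail"].strip(),
        )
        for m in LINE_RE.finditer(text)
    ]


timeouts = [r for r in parse_summary(SAMPLE) if r.status == "TIMEOUT"]
for r in timeouts:
    print(f"{r.name}: {r.status} after {r.duration_s}s")
```

A failure whose status is not TIMEOUT, or whose duration is far below the configured limit, points to a cause other than the timeout.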

@superherointj
Contributor Author

You have a point.

@superherointj
Contributor Author

The timeout in the logs was 0, so you are actually right. I don't know the problem or the solution, then.

@Shawn8901
Contributor

Shawn8901 commented Mar 22, 2023

Sorry for a question that is somewhat unrelated to the initial issue, but I still have to ask.
@misuzu as far as I can see, libsecret is building fine for you now. Can you by chance share what the issue was? I got basically the same error as you did on my x86-64 build and have no clue what's going wrong. 😞

@misuzu
Contributor

misuzu commented Mar 23, 2023

The "solution" is to restart the build until it succeeds
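The workaround described here amounts to a bounded retry loop. A minimal Python sketch follows, with the caveat that it only papers over the flaky test rather than fixing it; `build_once`, `retry_until_success`, and the attempt limit are hypothetical names and values chosen for illustration.

```python
import subprocess
from typing import Callable, Sequence


def build_once(cmd: Sequence[str]) -> bool:
    """Run one build command and report whether it exited successfully."""
    return subprocess.run(list(cmd)).returncode == 0


def retry_until_success(attempt: Callable[[], bool], max_attempts: int = 5) -> int:
    """Call attempt() until it returns True; return the number of tries used.

    Raises RuntimeError if every attempt fails, so a persistent
    breakage is not hidden behind endless retries.
    """
    for tries in range(1, max_attempts + 1):
        if attempt():
            return tries
    raise RuntimeError(f"build still failing after {max_attempts} attempts")
```

Assuming a nixpkgs checkout as the working directory, it could be used as `retry_until_success(lambda: build_once(["nix-build", "-A", "libsecret"]))`.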
