Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set -j 1 in macOS nix unit tests #2727

Merged
merged 1 commit into from
Jun 23, 2021
Merged

Conversation

Anviking
Copy link
Member

@Anviking Anviking commented Jun 18, 2021

Issue Number

#2472 / ADP-970

Overview

  • Set -j 1 for hydra Mac builds, in hope this alleviates unit timeouts

Comments

A hypothesis is that #2472 is caused by heavy load and unfocused
resources from running the tests concurrently, risking that the slowest
hspec runner - and thererefore the stdout - being silent for 900s causing
hydra to timeout.

Setting -j 1 should hopefully focus the resource we have in one place. It
should go silent less often, at the expense of the full run getting slower.

Testing locally

Tested locally with nix-build -A checks.cardano-wallet-core. Seems to work.

It now takes 399s instead of 299.7877 on my machine (Not that dramatic slowdown).

We have a -with-rtsopts=-N4 in the .cabal file. I'm not sure if that's good or not, but I suspect it doesn't matter as much as the -j option.

@Anviking Anviking self-assigned this Jun 18, 2021
@Anviking Anviking marked this pull request as draft June 18, 2021 12:41
@Anviking Anviking force-pushed the anviking/ADP-970/concurrency branch from c5f60ca to 34cf86c Compare June 21, 2021 10:38
nix/haskell.nix Outdated Show resolved Hide resolved
@Anviking Anviking force-pushed the anviking/ADP-970/concurrency branch 2 times, most recently from 872e765 to 735dd30 Compare June 21, 2021 17:13
@Anviking Anviking marked this pull request as ready for review June 21, 2021 17:39
@Anviking Anviking requested review from rvl and jonathanknowles June 21, 2021 17:40
@Anviking
Copy link
Member Author

bors try

iohk-bors bot added a commit that referenced this pull request Jun 21, 2021
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Jun 21, 2021

try

Build succeeded:

@rvl
Copy link
Contributor

rvl commented Jun 22, 2021

I believe that the main reason why tests were getting stuck on macos more is the problem fixed by #2734.
That problem is a simple case of readMVar waiting forever for a putMVar that never happens.
The exception condition (port in use, address unavailable, etc) happened more with macos.

@Anviking
Copy link
Member Author

Anviking commented Jun 22, 2021

I believe that the main reason why tests were getting stuck on macos more is the problem fixed by #2734.
That problem is a simple case of readMVar waiting forever for a putMVar that never happens.

The entire LoggingSpec was marked pending on macOS in #2709, after which we have observed one proper bors timeout in #2724 (comment), and IIRC a couple more on normal PR non-bors builds.

To me it seems like marking it pending had little to no effect.

That problem is a simple case of readMVar waiting forever for a putMVar that never happens.

From ./scripts/bors-stats.rb list --tag "#2472" --details true there are no failures looking like direct LoggingSpec timeouts. If the LoggingSpecs spawned a lot of threads, and lots of work, perhaps this could contribute to the first hspec thread being slow. "readMVar waiting forever" doesn't sound like a lot of work though.

I think-j 1 is very much worth a shot.

Copy link
Contributor

@rvl rvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK Thanks @Anviking

nix/haskell.nix Outdated
#
# Setting -j 1 should hopefully focus the resource we have in one place. It
# should go silent less often, at the expense of the full run getting slower.
unit.testFlags = if pkgs.stdenv.hostPlatform.isDarwin then ["-j" "1"] else [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very very minor, but just so you know, there's a handy lib.optionals function which does the same thing.

Suggested change
unit.testFlags = if pkgs.stdenv.hostPlatform.isDarwin then ["-j" "1"] else [];
unit.testFlags = lib.optionals pkgs.stdenv.hostPlatform.isDarwin ["-j" "1"];

Either way is perfectly fine though.

Attempt to ensure visible progress in the macOS hydra job.

An hypothesis is that #2472 is caused by heavy load and unfocused
resources from running the tests concurrently, risking that the slowest
hspec runner - and thererefore the stdout - being silent for 900s causing
hydra to timeout.

Setting -j 1 should hopefully focus the resource we have in one place. It
should go silent less often, at the expense of the full run getting slower.
@rvl rvl force-pushed the anviking/ADP-970/concurrency branch from 735dd30 to 6c23f5d Compare June 23, 2021 01:54
@rvl
Copy link
Contributor

rvl commented Jun 23, 2021

I think -j 1 is very much worth a shot.

Fair enough.

bors r+

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Jun 23, 2021

Build succeeded:

@iohk-bors iohk-bors bot merged commit 1fb7536 into master Jun 23, 2021
@iohk-bors iohk-bors bot deleted the anviking/ADP-970/concurrency branch June 23, 2021 02:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants