NetBSD - VM doesn't start after a 120s timeout #62

Closed
kobalicek opened this issue Sep 22, 2023 · 9 comments

@kobalicek

I'm having the following occasional issue when running the NetBSD runner:

  Pseudo-terminal will not be allocated because stdin is not a terminal.
  ssh: connect to host localhost port 2847: Connection refused
  Waiting for VM to be ready...
  Executing command inside VM: true
  /usr/bin/ssh -t runner@localhost
  Pseudo-terminal will not be allocated because stdin is not a terminal.
  ssh: connect to host localhost port 2847: Connection refused
  Waiting for VM to be ready...
  Executing command inside VM: true
  /usr/bin/ssh -t runner@localhost
  Pseudo-terminal will not be allocated because stdin is not a terminal.
  ssh: connect to host localhost port 2847: Connection refused
  Waiting for VM to be ready...
  Executing command inside VM: true
  /usr/bin/ssh -t runner@localhost
  Pseudo-terminal will not be allocated because stdin is not a terminal.
  ssh: connect to host localhost port 2847: Connection refused
  Waiting for VM to be ready...
  Executing command inside VM: true
  /usr/bin/ssh -t runner@localhost
  Pseudo-terminal will not be allocated because stdin is not a terminal.
  ssh: connect to host localhost port 2847: Connection refused
  Terminating VM
  /usr/bin/sudo kill -s TERM 1370
  kill: 1370: No such process
Error: Waiting for VM to become ready timed out after 120 seconds

I'm using QEMU to run it.

Basically, the VM is still not ready after 120 seconds, which causes the action to be terminated.
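The "Waiting for VM to be ready..." loop in the log amounts to polling the forwarded SSH port until it answers or 120 seconds pass. Below is a minimal Python sketch of such a loop, assuming a plain TCP check is enough; `wait_for_port` and its parameters are hypothetical names, not the action's actual code (the log shows the real check runs `true` over SSH). The log also ends with `kill: 1370: No such process`, which suggests the VM process had already exited before the timeout. For that reason the sketch accepts an optional process handle and gives up early if that process is gone.

```python
import socket
import subprocess
import time
from typing import Optional


def wait_for_port(
    host: str,
    port: int,
    timeout: float = 120.0,
    interval: float = 1.0,
    process: Optional[subprocess.Popen] = None,
) -> bool:
    """Return True once host:port accepts a TCP connection, False otherwise.

    Hypothetical helper. If `process` is given and has already exited,
    return False immediately instead of waiting out the full timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process is not None and process.poll() is not None:
            return False  # the VM process died; waiting longer cannot help
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # e.g. ConnectionRefusedError, as seen in the log
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 2847)` would poll the port from the log above. Checking the process as well would turn a silent 120-second wait into an immediate failure when the hypervisor crashes at startup.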

I'm not sure what the problem is in this case: whether the GHA runner is simply overloaded, or whether there is a race or something else in the action itself that prevents connecting to the SSH server inside the VM.

I'm wondering: is this something we have to live with, or do you think it can be fixed somehow? It's very hard to diagnose because it doesn't happen every time, but it happens often enough to have caught my attention.

@jacob-carlborg
Contributor

jacob-carlborg commented Sep 22, 2023

Yeah, it's difficult to say. It could be something inside the VM or something outside it. Perhaps it's possible to run it through DTrace to debug it, though I'm not sure that works on a GHA runner. Alternatively, it might be possible to redirect the VM's output to a file and print that, to see what's going on.
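One way to act on the idea of redirecting the VM's output to a file is sketched below in Python, assuming the VM is started as a child process. `start_with_log` and `dump_log` are hypothetical helpers, and the command passed in would be whatever actually launches the VM. For the guest's own serial console, QEMU's `-serial file:vm.log` option writes that output to a file as well.

```python
import subprocess
from pathlib import Path
from typing import List


def start_with_log(cmd: List[str], log_path: Path) -> subprocess.Popen:
    """Start `cmd` with its stdout and stderr appended to `log_path`."""
    with open(log_path, "ab") as log:
        # The child inherits its own copy of the file descriptor, so closing
        # the parent's handle when the `with` block exits is safe.
        return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)


def dump_log(log_path: Path) -> str:
    """Return everything the process has written so far, e.g. to print it in CI."""
    return log_path.read_text(errors="replace")
```

If the readiness check times out, printing `dump_log(path)` would surface errors from the hypervisor process itself, such as a dynamic-linker failure, in the CI log.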

@jacob-carlborg
Contributor

Do you have a link to a failing job?

@kobalicek
Author

I have, actually: two failing jobs within two days:

I'm not sure that would help, though, as nothing interesting happens in these runs; it just stops at the beginning.

@kobalicek
Author

And one more:

I think this is the most unstable runner at the moment; it fails like this about 50% of the time.

@jacob-carlborg
Contributor

it fails in like 50% of time like this

Oh, that's pretty bad. I'll see if I can debug the issue.

@jacob-carlborg
Contributor

jacob-carlborg commented Sep 29, 2023

It seems GitHub made some breaking changes again. This happens when trying to run QEMU:

dyld[1372]: Library not loaded: '/usr/local/opt/capstone/lib/libcapstone.4.dylib'

But then it should fail every time.

This makes it much easier to fix. I thought all the dependencies were statically linked to avoid this exact problem, but it looks like I missed one.

BTW, this is not specific to NetBSD; it applies to all platforms when QEMU is used as the hypervisor. But since xhyve is the default hypervisor for FreeBSD and OpenBSD on macOS runners, it doesn't affect those platforms unless you explicitly switch the hypervisor to QEMU.

If you're in a hurry, you can switch to Linux runners instead of macOS as a workaround, but macOS has better performance.

@kobalicek
Author

Yeah, it always fails.

I'm removing NetBSD from my CI, as this just makes all builds fail.

I think this is just the unfortunate reality of it not being natively supported by GitHub.

@jacob-carlborg
Contributor

Fixed in https://github.com/cross-platform-actions/action/releases/tag/v0.19.1. I've added a test to make sure this doesn't happen again. In doing that I also found another non-system dependency. But that is fixed as well.
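The missing `libcapstone.4.dylib` is a non-system library that dyld expected under `/usr/local/opt`. A check in the spirit of the test mentioned above could list a binary's dynamic dependencies with macOS's `otool -L` and flag anything outside the system locations. The Python sketch below is hypothetical and is not the test added in the release; it assumes `/usr/lib/` and `/System/Library/` are the only system prefixes, and it will also flag `@rpath`-style entries, which may need separate handling.

```python
import subprocess
from typing import List

# Assumption: libraries under these prefixes ship with macOS itself.
SYSTEM_PREFIXES = ("/usr/lib/", "/System/Library/")


def non_system_dependencies(otool_output: str) -> List[str]:
    """Return dependency paths from `otool -L` output that are not system libraries."""
    deps = []
    for line in otool_output.splitlines():
        line = line.strip()
        # Skip blank lines and header lines such as "/path/to/binary:"
        # or "/path/to/binary (architecture x86_64):".
        if not line or line.endswith(":"):
            continue
        # Each entry looks like "<path> (compatibility version ..., current version ...)".
        path = line.split(" (compatibility version", 1)[0]
        if not path.startswith(SYSTEM_PREFIXES):
            deps.append(path)
    return deps


def check_binary(binary: str) -> List[str]:
    """Run `otool -L` on `binary` (macOS only) and return its non-system dependencies."""
    result = subprocess.run(
        ["otool", "-L", binary], capture_output=True, text=True, check=True
    )
    return non_system_dependencies(result.stdout)
```

A test could then assert that `check_binary(...)` returns an empty list for the bundled QEMU binary, so a dependency like libcapstone would be caught before release rather than on a user's runner.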
