Stuck in a bootloop #74

Closed
madebr opened this issue Oct 10, 2023 · 24 comments

@madebr

madebr commented Oct 10, 2023

Hello!

We recently noticed FreeBSD GitHub Actions timing out due to being stuck in a boot loop.
ci log: https://github.com/libsdl-org/SDL/actions/runs/6464128982/job/17548308002

The following message is printed every 2s over and over again:

2023-10-10T04:08:13.4019400Z ---<<BOOT>>---
2023-10-10T04:08:13.4029930Z Copyright (c) 1992-2021 The FreeBSD Project.
2023-10-10T04:08:13.4038380Z Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
2023-10-10T04:08:13.4060030Z The Regents of the University of California. All rights reserved.
2023-10-10T04:08:13.4066080Z FreeBSD is a registered trademark of The FreeBSD Foundation.
2023-10-10T04:08:13.4079190Z FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64
2023-10-10T04:08:13.4080880Z FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-1
2023-10-10T04:08:13.4082420Z 4.0.5-0-gc12386ae247c)
2023-10-10T04:08:13.4084490Z VT(vga): text 80x25
2023-10-10T04:08:13.4086040Z CPU: Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz (3192.98-MHz K8-class CPU)
2023-10-10T04:08:13.4088090Z Origin="GenuineIntel" Id=0x306aa Family=0x6 Model=0x3a Stepping=10
2023-10-10T04:08:13.4090070Z Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,C
2023-10-10T04:08:13.4092310Z MOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
2023-10-10T04:08:13.4094220Z Features2=0x5eda2203<SSE3,PCLMULQDQ,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT
2023-10-10T04:08:13.4096340Z ,AESNI,XSAVE,OSXSAVE,AVX,RDRAND>
2023-10-10T04:08:13.4098150Z AMD Features=0x20100800<SYSCALL,NX,LM>
2023-10-10T04:08:13.4099950Z AMD Features2=0x121<LAHF,ABM,Prefetch>
2023-10-10T04:08:13.4101860Z Structured Extended Features=0x842421<FSGSBASE,AVX2,INVPCID,NFPUSG,RDSEED,CLFL
2023-10-10T04:08:13.4103700Z USHOPT>
2023-10-10T04:08:13.4105450Z real memory = 9126805504 (8704 MB)
2023-10-10T04:08:13.4107220Z avail memory = 8291483648 (7907 MB)
2023-10-10T04:08:13.4109100Z Event timer "LAPIC" quality 100
2023-10-10T04:08:13.4110930Z ACPI APIC Table: <VBOX  VBOXAPIC>
madebr referenced this issue in libsdl-org/SDL Oct 10, 2023
This avoids assuming that the pixels are suitably aligned for direct
access, which there's no guarantee that they are; in particular,
3-bytes-per-pixel RGB images are likely to have 3 out of 4 pixels
misaligned. On x86, dereferencing a misaligned pointer does what you
would expect, but on other architectures it's undefined whether it will
work, crash with SIGBUS, or silently give a wrong answer.

Signed-off-by: Simon McVittie <[email protected]>
@timandy

timandy commented Oct 13, 2023

@ojwb

ojwb commented Oct 13, 2023

I'm hitting this too, in case another data point is useful. It runs for 6 hours then times out:

https://github.com/xapian/xapian/actions/runs/6501132629/job/17658111019

It wasn't failing like this in runs 3 days ago or before.

@madebr
Author

madebr commented Oct 13, 2023

> I'm hitting this too, in case another data point is useful. It runs for 6 hours then times out:

We added jobs.<job_id>.timeout-minutes: 30 to our workflow so that a stuck job is at least cancelled after 30 minutes.

Dudemanguy added a commit to Dudemanguy/mpv that referenced this issue Oct 19, 2023
It bootloops quite often these days which is annoying and clogs up all
the macos runners. vmactions/freebsd-vm#74
Dudemanguy added a commit to mpv-player/mpv that referenced this issue Oct 19, 2023
It bootloops quite often these days which is annoying and clogs up all
the macos runners. vmactions/freebsd-vm#74
Dudemanguy added a commit to Dudemanguy/mpv that referenced this issue Oct 23, 2023
Since vmactions is basically a bootlooping disaster* with no signs of
life from upstream, let's try a different action instead and hope it
works better. We don't need to force the latest release channel, so
delete that part. Also make the pkg install just one command for
simplicity.

*: vmactions/freebsd-vm#74
Dudemanguy added a commit to mpv-player/mpv that referenced this issue Oct 23, 2023
Since vmactions is basically a bootlooping disaster* with no signs of
life from upstream, let's try a different action instead and hope it
works better. We don't need to force the latest release channel, so
delete that part. Also make the pkg install just one command for
simplicity.

*: vmactions/freebsd-vm#74
eserte added a commit to eserte/Doit that referenced this issue Oct 26, 2023
The freebsd-vm action stopped to work, see
vmactions/freebsd-vm#74
emarsden added a commit to emarsden/dash-mpd-rs that referenced this issue Oct 27, 2023
The freebsd VM is inflooping on boot, which is a known issue.

  vmactions/freebsd-vm#74

Move instead to cross-platforms-actions/action.
flavorjones added a commit to sparklemotion/nokogiri that referenced this issue Oct 29, 2023
they are hanging/failing too often and are taking 6+ hours to fail

see these upstream issues:
- vmactions/freebsd-vm#68
- vmactions/freebsd-vm#74
r4sas added a commit to PurpleI2P/i2pd that referenced this issue Oct 31, 2023
@liquidaty

liquidaty commented Nov 2, 2023

same. What is the workaround?

@ojwb

ojwb commented Nov 7, 2023

> It's a totally rewritten version, based on qemu, instead of virtualbox. The performance is better than virtualbox.
>
> please give it a try

I've updated to v1 - only one run so far but that succeeded and was faster than a typical run with the old version.

So LGTM. I'll let you know if we hit further problems.

@daniel-mohr

I used your new version v1, too. It seems to work. Great! Thanks.

But in a private project it gets stuck. I do not know the reason. It runs very slowly and is aborted after about 28 minutes. The same job in a public project runs in about 4 minutes. Maybe the GitHub runner provided for private projects is slower than the one for public projects. So I do not believe it's your code.

Therefore again: Thanks!

@Neilpang
Member

Neilpang commented Nov 8, 2023

@daniel-mohr I don't think there is any difference for private repos. Please make sure you run it on ubuntu-22.04.
Don't use macOS for now.

@bobzilladev

bobzilladev commented Nov 8, 2023

Also want to thank you for making these updates, it's much faster most of the time, and doesn't have the boot loop issue.

I'm also seeing some slow workflow runs, maybe it's cpu or network throttling? Perhaps adding some caching for the downloading/updating bits would help. Some network logging from fast and slow runs, all in a public project using ubuntu:

Fast Run (10 minutes):

2023-11-07T20:53:25.0934584Z ##[group]Run vmactions/freebsd-vm@v1
2023-11-07T20:53:25.3937772Z (50.4 MB/s) - ‘./vbox.sh’ saved [11169/11169]
2023-11-07T20:53:35.5503518Z Fetched 7892 kB in 5s (1483 kB/s)
2023-11-07T20:53:43.1188940Z Fetched 60.3 MB in 6s (10.1 MB/s)
2023-11-07T20:54:05.3170363Z Fetched 38.6 MB in 1s (53.5 MB/s)
2023-11-07T20:56:00.8294626Z exec shell: bash run.sh rsyncToVM
2023-11-07T20:56:31.7692363Z Downloading aiohttp-3.8.4.tar.gz 7.3/7.3 MB (79.3 MB/s)
2023-11-07T20:56:49.2705349Z Downloading Babel-2.13.1-py3-none-any.whl 10.1/10.1 MB (126.9 MB/s)
2023-11-07T20:56:50.5512562Z Building wheel for aiohttp (pyproject.toml): started
2023-11-07T20:57:00.3510405Z Building wheel for aiohttp (pyproject.toml): finished with status 'done' (10s)

Slow Run (30 minute timeout):

2023-11-07T21:13:57.8769710Z ##[group]Run vmactions/freebsd-vm@v1
2023-11-07T21:13:58.4322269Z (23.2 MB/s) - ‘./vbox.sh’ saved [11169/11169]
2023-11-07T21:14:06.3561042Z Fetched 7892 kB in 2s (4505 kB/s)
2023-11-07T21:14:19.4740832Z Fetched 60.3 MB in 11s (5383 kB/s)
2023-11-07T21:14:45.1219093Z Fetched 38.6 MB in 1s (26.1 MB/s)
2023-11-07T21:18:32.7602294Z exec shell: bash run.sh rsyncToVM
2023-11-07T21:22:00.7560925Z Downloading aiohttp-3.8.4.tar.gz (7.3 MB)
2023-11-07T21:27:20.9163592Z Downloading Babel-2.13.1-py3-none-any.whl (10.1 MB)
2023-11-07T21:27:42.7799758Z Building wheel for aiohttp (pyproject.toml): started
2023-11-07T21:30:05.5075054Z Building wheel for aiohttp (pyproject.toml): finished with status 'done' (2m 23s)

Other Runs

other fast runs:
(51.4 MB/s) - ‘./vbox.sh’ saved [11169/11169]
(63.4 MB/s) - ‘./vbox.sh’ saved [11169/11169]
other slow runs:
(3.79 MB/s) - ‘./vbox.sh’ saved [11169/11169]
(5.60 MB/s) - ‘./vbox.sh’ saved [11169/11169]
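
For anyone who wants to compare step durations in their own logs, here is a minimal sketch that measures the gap between two timestamped lines. It assumes the timestamp format seen in the excerpts above (ISO 8601 with seven fractional digits and a trailing Z); the helper names `parse_stamp` and `elapsed_seconds` are made up for this illustration.

```python
from datetime import datetime


def parse_stamp(line: str) -> datetime:
    """Parse the leading timestamp of a GitHub Actions log line."""
    stamp = line.split(" ", 1)[0].rstrip("Z")
    head, frac = stamp.split(".")
    # strptime's %f accepts at most six fractional digits, so drop the seventh.
    return datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Seconds between the timestamps of two log lines."""
    return (parse_stamp(end_line) - parse_stamp(start_line)).total_seconds()


# The aiohttp wheel build from the fast and slow runs quoted above.
fast = elapsed_seconds(
    "2023-11-07T20:56:50.5512562Z Building wheel for aiohttp (pyproject.toml): started",
    "2023-11-07T20:57:00.3510405Z Building wheel for aiohttp (pyproject.toml): finished",
)
slow = elapsed_seconds(
    "2023-11-07T21:27:42.7799758Z Building wheel for aiohttp (pyproject.toml): started",
    "2023-11-07T21:30:05.5075054Z Building wheel for aiohttp (pyproject.toml): finished",
)
print(f"fast: {fast:.1f}s, slow: {slow:.1f}s, ratio: {slow / fast:.1f}x")
```

The two results line up with the (10s) and (2m 23s) that pip itself reported, so the same helper can be pointed at any other pair of lines, such as the rsyncToVM step and the first download.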

@Neilpang
Member

Neilpang commented Nov 8, 2023

Please show all the timestamps in your workflow.

image

@mjp41

mjp41 commented Nov 8, 2023

I've had a similar thing. Here is a link to a timed out build (25 minutes):

https://github.com/microsoft/snmalloc/actions/runs/6788099275/job/18452344518

and the same thing running a bit later and completing in 4 minutes:

https://github.com/microsoft/snmalloc/actions/runs/6796774037/job/18477552286

It looks like it was either memory or CPU constrained, since it was the compilation step that took a long time.

@daniel-mohr

> I've had a similar thing. Here is a link to a timed out build (25 minutes):

Looks very similar to my case in the private project!

@bobzilladev

Updated with timestamps and some more logs. Seeing that building aiohttp takes 10s vs 2m23s, seems like at least cpu is constrained.

@Neilpang
Member

Neilpang commented Nov 8, 2023

@mjp41 The NetBSD vm v1.0.0 is released. Please give it a try.

Thanks.

@mjp41

mjp41 commented Nov 8, 2023

@Neilpang thank you so much. It works, I have a complete green CI again. Thank you.

@mjp41

mjp41 commented Nov 9, 2023

@Neilpang I have had two more timeouts where it seems to be CPU/memory constrained. Since switching over it is ~20% of builds experiencing this (very low sample: 2 in the last 10). Please let me know if there is any more information I can gather.

@bobzilladev

bobzilladev commented Nov 9, 2023

Have found a correlation between processor type and running time, after reading why-github-actions-is-so-slow. Small sample size after adding lscpu, but holding true so far.

  1. Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: timed out
  2. AMD EPYC 7763 64-Core Processor: fast
  3. AMD EPYC 7763 64-Core Processor: fast
  4. AMD EPYC 7763 64-Core Processor: fast
  5. Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: timed out
  6. Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: timed out
  7. AMD EPYC 7763 64-Core Processor: fast
  8. AMD EPYC 7763 64-Core Processor: fast
  9. Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz: timed out
  10. AMD EPYC 7763 64-Core Processor: fast
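
To make the correlation in the list above explicit, here is a small sketch that tallies the ten runs by CPU vendor and outcome. The data is copied from the list; the `vendor` helper and the variable names are only illustrative, and ten runs is still a small sample.

```python
from collections import Counter

# (CPU model reported by lscpu, outcome), in the order listed above.
runs = [
    ("Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz", "timed out"),
    ("AMD EPYC 7763 64-Core Processor", "fast"),
    ("AMD EPYC 7763 64-Core Processor", "fast"),
    ("AMD EPYC 7763 64-Core Processor", "fast"),
    ("Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz", "timed out"),
    ("Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz", "timed out"),
    ("AMD EPYC 7763 64-Core Processor", "fast"),
    ("AMD EPYC 7763 64-Core Processor", "fast"),
    ("Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz", "timed out"),
    ("AMD EPYC 7763 64-Core Processor", "fast"),
]


def vendor(cpu_model: str) -> str:
    """Reduce an lscpu model string to its vendor prefix."""
    return cpu_model.split()[0].split("(")[0]


tally = Counter((vendor(cpu), outcome) for cpu, outcome in runs)
for (cpu_vendor, outcome), count in sorted(tally.items()):
    print(f"{cpu_vendor}: {outcome} x{count}")
```

Every Intel run timed out and every AMD run was fast, with no mixed cases in this sample.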

@Neilpang
Member

Neilpang commented Nov 9, 2023

The ubuntu runner has only 2 cores and 7 GB of memory.
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners

We will support running on Mac later, which has 4 cores and 14 GB of memory, so that should be better.

@Consolatis

Consolatis commented Nov 12, 2023

> I've had a similar thing. Here is a link to a timed out build (25 minutes):
>
> https://github.com/microsoft/snmalloc/actions/runs/6788099275/job/18452344518
>
> and the same thing running a bit later and completing in 4 minutes:
>
> https://github.com/microsoft/snmalloc/actions/runs/6796774037/job/18477552286
>
> It looks like it was either memory or CPU constrained, since it was the compilation step that took a long time.

The slow one has this in its logs: WARNING KVM acceleration not available, using 'qemu'
The other one does not.

For us it's the difference between about 4 minutes when we are lucky and the chosen runner supports (nested?) virtualization, and about 16 minutes when it does not.
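
A quick way to tell the two cases apart is to look for that warning in a downloaded job log, or to check on the runner whether the KVM device is usable at all. The sketch below is only an illustration under assumptions: the function names are made up, the warning text is matched as quoted above, and checking read/write access to /dev/kvm is a common heuristic rather than anything the action is known to do.

```python
import os

# Substring of the warning quoted above; the exact wording may differ by version.
KVM_WARNING = "KVM acceleration not available"


def fell_back_to_emulation(log_text: str) -> bool:
    """True if any log line carries the 'KVM acceleration not available' warning."""
    return any(KVM_WARNING in line for line in log_text.splitlines())


def kvm_device_usable(path: str = "/dev/kvm") -> bool:
    """Heuristic: hardware acceleration needs read/write access to the KVM device."""
    return os.access(path, os.R_OK | os.W_OK)


sample_log = "\n".join([
    "booting the VM",
    "WARNING KVM acceleration not available, using 'qemu'",
])
print("software emulation:", fell_back_to_emulation(sample_log))
print("/dev/kvm usable on this machine:", kvm_device_usable())
```

Running the device check as an early workflow step would at least make it obvious from the log which kind of runner the job landed on.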

flavorjones added a commit to sparklemotion/nokogiri that referenced this issue Nov 17, 2023
they are hanging/failing too often and are taking 6+ hours to fail

see these upstream issues:
- vmactions/freebsd-vm#68
- vmactions/freebsd-vm#74

[skip ci]

(cherry picked from commit 7a8ca87)
@Neustradamus

Have you progressed on this issue?

@Neilpang
Member

@Neustradamus Yes, it is already fixed.

flavorjones added a commit to sparklemotion/nokogiri that referenced this issue Dec 13, 2023
SethMMorton added a commit to SethMMorton/natsort that referenced this issue Jun 10, 2024
Fixes the problem outlined in vmactions/freebsd-vm#74.