[pacman] getting stuck on clang-aarch64 #4340
Does the CI script terminate MSYS2 processes after the update? See https://github.com/msys2/setup-msys2/blob/8b0d40b8912601756301a7b3de7752d5dba969cd/main.js#L408 In that main.js file, …
There is also a bug (presumably in msys2-runtime), which I have never been able to debug, that shows up as processes hanging around when they should have exited. This usually seems to happen when pacman uses gpgme to validate signatures. As a workaround, I disable database signature validation in pacman.conf, because database signatures seem to be verified every time pacman runs, whereas package signatures are only verified when a package is being installed. I still see occasional hangs during package verification, though. See also msys2/msys2-autobuild#62, which I think is the closest existing bug tracking this. Workarounds I currently apply:
REM https://github.com/msys2/msys2-autobuild/issues/62
CALL C:\msys64\msys2_shell.cmd -defterm -no-start -c "mkdir -p /etc/pacman.d/hooks && touch /etc/pacman.d/hooks/texinfo-{install,remove}.hook"
REM the caret is messing with CMD parsing, try it another way
C:\msys64\usr\bin\sed.exe -i -e 's/^^\(SigLevel\s\+=\s\+Required\)\s*$/\1 DatabaseNever/' /etc/pacman.conf
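Outside of CMD the caret does not need escaping. As a minimal sketch, assuming GNU sed (which MSYS2 provides) and a POSIX shell, the same edit can be wrapped in a hypothetical helper, `disable_db_sig_check`, that takes the path to pacman.conf so it can be tried on a scratch copy first. Appending `DatabaseNever` relaxes only the database part of `SigLevel`; packages are still required to be signed.

```sh
#!/bin/sh
# Hypothetical helper (the name is illustrative): turn a bare
#   SigLevel = Required
# line into
#   SigLevel = Required DatabaseNever
# so pacman stops checking database signatures but still checks packages.
# Assumes GNU sed: \s, \+ and -i are GNU extensions.
disable_db_sig_check() {
  local conf="${1:-/etc/pacman.conf}"
  # Only a line that ends right after "Required" matches, so running this
  # twice leaves the already-edited line alone.
  sed -i -e 's/^\(SigLevel\s\+=\s\+Required\)\s*$/\1 DatabaseNever/' "$conf"
}
```

For example, `disable_db_sig_check ./pacman.conf.copy` edits a copy in place; other lines such as `LocalFileSigLevel` are not touched because the pattern is anchored at the start of the line.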
We are now doing something similar, which seems to get us further, but now it runs into a database lock. On my own Arm Dev Kit, I just updated pacman, which went without problems. However, it is now stuck compiling part of GIMP (I think I've seen this before too), so the hang is probably not specific to pacman. In Process Explorer, the innermost process is env.exe. Could it be related to reading/setting environment variables, which I seem to remember can be problematic from multiple threads?
The database lock file …
Hi. The database lock is being investigated externally (it is not an MSYS2 bug). The problem is that, when the database lock is not a factor, we can't easily kill pacman. See: https://gitlab.gnome.org/GNOME/gimp/-/jobs/3458478#L157
The MSYS2-related processes should be terminated outside the MSYS2 environment. For example, …
I tried taskkill before, but its exit code makes the job fail.
OK, I am out of ideas then. By the way, @hmartinez82 has done some great work porting apps to aarch64. He may have some ideas.
I wish somebody with good knowledge of low-level debugging on WoA (and/or of Cygwin) could debug this. I have tried without any luck: every debugger I tried (WinDbg, gdb, lldb) gave me an error when getting the context of the main thread.
I even see this happening, randomly, on my personal laptop when using pacman. @Biswa96 I'm not a low-level debugging expert either. Actually, I'm thinking about installing Tailscale in that VM and letting someone with more expertise take a look.
Hi all! We now have new runners contributed by Arm Ltd., in addition to the one from @hmartinez82. This pacman hang is also happening randomly on their runners. Is there anything we could tell the admins at Arm to look for in order to help you debug this issue?
@Jehan I'm glad they are having it too, so we now know it's not just my runner. I don't know what the issue is. |
MSYS2 pacman gets randomly stuck on Windows/Aarch64. The actual issue is still being investigated by upstream projects, but it is bad for us right now, to the point that there are discussions in #10729 about removing Aarch64 support from the Windows installer (even though it was only just added!). This is an attempt at a workaround. Instead of getting stuck forever and waiting until the whole job times out (per GitLab CI settings), I time out the pacman command within our script (after 3 minutes) and try again, up to 2 more times. Hopefully one of the attempts will succeed. I also send a SIGKILL through the timeout (though I have no idea how signals translate to Windows processes) and then run taskkill as well, which may seem like overkill. Interestingly, I get output from both, which seems to indicate that the kill succeeds in both cases (because there are several processes?). Admittedly this is somewhat ad hoc code that I don't fully understand, and being unable to test any of it locally doesn't help, but it's good enough for the time being. See: msys2/MSYS2-packages#4340
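The time-out-and-retry idea can be sketched as a small POSIX shell function. This is a hypothetical illustration, not GIMP's actual script: `run_with_retries`, `cleanup_stuck_processes`, `ATTEMPT_TIMEOUT` and `MAX_ATTEMPTS` are made-up names, it assumes GNU coreutils `timeout` is on PATH (MSYS2 ships it), and targeting `pacman.exe` with taskkill is an assumption. The taskkill exit status is ignored so that it cannot fail the job by itself.

```sh
#!/bin/sh
# Hypothetical sketch of a "time out, kill, retry" wrapper.
# Assumes GNU coreutils `timeout` is on PATH.

# Seconds before one attempt is abandoned (3 minutes).
ATTEMPT_TIMEOUT="${ATTEMPT_TIMEOUT:-180}"
# Total attempts: the first run plus up to 2 retries.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"

# Best-effort cleanup after a failed or hung attempt. taskkill exists only on
# Windows, and its exit status is deliberately ignored (e.g. when there is
# nothing left to kill) so that it cannot fail the job on its own.
cleanup_stuck_processes() {
  if command -v taskkill >/dev/null 2>&1; then
    # MSYS2_ARG_CONV_EXCL='*' stops MSYS2 from rewriting "/F" and "/IM" as paths.
    MSYS2_ARG_CONV_EXCL='*' taskkill /F /IM pacman.exe || true
  fi
}

# Run "$@" up to MAX_ATTEMPTS times, killing each attempt after
# ATTEMPT_TIMEOUT seconds. Returns 0 on the first success, otherwise the
# exit status of the last attempt.
run_with_retries() {
  local attempt status
  attempt=1
  status=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    # -s KILL sends SIGKILL instead of the default SIGTERM when time runs out.
    timeout -s KILL "$ATTEMPT_TIMEOUT" "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    echo "Attempt $attempt/$MAX_ATTEMPTS of '$*' failed (exit $status)" >&2
    cleanup_stuck_processes
    attempt=$((attempt + 1))
  done
  return "$status"
}
```

Usage would be along the lines of `run_with_retries pacman --noconfirm -Suy`. This sketch retries on any non-zero exit status, not only on timeouts; a stricter variant could retry only when `timeout` reports that the limit was hit.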
…64 jobs. This is the command suggested by MSYS2 developers here: msys2/MSYS2-packages#4340 (comment). They also say to run it outside the MSYS2 environment, which is why it's in the CI rules, not in the shell script. Honestly, at this point it feels like we are just stacking weird workarounds to make it fail less often. ;-(
Description / Steps to reproduce the issue
For about a month, GIMP's aarch64 CI runner has been getting stuck when running:
pacman --noconfirm -Suy
The last job that succeeded ran on Dec 11; another job from the same day, and every job since, has failed.
Most of the time (e.g. here), it seems to stop even before the databases are updated:
Sometimes it gets a little further:
When I tested on my MS Dev Kit just now, it did not get stuck (though I do remember seeing that occasionally in the past). However, Process Explorer shows that after pacman closed the terminal, the pacman and conhost processes are still running.
Expected behavior
Pacman finishes after doing its thing.
Actual behavior
Pacman gets stuck
Verification
Windows Version
MSYS64_NT-10.0-22621
Are you willing to submit a PR?
No response