nix-copy-closure gets stuck in locking path #5304

Open
nh2 opened this issue Sep 28, 2021 · 6 comments

@nh2
Contributor

nh2 commented Sep 28, 2021

niklas:~/ $ nix-copy-closure --from root@myhost /nix/store/aigbjy9w1qcpjj12rz88m77bxf8a2flz-mypackage-0.1.0.0 -vvvvvvv
...
debug1: Sending command: nix-store --serve --write
locking this thread to CPU 0
copying 1 paths...
starting pool of 1 threads
querying remote host 'root@myhost' for info on '/nix/store/aigbjy9w1qcpjj12rz88m77bxf8a2flz-mypackage-0.1.0.0'
copying path '/nix/store/aigbjy9w1qcpjj12rz88m77bxf8a2flz-mypackage-0.1.0.0' from 'ssh://root@myhost'...
acquiring global GC lock '/nix/var/nix/gc.lock'
acquiring read lock on '/nix/var/nix/temproots/11537'
acquiring write lock on '/nix/var/nix/temproots/11537'
downgrading to read lock on '/nix/var/nix/temproots/11537'
locking path '/nix/store/aigbjy9w1qcpjj12rz88m77bxf8a2flz-mypackage-0.1.0.0'

After this, there is no further output, and nix-copy-closure hangs forever.
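For context on where this hangs: as far as I understand Nix's `PathLocks` (an assumption, not verified against 2.3.15 specifically), "locking path" means taking an exclusive `flock(2)` lock on a `.lock` file next to the store path. The blocking call only returns once every other holder has released that lock. A minimal Python sketch of that contention. It uses a temporary file with a made-up name and a non-blocking attempt so the example itself does not hang:

```python
import fcntl
import os
import tempfile
from typing import Optional, Tuple


def try_lock(path: str) -> Optional[int]:
    """Try to take an exclusive flock on `path` without waiting.

    Returns the open fd (which now holds the lock) or None if someone else
    holds it. A blocking LOCK_EX would instead wait here indefinitely.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def contention_demo() -> Tuple[bool, bool, bool]:
    """Returns (first_acquired, second_blocked, acquired_after_release)."""
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, "example-path.lock")  # hypothetical name
        holder = try_lock(lock_path)  # the first locker wins
        first_acquired = holder is not None
        # Separate open() calls get independent flock locks, so this
        # second attempt conflicts even though it is the same process.
        second_blocked = try_lock(lock_path) is None
        os.close(holder)  # closing the holder's fd releases the lock
        third = try_lock(lock_path)
        acquired_after_release = third is not None
        if third is not None:
            os.close(third)
        return first_acquired, second_blocked, acquired_after_release


if __name__ == "__main__":
    print(contention_demo())
```

If some other process never releases its lock, a blocking lock at the "locking path" step waits forever, which matches the output above stopping right there.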

Steps To Reproduce

Unsure; this is the first time I have seen it.

Expected behavior

nix-env (Nix) 2.3.15 on NixOS 21.05

nh2 added the bug label Sep 28, 2021
@nh2
Contributor Author

nh2 commented Sep 28, 2021

journalctl -fu nix-daemon.service shows:

Sep 28 17:18:29 t25 nix-daemon[328803]: accepted connection from pid 10271, user niklas (trusted)

And then no further output.

Pressing Ctrl+C in nix-copy-closure produces this daemon output:

Sep 28 17:20:57 t25 nix-daemon[10276]: 5 operations
Sep 28 17:20:57 t25 nix-daemon[10276]: unexpected Nix daemon error: writing to file: Broken pipe
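`Broken pipe` is the message for `EPIPE`, which a process gets when it writes to a connection whose other end has already been closed. That is consistent with the daemon still trying to send data to the interrupted `nix-copy-closure`. A small Python illustration, using a `socketpair` as a stand-in for the real client/daemon socket (an assumption made only for the example):

```python
import errno
import os
import socket


def write_after_peer_closed() -> int:
    """Write to a stream socket whose peer has already closed and
    return the resulting errno (0 if the write unexpectedly succeeds)."""
    daemon_end, client_end = socket.socketpair()  # stand-in for the daemon connection
    client_end.close()  # the client goes away, e.g. after Ctrl+C
    try:
        # CPython ignores SIGPIPE by default, so EPIPE surfaces as
        # BrokenPipeError instead of terminating the process.
        daemon_end.sendall(b"reply")
    except BrokenPipeError as e:
        return e.errno
    finally:
        daemon_end.close()
    return 0


if __name__ == "__main__":
    code = write_after_peer_closed()
    print(errno.errorcode.get(code), os.strerror(code))
```

So this error line most likely just records the client disappearing; it does not by itself explain why the lock was never acquired.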

@nh2
Contributor Author

nh2 commented Sep 28, 2021

systemctl restart nix-daemon.service did not help.

Perhaps relevant:

# journalctl -eu nix-daemon.service
...
Sep 28 17:21:09 t25 systemd[1]: Stopping Nix Daemon...
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Succeeded.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 4173552 (nix-daemon) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 4173570 (ssh) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 4353 (nix-daemon) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 4364 (ssh) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 8841 (nix-daemon) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 8879 (ssh) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: Stopped Nix Daemon.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Consumed 48.865s CPU time, no IO, received 970.3M IP traffic, sent 19.7M IP traffic.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 4173552 (nix-daemon) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 4173570 (ssh) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 4353 (nix-daemon) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 4364 (ssh) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 8841 (nix-daemon) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 8879 (ssh) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: Started Nix Daemon.

Rebooting the machine helped.

After the reboot, Nix works normally:

locking path ...
lock acquired on ...

I therefore suspect that some process got stuck in a way that systemd could not kill.
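One way to check this before rebooting, on Linux: if one of the leftover `nix-daemon` or `ssh` processes still holds the lock, `/proc/locks` (or util-linux `lslocks`) should list its PID against the lock file's inode. Below is a hypothetical helper (`lock_holders` is my name, not a Nix or kernel API). It assumes the `id: TYPE MODE ACCESS PID MAJ:MIN:INODE START END` layout described in proc(5):

```python
import os
import sys
from typing import List


def lock_holders(proc_locks: str, inode: int) -> List[int]:
    """Return the PIDs that /proc/locks-style text lists as *holding* a
    lock on `inode` (entries for blocked waiters are skipped)."""
    pids = []
    for line in proc_locks.splitlines():
        fields = line.split()
        # Holder lines look like:
        #   "1: FLOCK  ADVISORY  WRITE 4353 00:1a:987654 0 EOF"
        # Waiter lines insert "->" after the id (9 fields) and are ignored.
        if len(fields) != 8:
            continue
        pid, dev_inode = int(fields[4]), fields[5]
        # Only the inode number is compared; the hex major:minor device is
        # ignored, which is a simplification if several filesystems are involved.
        if int(dev_inode.rsplit(":", 1)[1]) == inode:
            pids.append(pid)
    return pids


if __name__ == "__main__":
    # Usage: python lock_holders.py <lock file next to the stuck store path>
    lock_file = sys.argv[1]
    with open("/proc/locks") as f:
        print(lock_holders(f.read(), os.stat(lock_file).st_ino))
```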

@stale

stale bot commented Apr 16, 2022

I marked this as stale due to inactivity.

stale bot added the stale label Apr 16, 2022
@nrdxp

nrdxp commented Oct 30, 2022

Possibly a duplicate of #3017.

@squalus
Member

squalus commented Aug 11, 2024

I just hit this on Nix 2.18.5. I had to manually delete the lock files in the Nix store.
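Why deleting the lock file unblocks things, assuming the lock is a `flock(2)` lock on a `.lock` file (my understanding of Nix's `PathLocks`, not confirmed here): the lock belongs to the file's inode, not its name. After the file is unlinked, the next locker creates a brand-new file and locks that instead. The catch is that the original holder still believes it holds the lock, so two processes may then work on the same path concurrently. A minimal sketch with a made-up file name:

```python
import fcntl
import os
import tempfile
from typing import Optional, Tuple


def try_lock(path: str) -> Optional[int]:
    """Non-blocking exclusive flock; returns the locked fd or None."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def delete_lock_file_demo() -> Tuple[bool, bool, bool]:
    """Returns (blocked_before_delete, acquired_after_delete, same_inode)."""
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, "example-path.lock")  # hypothetical name
        stuck = try_lock(lock_path)  # stands in for the process that never finishes
        blocked = try_lock(lock_path) is None
        os.unlink(lock_path)  # the manual workaround: delete the lock file
        newcomer = try_lock(lock_path)  # O_CREAT makes a brand-new file
        acquired = newcomer is not None
        # The stuck fd still pins the old (now nameless) inode, so the new
        # file gets a different inode and the two locks never conflict.
        same_inode = os.fstat(stuck).st_ino == os.fstat(newcomer).st_ino
        os.close(newcomer)
        os.close(stuck)
        return blocked, acquired, same_inode


if __name__ == "__main__":
    print(delete_lock_file_demo())
```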

stale bot removed the stale label Aug 11, 2024
@Philipp-M
Member

I have a strong feeling that acquired file locks are not cleaned up when SIGINT is sent. That is, if you hit Ctrl-C while a file is being transferred and then run the command again, it deadlocks because the lock still exists. Maybe this is the issue?
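A way to test that theory in isolation, assuming the lock is a `flock(2)` lock (my understanding of Nix's `PathLocks`, not confirmed in this thread): the kernel drops such a lock once the last descriptor referring to it is closed, and that includes the holding process being killed, even by SIGKILL, with no cleanup code running. The sketch below has a child process lock a made-up file and hang, then kills it and checks whether the lock is free:

```python
import fcntl
import os
import subprocess
import sys
import tempfile
from typing import Tuple

# Child: take an exclusive lock, report readiness, then hang like a stuck copy.
CHILD = (
    "import fcntl, os, sys, time\n"
    "fd = os.open(sys.argv[1], os.O_RDWR | os.O_CREAT, 0o600)\n"
    "fcntl.flock(fd, fcntl.LOCK_EX)\n"
    "print('locked', flush=True)\n"
    "time.sleep(60)\n"
)


def can_lock(path: str) -> bool:
    """Non-blocking probe: can we take the exclusive lock right now?"""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)  # closing also drops the lock if we took it


def lock_after_kill() -> Tuple[bool, bool]:
    """Returns (held_while_child_alive, free_after_kill)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example-path.lock")  # hypothetical name
        child = subprocess.Popen([sys.executable, "-c", CHILD, path],
                                 stdout=subprocess.PIPE, text=True)
        if child.stdout.readline().strip() != "locked":
            raise RuntimeError("child failed to take the lock")
        held = not can_lock(path)
        child.kill()  # SIGKILL: no signal handler or cleanup gets to run
        child.wait()
        free = can_lock(path)
        child.stdout.close()
        return held, free


if __name__ == "__main__":
    print(lock_after_kill())
```

Under that assumption, a lock that persists after Ctrl-C would point to some process that is still alive and holding the descriptor (compare the leftover processes in the systemd log earlier in this thread) rather than to missing SIGINT cleanup.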


4 participants