apply: rollback setting active profile if switch-to-configuration.pl fails #49

Open
water-sucks opened this issue Jan 1, 2025 · 1 comment · May be fixed by #50
Labels
bug Something isn't working

@water-sucks (Owner)

What Happened?

This is more of a niche situation, but it's possible to run into.

If switch-to-configuration.pl fails during activation, the new profile generation remains active anyway. The vanilla nixos-rebuild has the same broken behavior, and it was copied over one-to-one during the initial rewrite in Zig.

This can lead to a situation where the only generation remaining on a machine is a broken one that was never successfully applied, and a subsequent reboot then leaves an unusable system with no generation to roll back to.

This becomes a serious issue when the EFI partition runs out of space while the kernel is being copied there (copying it there is required when using ZFS encryption or LUKS, as a common example), because the way out is to delete old generations before activating.

As an example, when I was adding sops-nix to my configuration, I attempted to set my user passwords with it. I did this incorrectly at first, but did not notice, because the aforementioned space issue forced me to clean all my generations before activation. Since running out of space was a routine occurrence, I cleaned all the generations and then found I could no longer use sudo, because the password hash had been set incorrectly. This made switching generations impossible, since that requires root access. Rebooting only made things worse: even logging in was now impossible because of the incorrectly set password hashes. I had to run nixos install from an external live USB with a previously working configuration to get the system usable again.

How To Reproduce

  1. Attempt to apply a broken configuration (i.e. one where switch-to-configuration.pl fails)
  2. Clean all generations using sudo nix-collect-garbage -d or nixos generation delete --all
  3. The only remaining generation is the broken one, despite never having been successfully activated. Rollback is now impossible.

Expected Behavior

Rollback should remain possible: when switch-to-configuration.pl fails, the active profile should be rolled back to the previous working generation instead of leaving the broken one in place.
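A minimal sketch of the expected behavior, written in Python for illustration rather than in the project's Zig, and not the actual nixos-cli implementation. The function names (`apply_configuration`, `run_command`) and the injectable `run` parameter are hypothetical. It assumes the profile is set with `nix-env --profile <profile> --set <toplevel>` and activated with `<toplevel>/bin/switch-to-configuration <action>`, mirroring what nixos-rebuild does. It only restores the profile link, as this issue asks; it does not re-run the previous generation's activation or bootloader installation.

```python
import subprocess
from typing import Callable, Sequence

# Default location of the NixOS system profile.
SYSTEM_PROFILE = "/nix/var/nix/profiles/system"

# Hypothetical runner type: takes an argv list, returns the exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run argv as a subprocess and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def apply_configuration(
    toplevel: str,
    action: str = "switch",
    profile: str = SYSTEM_PROFILE,
    run: Runner = run_command,
) -> bool:
    """Set the profile to `toplevel`, activate it, roll back on failure.

    Hypothetical helper; returns True only if activation succeeded.
    """
    # Step 1: make the new configuration the profile's current generation.
    if run(["nix-env", "--profile", profile, "--set", toplevel]) != 0:
        return False  # assumption: a failed --set left the profile unchanged

    # Step 2: activate the new generation.
    if run([f"{toplevel}/bin/switch-to-configuration", action]) == 0:
        return True

    # Step 3: activation failed, so point the profile back at the previous
    # generation instead of leaving the broken one as the current one.
    run(["nix-env", "--profile", profile, "--rollback"])
    return False
```

With the rollback in place, a later `nix-collect-garbage -d` keeps the previous working generation as the current one and deletes the broken generation, instead of the other way around.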

Features

nixos 0.12.0-dev

git rev: 2d44c5c8a077f778b96fd1f213611786144d9a72
zig version: 0.13.0
optimisation mode: ReleaseSafe

Compilation Options
-------------------
flake           :: true
nixpkgs_version :: 24.05

water-sucks added the bug (Something isn't working) label on Jan 1, 2025
water-sucks self-assigned this on Jan 19, 2025
water-sucks linked a pull request on Jan 19, 2025 that will close this issue
water-sucks (Owner, Author) commented Jan 22, 2025

Make sure the rollback is also performed on SIGTERM and other forced cancellations. This has not been done at the time of writing; I will update this issue once it has.
