nixos-rebuild-ng: fix switch inside a path symlinked to nix store #417191

Closed
thiagokokada wants to merge 2 commits into NixOS:master from thiagokokada:revert-375493

Conversation

@thiagokokada
Contributor

@thiagokokada thiagokokada commented Jun 16, 2025

PR #375493 was introduced to fix a difference in behavior between nixos-rebuild repl and nixos-rebuild switch by forcing use of the git+file:// protocol when evaluating the Flake. This sadly reintroduced an older issue from the original nixos-rebuild that is caused by a pretty nasty bug in nix: #144811.

Let's do the same fix we did for nixos-rebuild and just stop normalizing the Flake (#153515). This will bring back the original issue that this code was supposed to fix, but I argue that a difference between nixos-rebuild repl and nixos-rebuild switch is better than having a broken system.

Fix: #144811.

Before

/run/opengl-driver/lib
❯ sudo nixos-rebuild switch
[sudo] password for thiagoko:
warning: could not re-exec in a newer version of nixos-rebuild, using current version
warning:nixos_rebuild:could not re-exec in a newer version of nixos-rebuild, using current version
building the system configuration...
Failed to find executable /nix/store/ijz8g3gibqyr2zqfkrhshb5f50a46dv5-mesa-25.1.3/bin/switch-to-configuration: No such file or directory
Command '['systemd-run', '-E', 'LOCALE_ARCHIVE', '-E', 'NIXOS_INSTALL_BOOTLOADER', '--collect', '--no-ask-password', '--pipe', '--quiet', '--service-type=exec', '--unit=nixos-rebuild-switch-to-configuration', PosixPath('/nix/store/ijz8g3gibqyr2zqfkrhshb5f50a46dv5-mesa-25.1.3/bin/switch-to-configuration'), 'switch']' returned non-zero exit status 1.

After

# need to remove the broken generations first
~/Projects/nixpkgs master*
❯ sudo nix-env -p /nix/var/nix/profiles/system --delete-generations old #
removing profile version 288
removing profile version 289
/run/opengl-driver/lib
❯ sudo ~/Projects/nixpkgs/result/bin/nixos-rebuild-ng switch
building the system configuration...
activating the configuration...
showing changes compared to /run/current-system...
<<< /run/current-system
>>> /nix/store/fbgjjkgwpg6cgzyv8sq7rxvnnvgha91h-nixos-system-sankyuu-nixos-25.11.20250613.ee930f9
No version or selection state changes.
Closure size: 3679 -> 3679 (0 paths added, 0 paths removed, delta +0, disk usage +0B).
setting up /etc...
reloading user units for thiagoko...
restarting sysinit-reactivation.target
the following new units were started: libvirtd.service, NetworkManager-dispatcher.service
Done. The new configuration is /nix/store/fbgjjkgwpg6cgzyv8sq7rxvnnvgha91h-nixos-system-sankyuu-nixos-25.11.20250613.ee930f9

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • Nixpkgs 25.11 Release Notes (or backporting 24.11 and 25.05 Nixpkgs Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
  • NixOS 25.11 Release Notes (or backporting 24.11 and 25.05 NixOS Release notes)
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other contributing documentation in corresponding paths.


@thiagokokada
Contributor Author

CC @Mic92 @tejing1 @colemickens.

@github-actions github-actions bot added the following labels on Jun 16, 2025:

  • 10.rebuild-darwin: 1-10 (This PR causes between 1 and 10 packages to rebuild on Darwin.)
  • 10.rebuild-darwin: 1 (This PR causes 1 package to rebuild on Darwin.)
  • 10.rebuild-linux: 1-10 (This PR causes between 1 and 10 packages to rebuild on Linux.)
  • 6.topic: nixos (Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS)

@thiagokokada
Contributor Author

thiagokokada commented Jun 16, 2025

Other options here:

  • Validate that the path returned from nix build is actually a NixOS generation (suggestion from @tejing1: check for the bootspec files) and bail out if it isn't.
  • Run the build commands in a temporary directory or even in /.

@tejing1
Contributor

tejing1 commented Jun 16, 2025

  • Running the commands in a temp dir or / seems prone to error due to needing to convert relative flake urls to absolute in every case where a flake url can somehow be related to the pwd. I doubt you'd get all the cases correct.
  • Validating that the path looks roughly like a nixos generation seems like a decent thing to do anyway, even if only to deal with cases where people exposed strange things under nixosConfigurations in their flake while flailing around trying to understand flakes or just poking things to see what happens. More effort than a simple reversion, though.
  • Yet another alternative would be to fix the systemd-boot menu builder (and any others affected) to fail gracefully when given weird generations. Also more effort than a reversion... and also probably a good idea regardless.

@thiagokokada
Contributor Author

> Running the commands in a temp dir or / seems prone to error due to needing to convert relative flake urls to absolute in every case where a flake url can somehow be related to the pwd. I doubt you'd get all the cases correct.

I don't think it is that difficult, to be honest. From what I understand, only the nix build command needs to run inside a temporary directory, and this can be done easily in subprocess.run by setting the cwd parameter.

I gave it a try here: #417203.
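
As a rough illustration of the cwd approach mentioned above (not the code from #417203), the sketch below runs a command with a fresh temporary directory as its working directory. The helper name `run_in_tmpdir` and the directory prefix are hypothetical, and a small Python child process stands in for the real `nix build` invocation.

```python
import os
import subprocess
import sys
import tempfile


def run_in_tmpdir(cmd):
    """Run cmd with a fresh temporary directory as its working directory.

    Hypothetical helper: the real tool would pass its `nix build` command here.
    """
    with tempfile.TemporaryDirectory(prefix="nixos-rebuild.") as tmpdir:
        # cwd= only changes the child's working directory, not the parent's.
        return subprocess.run(
            cmd, cwd=tmpdir, check=True, capture_output=True, text=True
        )


# Stand-in for `nix build ...`: the child just reports its working directory.
result = run_in_tmpdir([sys.executable, "-c", "import os; print(os.getcwd())"])
child_cwd = result.stdout.strip()
```

Note that this does not address the concern raised earlier in the thread: any relative flake URL would still have to be made absolute before the command runs from a different directory.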

> Validating that the path looks roughly like a nixos generation seems like a decent thing to do anyway, even if only to deal with cases where people exposed strange things under nixosConfigurations in their flake while flailing around trying to understand flakes or just poking things to see what happens. More effort than a simple reversion, though.

To be honest, I want to avoid depending any further on NixOS implementation details.

> Yet another alternative would be to fix the systemd-boot menu builder (and any others affected) to fail gracefully when given weird generations. Also more effort than a reversion... and also probably a good idea regardless.

This probably still needs to be done either way.

@tejing1
Contributor

tejing1 commented Jun 16, 2025

Honestly, "does the symlink I got back have a boot.json file in it? If not, bail." still seems like the best ease vs. cleanliness tradeoff here to me.

@thiagokokada
Contributor Author

> Honestly, "does the symlink I got back have a boot.json file in it? If not, bail." still seems like the best ease vs. cleanliness tradeoff here to me.

Do we always have a bootspec though?

And if you want to implement this, sure, I have no objection. I am not familiar enough with the bootspec to know what to do here.

@thiagokokada
Contributor Author

thiagokokada commented Jun 16, 2025

IMO, I would much prefer to revert this change first to avoid causing more issues, and I am still not completely clear on the benefits of this change (I understand that it addresses a difference in behavior between nixos-rebuild repl and nixos-rebuild switch, but I don't think enough people use nixos-rebuild repl for that to matter anyway).

Also, reverting this change matches the behavior of the old nixos-rebuild, which for at least the last 4 years didn't seem to cause any further issues.

Just checking for the bootspec files or anything else will still cause the bad behavior of nixos-rebuild switch just randomly failing inside some directories, the only change is that it will not break the whole system. I would prefer to make sure that nixos-rebuild always works.

@tejing1
Contributor

tejing1 commented Jun 16, 2025

> Do we always have a bootspec though?

I guess not. There is an internal option to disable it, so there is at least some degree of expectation that it may not exist:
https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/activation/bootspec.nix#L101-L107

It sure looks like nixos-version is dependable, though: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/activation/top-level.nix#L42

> Also, reverting this change matches the behavior of the old nixos-rebuild, which for at least the last 4 years didn't seem to cause any further issues.

I had the idea this was fixing something a little more pervasive. Sounds like I misunderstood by a bit.

> Just checking for the bootspec files or anything else will still cause the bad behavior of nixos-rebuild switch just randomly failing inside some directories, the only change is that it will not break the whole system. I would prefer to make sure that nixos-rebuild always works.

Fair point. Though I do think if the error message mentions it may be caused by running from the nix store it's not much of an issue.
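
For illustration only, the validation idea discussed in this thread could look roughly like the sketch below. It checks the build result for the nixos-version file mentioned above (chosen here instead of boot.json because bootspec can be disabled) and raises with a message that points at the Nix store as a possible cause. The function and exception names are hypothetical, and this is not code from nixos-rebuild-ng.

```python
from pathlib import Path


class NotANixOSGeneration(Exception):
    """Raised when a build result does not look like a NixOS system closure."""


def ensure_nixos_generation(path: Path) -> Path:
    """Return the resolved path if it looks like a NixOS generation, else raise.

    Assumption: the presence of a `nixos-version` file is a good enough marker.
    """
    resolved = path.resolve()
    if not (resolved / "nixos-version").is_file():
        raise NotANixOSGeneration(
            f"{resolved} does not look like a NixOS generation "
            "(no nixos-version file found). This can happen when the command "
            "is run from a directory that is symlinked into the Nix store."
        )
    return resolved
```

A caller would run this on the path returned by the build step and stop before activation if it raises, so that a package such as the mesa output shown in the "Before" log is never handed to switch-to-configuration.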

@thiagokokada
Contributor Author

> I had the idea this was fixing something a little more pervasive. Sounds like I misunderstood by a bit.

This is why I want the input of @Mic92 before merging this PR, but AFAIK it is only the difference between repl and switch.

- flake_str: str,
- hostname_fn: Callable[[], str | None] = lambda: None,
- ) -> Self:
+ def parse(cls, flake_str: str, target_host: Remote | None = None) -> Self:
Contributor Author

For reviewers: sorry for adding a refactor to what is supposed to be a revert-only PR, but this code was really bothering me.

I understand why I wrote it that way (I wanted the hostname computation to be lazy since it can be somewhat expensive, especially in the --target-host case), but I ended up creating code that can only be summed up as "it was probably written by a drunk person", and I don't even drink alcohol.

If reviewing the changes is difficult, please review by commit, it should be easier to understand what is going on.
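
To make the laziness trade-off above concrete, here is a minimal sketch of the callback pattern. It is not the actual `parse` implementation: `resolve_attr`, the `#` split, and the "default" fallback are all assumptions made for illustration. The hostname callable is only invoked when the flake string carries no attribute of its own.

```python
from typing import Callable, Optional


def resolve_attr(
    flake_str: str,
    hostname_fn: Callable[[], Optional[str]] = lambda: None,
) -> str:
    """Pick the configuration attribute for a flake string (hypothetical helper).

    The attribute after '#' wins; otherwise fall back to the lazily computed
    hostname, then to the assumed name "default".
    """
    _, _, attr = flake_str.partition("#")
    # `or` short-circuits, so hostname_fn() runs only when attr is empty.
    return attr or hostname_fn() or "default"


calls = []


def expensive_hostname() -> str:
    # Stand-in for a slow lookup, e.g. one that has to reach a remote host.
    calls.append("lookup")
    return "sankyuu"


explicit = resolve_attr(".#myhost", expensive_hostname)  # no lookup needed
implicit = resolve_attr(".", expensive_hostname)  # lookup happens here
```

Passing the target host directly, as in the new signature, drops the callback indirection and leaves it to the method itself to decide when a lookup is needed.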

@thiagokokada
Contributor Author

Replaced by #418243.

@thiagokokada thiagokokada deleted the revert-375493 branch June 19, 2025 19:52
Development

Successfully merging this pull request may close these issues.

nixos-rebuild is doing something bizarre if I run it from /run/opengl-driver/lib