nixos/grub: generate BLS entries#95901
Conversation
|
Can't or shouldn't we re-use the same generator for all BLS entries generation, rather than maybe accidentally having diverging implementations? |
|
Uhm, I'm not sure: it certainly make sense but the systemd-boot generator is in python. Using it would make the bootloader script depends on a shell, perl and python interpreter, which is not great. If install-grub.pl were to be rewritten in python, it would be trivial, but I don't have enough familiarity with either perl or grub to do that. |
|
It would make the bootloader script depend on python, but isn't NixOS' base closure already bringing in the same python? I guess that would be the question to answer. Because, sure it would increase this specific NixOS module's closure size, but re-implementing the code increases the code complexity. That is my main concern with the PR, but I guess it could be fine to go with the duplicated code for the time being, as long as others don't feel strongly against that. |
|
After reading #48378, if nothing changed in the mean time, python should not be in the minimal closure unless using |
|
I also have some incoming changes to Having the loader entry generation in two different languages would make changes like this much more difficult. I wonder if the aversion to python in the minimal closure has changed at all since the apparently successful refactor to enable |
|
I believe the closure concerns are entirely about size. |
|
So, It doesn't look like there is a consensus on whether perl, python or neither should be used instead. I agree it's not ideal to have the booloader code written in two different languages, but I wouldn't stall changes on this because I don't think the conflict will be resolved anytime soon. Morover both systemd-boot-builder.py and install-grub.pl are complex and rewriting or unifying them is already a huge task. |
I agree. |
Using a compiled language with static typing sounds like a good idea: the scripts are quite complex and a proper compiler could help. It would also remove the interpreter from the closure. |
That would be amazing. The bootloader script has the potential to brick everyone's system, yet with the Perl stuff it is extremely easy to make mistakes in there. In the PR I contributed to Similarly, I don't feel equipped to review this PR. As a desktop user, and owner of important server infrastructure that needs to boot reliably, I'd accept a slightly larger system any day if in turn that Perl script dissapears from my "stuff that has a high potential to screw us" risk list. If consensus can be reached for |
👍 For Python, optional static typing support ( That is not to discourage the Rust approaches (good stuff!), but if I Python is something that can be agreed on easily, I'd that immediately vs waiting for e.g. another year for a potential Rust solution to be ready (we can still switch to something better afterwards). |
|
I'm not familiar with rust, but I could help with python too. It's important to reach a consensus on the language before, though: without it all the efforts could go to waste. |
Well for compiled languages we do not have to restrict ourselves that much, but it would indeed be a good idea to agree on a canonical interpreted language while / if we have critical interpreted code present at runtime. |
|
I marked this as stale due to inactivity. → More info |
|
The GRUB BLS module (blscfg.mod) does not support ZFS. Source 1 (search in page for blscfg); therefore we must disable BLS if ZFS support is enabled. |
|
@ne9z This PR has nothing to do with grub support for BLS. It's about adding some code to generate the BLS entries when switching to a new NixOS generation. |
|
My bad. I overlooked the part that it is just mirroring GRUB entries.
I thought GRUB was being configured to boot BLS entries directly, as in
Fedora/RHEL.
Michele Guerini Rocco ***@***.***> writes:
… @ne9z This PR has nothing to do with grub support for BLS. It's about adding some code to generate the BLS entries when switching to a new NixOS generation.
--
Reply to this email directly or view it on GitHub:
#95901 (comment)
You are receiving this because you were mentioned.
Message ID: ***@***.***>
|
|
@GrahamcOfBorg test grub nixos-rebuild-install-bootloader |
|
So, it's been 4 years, install-grub.pl is still here and I'm still feeling the lack of boot loader entries. Even if the script will be ported to python (or whatever else) at some point, generating the boot loader entries is very simple and I'm sure this patch will not make that work any harded that already is. So, I'm going ahead with this. I've rebased, added more tests and a release note. I'm now pretty confident that the implementation is correct. (unsurprisingly nixos-rebuild-install-bootloader times out: this test is completely insane.) |
|
The test appears to be broken with but it passes running it in the test driver with network access. |
|
This should either be reverted or fixed quickly, before it gets into release channels, as it can lead to breakage on user's systems. Test available here, please cherry-pick into the fixing PR: This new behaviour is breaking assumptions under the BLS:
(Emphasis my own.) The changes here are currently breaking multi-boot setups where a BLS-aware bootloader is handling some boot options, and a NixOS is booted with GRUB. Note that this also differs in behaviour compared to: Not re-using the same generator for all BLS entries generation has led to accidentally having diverging implementations. Additionally, this behaviour should be behind an option, so it can be disabled. It probably should be a default-false option, too. The added BLS entries will be shown in the BLS entries menu, which may not be desirable when specifically configuring NixOS for GRUB usage. |
ElvishJerricco
left a comment
There was a problem hiding this comment.
In hindsight, I reverted this PR thinking there was more wrong with it than there ended up being. That said, I do still think it was correct to revert it.
The issues with NixOS taking excessive ownership over BLS must be addressed. We cannot be deleting other OSes entries, and we cannot be deleting another OS's loader.conf for a grub that won't use it. Additionally, I think at minimum it must be possible to disable this, and I'm not convinced it's good for it to be enabled by default.
But in the bigger picture, I question the premise of the PR. Rather than implementing BLS twice, it seems more reasonable to me to have a separate module for implementing BLS that both systemd-boot and grub can use. That way they share the same logic for populating and garbage collecting BLS files.
Finally, @rnhmjoj, this PR touches a system critical component. It had been inactive for years. It hadn't been reviewed in years. And you didn't request any reviews when you recently restarted work on it. Yet you self-merged it anyway. That is out of line and IMO grounds for removal of the commit bit.
| bootPath = if cfg.mirroredBoots != [ ] | ||
| then (builtins.head cfg.mirroredBoots).path | ||
| else "/boot"; |
There was a problem hiding this comment.
This part confused me for several reasons. I thought there was an error here, but after looking closer, I see there isn't. But I do still have some nitpicks about what confused me.
- There is an assertion that mirroredBoots cannot be empty, so I don't think we need to account for that case with a bogus value.
- I think there should be a comment explaining why just any of the mirroredBoots is acceptable.
- We shouldn't reuse the same
bootPathvariable name that is already used nearby in this module
| # mark the default entry | ||
| if (readlink($link) eq $defaultConfig) { | ||
| writeFile("$bootPath/loader/loader.conf", "default $name.conf"); | ||
| } |
There was a problem hiding this comment.
FWIW, loader.conf is not actually part of BLS. That's specifically a systemd-boot thing.
But much more importantly, if another OS owns the loader.conf file on this machine, overwriting it is not acceptable. We cannot do that. We take NixOS's ownership of loader.conf as a given in systemd-boot-builder.py, but that is not a given with grub, because it doesn't use it. It is also a significant change in behavior for NixOS's existing grub support.
| # Atomically replace the BLS entries directory | ||
| my $entriesDir = "$bootPath/loader/entries"; | ||
| rename $entriesDir, "$entriesDir.bak" or die "cannot rename $entriesDir to $entriesDir.bak: $!\n"; | ||
| rename "$entriesDir.tmp", $entriesDir or die "cannot rename $entriesDir.tmp to $entriesDir: $!\n"; | ||
| rmtree "$entriesDir.bak" or die "cannot remove $entriesDir.bak: $!\n"; |
There was a problem hiding this comment.
This is similar to the loader.conf issue, but more severe. We do not necessarily own all the entries on the system. We can't just delete the whole entries directory. The systemd-boot-builder.py takes great care to only remove entries and files known to be owned by NixOS.
| if ($grubEfi eq "" && !$copyKernels) { | ||
| # workaround for https://github.com/systemd/systemd/issues/35729 | ||
| make_path("$bootPath/kernels", { mode => 0755 }); | ||
| symlink($bootspec{kernel}, "$bootPath/kernels/$kernel"); | ||
| symlink($bootspec{initrd}, "$bootPath/kernels/$initrd"); | ||
| $copied{"$bootPath/kernels/$kernel"} = 1; | ||
| $copied{"$bootPath/kernels/$initrd"} = 1; | ||
| } |
There was a problem hiding this comment.
The correctness of this is highly suspect. BLS doesn't specify how to interpret files in the tree that are symlinks to an absolute path like this. What root should be the root for resolving those symlinks? Obviously, in the case of running bootctl list or systemctl kexec on the booted NixOS system, this happens to use the system's root FS as the root. But that isn't a given in the BLS and doesn't really make sense for an arbitrary BLS boot loader.
Additionally, why must this not be an EFI setup to take this code path? You can still do copyKernels = false; on an EFI system, if $bootPath is on the root fs and efiSysMountPoint is something else.
There was a problem hiding this comment.
This is working around a bug in systemd, of course it's not "correct" according to the BLS spec.
See what I wrote in the issue I linked:
systemctl kexec expects the paths of the kernel and initrd to be relative to /run/boot-loader-entries/ and fails with File not found. Without an ESP I would expect the path to be absolute (relative to /), given the updated specification even says:
linux specifies the Linux kernel image to execute. The value is a path relative to the root of the file system containing the boot entry snippet itself.
Additionally, why must this not be an EFI setup to take this code path? You can still do copyKernels = false; on an EFI system, if $bootPath is on the root fs and efiSysMountPoint is something else.
It's unrelated to copyKernels: it has to do with linking the entries to /run/boot-loader-entries on non-EFI systems to expose the entries to systemd. In this (edge) case the BLS taken literally says to interpret paths relative to /, while systemd interprets them relative to /run/boot-loader-entries.
There was a problem hiding this comment.
I understand the reasoning for this. It's still highly suspect, and should probably not be considered proper BLS.
And yes, it does have to do with copyKernels. Try it out. I did. If you make an EFI system with /boot on the root fs and efiSysMountPoint = "/efi"; or something similar, this will produce a broken result that does not work. The BLS files will point to files that should be in the boot tree, but are not, because you disabled creating these symlinks in the boot tree for EFI systems.
IMO the feature simply shouldn't work without copyKernels enabled, because like I said, these symlinks don't make sense in the context of BLS. The fact that it doesn't work with copyKernels disabled on EFI systems is just something else that's wrong.
|
|
||
| environment.systemPackages = mkIf (grub != null) [ grub ]; | ||
|
|
||
| # Link /boot under /run/boot-loder-entries to make |
There was a problem hiding this comment.
typo: loder -> laoder
You're right. If I have to defend myself, I've become increasingly frustrated with how slow NixOS is moving lately, ragarding both pushing new features or getting feedback. It seems the only way to get some comments is by accidentally breaking someone's setup... |
No. You get feedback by requesting review from the relevant people. The systemd team would have been a start, in this case, since the purpose of this PR is to work around a systemd problem. |
I just didn't think of this work as systemd-related: the workaround is a relatively minor part. I would have appreciated feedback from people using grub, but it's officially unmmaintained. |
|
“Unmaintained package” isn’t the same thing as “free‐for‐all for self‐merges two years after anyone last looked at the PR”… This is the third or fourth time in the past few months that you’ve broken things by self‐merging PRs without anyone giving approval or even substantive review? Everyone without the commit bit has to use the review request room and thread. I don’t see how this is anything but a continued pattern of reckless abuse of committer privileges. |
Motivation for this change
See issue #94038. Basically being able to use
bootctlandsystemctl kexecwith GRUB.Things done
nixosTests.grubnixosTests.installernixos-rebuild switch