Skip to content

Comments

qemu, runInLinuxVM: fix KVM availability check#125948

Merged
alyssais merged 3 commits intoNixOS:masterfrom
mschwaig:fix-check-for-kvm-device
Nov 30, 2021
Merged

qemu, runInLinuxVM: fix KVM availability check#125948
alyssais merged 3 commits intoNixOS:masterfrom
mschwaig:fix-check-for-kvm-device

Conversation

@mschwaig
Copy link
Member

@mschwaig mschwaig commented Jun 6, 2021

KVM should only be considered abailable if /dev/kvm exists and
is read-writable by the user that is trying to launch it.

The previous check for existance only had the consequence that
on some Linux distributions running VMs with Nix's QEMU only worked
if KVM was NOT installed.

See issue: #124371


To test this change I

  1. built some NixOS Test driver on x86_64 NixOS.
  2. changed the permissions for /dev/kvm chmod o-rw /dev/kvm
  3. ran result/bin/nixos-run-vms

Without this change that test failed right away with a log like this:

./nixos-run-vms
starting VDE switch for network 1
deleting VM state directory /tmp/vm-state-pia
if you want to keep the VM state, pass --keep-vm-state
deleting VM state directory /tmp/vm-state-satellite
if you want to keep the VM state, pass --keep-vm-state
deleting VM state directory /tmp/vm-state-sensor
if you want to keep the VM state, pass --keep-vm-state
running the VM test script
starting all VMs
pia: starting vm
pia # Formatting '/tmp/vm-state-pia/pia.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=536870912 lazy_refcounts=off refcount_bits=16
pia # Could not access KVM kernel module: Permission denied
pia # qemu-system-x86_64: failed to initialize kvm: Permission denied
pia: QEMU running (pid 2926214)
satellite: starting vm
satellite # Formatting '/tmp/vm-state-satellite/satellite.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=536870912 lazy_refcounts=off refcount_bits=16
satellite # Could not access KVM kernel module: Permission denied
satellite # qemu-system-x86_64: failed to initialize kvm: Permission denied
satellite: QEMU running (pid 2926260)
sensor: starting vm
sensor # Formatting '/tmp/vm-state-sensor/sensor.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=536870912 lazy_refcounts=off refcount_bits=16
sensor # Could not access KVM kernel module: Permission denied
sensor # qemu-system-x86_64: failed to initialize kvm: Permission denied
sensor: QEMU running (pid 2926271)
(1.33 seconds)
waiting for all VMs to finish
pia: waiting for the VM to power off
(0.00 seconds)
satellite: waiting for the VM to power off
(0.00 seconds)
sensor: waiting for the VM to power off
(0.00 seconds)
(0.00 seconds)
(1.33 seconds)
test script finished in 1.33s
cleaning up
(0.00 seconds)

while with the change the test proceeded without hardware acceleration via KVM.

I have also tested that this fixes the same issue when running QEMU inside the build sandbox as part of a NixOS Test. The NixOS Test checkbox is unchecked though, because there there is automatic NixOS Tests for the new behavior.

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • 21.11 Release Notes
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@ofborg ofborg bot requested review from alyssais and edolstra June 6, 2021 12:29
@ofborg ofborg bot added 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. labels Jun 6, 2021
@mschwaig mschwaig marked this pull request as draft June 6, 2021 12:32
@mschwaig mschwaig changed the base branch from staging to master June 6, 2021 13:57
@mschwaig mschwaig marked this pull request as ready for review June 6, 2021 14:24
@r-rmcgibbo
Copy link

r-rmcgibbo commented Jun 6, 2021

Result of nixpkgs-review pr 125948 at d22a9ecf run on aarch64-linux 1

4 packages marked as broken and skipped:
  • qemu_xen
  • qemu_xen-light
  • qemu_xen_4_10
  • qemu_xen_4_10-light
2 packages skipped due to time constraints:
  • multibootusb
  • qemu_full
18 packages built successfully:
  • alpine-make-vm-image
  • aqemu
  • cloud-init
  • cloud-utils
  • cot (python38Packages.cot)
  • gnome.gnome-boxes
  • libguestfs
  • out-of-tree
  • python38Packages.guestfs
  • python39Packages.cot
  • python39Packages.guestfs
  • qemu
  • qemu-utils
  • qemu_kvm
  • qemu_test
  • qtemu
  • solo5
  • vagrant

Result of nixpkgs-review pr 125948 at d22a9ecf run on x86_64-linux 1

2 packages marked as broken and skipped:
  • linuxPackages_hardkernel_4_14.virtualbox
  • linuxPackages_hardkernel_latest.virtualbox
1 package failed to build:
25 packages skipped due to time constraints:
  • gnome.gnome-boxes
  • linuxPackages-libre.virtualbox
  • linuxPackages.virtualbox (linuxPackages_5_10.virtualbox)
  • linuxPackages_4_14.virtualbox
  • linuxPackages_4_19.virtualbox
  • linuxPackages_4_4.virtualbox
  • linuxPackages_4_9.virtualbox
  • linuxPackages_5_11.virtualbox
  • linuxPackages_5_12.virtualbox (linuxPackages_latest.virtualbox)
  • linuxPackages_5_4.virtualbox
  • ...
19 packages built successfully:
  • alpine-make-vm-image
  • cloud-init
  • cloud-utils
  • cot (python38Packages.cot)
  • diffoscope
  • libguestfs
  • open-watcom-bin
  • out-of-tree
  • python38Packages.guestfs
  • python39Packages.cot
  • python39Packages.guestfs
  • qemu
  • qemu-utils
  • qemu_kvm
  • qemu_test
  • qemu_xen (qemu_xen_4_10)
  • qemu_xen-light (qemu_xen_4_10-light)
  • solo5
  • vagrant
3 suggestions:
  • warning: missing-phase-hooks

    installPhase should probably contain runHook preInstall and runHook postInstall.

    Near pkgs/applications/virtualization/qemu/utils.nix:11:3:

       |
    11 |   installPhase = ''
       |   ^
    
  • warning: name-and-version

    Did you mean to pass pname instead of name to mkDerivation?

    Near pkgs/applications/virtualization/qemu/utils.nix:4:3:

      |
    4 |   name = "qemu-utils-${version}";
      |   ^
    

    Near pkgs/applications/virtualization/qemu/utils.nix:5:3:

      |
    5 |   version = qemu.version;
      |   ^
    
  • warning: build-tools-in-build-inputs

    makeWrapper is a build tool so it likely goes to nativeBuildInputs, not buildInputs.

    Near pkgs/applications/virtualization/qemu/default.nix:57:3:

       |
    57 |   buildInputs =
       |   ^
    

Note that build failures may predate this PR, and could be nondeterministic or hardware dependent.
Please exercise your independent judgement. Does something look off? Please file an issue or reach out on IRC.

@lukegb
Copy link
Contributor

lukegb commented Jun 6, 2021

My gut feeling is that the wrapper should probably warn on stderr if KVM isn't available. It's a bit weird having a wrapper called "qemu-kvm" that doesn't actually use kvm, in any case, but that's the situation we're in :P

I guess qemu-maybe-kvm-if-it-seems-available is too long ;)

@mschwaig
Copy link
Member Author

mschwaig commented Jun 7, 2021

I can try emitting a warning on stderr from the wrapper and check how visible that is later this week.


It seems the right component to make sure people know what's going on with KVM would actually be Nix and the handling of the kvm system feature there.

We could make sure it works and takes the permissions of the build user into account. This seems to be the code for detecting the system feature:
https://github.com/NixOS/nix/blob/7c90552879da4d1df99b50c85e94201981e60123/src/libstore/globals.cc#L128-L130

I have not tested this yet because somehow my VM without /dev/kvm also has the kvm feature.

I think the way to enable non-accelreated execution should be manually adding the kvm system feature. That's already how it normally works right now, execpt for the edge case addressed in this PR.

To make things more clear to users

  • assuming the kvm system feature check works and
  • determines you don't have a usable KVM

it could explicitly prompt you to either

  • install KVM and set the permissions in a workable way so Nix picks it up or
  • add it to the system features manually and accept the performance pentalty.

That way at least as part of a build things only take the slower route if you set some deliberate action after a prompt that informs you about the consequences.

(There is also some code which suggests that supplementary groups should actually stick to the build users here:
https://github.com/NixOS/nix/blob/b10256af51dfa929e8f916414d6f021dd45f2e1e/src/libstore/build/local-derivation-goal.cc#L1901-L1902, but from testing while looking into #124371 it seems that this is at least not working for all confirgurations of the build sandbox.)

I'm thinking about opening an issue about this whole kvm system feature topic on Nix if people think that makes sense.
Some relevant discussion about how that feature is handeled: NixOS/nix#2964 (comment)

Copy link
Member

@puffnfresh puffnfresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely an improvement on the current situation

@Artturin
Copy link
Member

Artturin commented Sep 6, 2021

I can try emitting a warning on stderr from the wrapper and check how visible that is later this week.

Planning on finishing this?

--run could be useful here

# --run COMMAND : run command before the executable

@mschwaig
Copy link
Member Author

mschwaig commented Nov 5, 2021

I can try emitting a warning on stderr from the wrapper and check how visible that is later this week.

Planning on finishing this?

--run could be useful here

# --run COMMAND : run command before the executable

Thanks for the hint. I was unsure how to best approach this and got busy with other things and your comment was very helpful.

I added warnings now.

@ofborg ofborg bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Nov 5, 2021
@mschwaig mschwaig force-pushed the fix-check-for-kvm-device branch from dd8887d to ad9a113 Compare November 5, 2021 21:45
@ofborg ofborg bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Nov 5, 2021
@mschwaig
Copy link
Member Author

mschwaig commented Nov 6, 2021

It seems the slow path of using QEMU without KVM when /dev/kvm is not read-/writable is generally broken right now.

On my machine QEMU now gets stuck at

# Booting from ROM...
# Probing EDD (edd=off to disable)... ok

This was working fine yesterday but I can't reproduce what changed.
Everything's also working fine when KVM is available.

The following change fixes it.

diff --git a/nixos/lib/qemu-common.nix b/nixos/lib/qemu-common.nix
index 84f9060acd6..ebcdd61f587 100644
--- a/nixos/lib/qemu-common.nix
+++ b/nixos/lib/qemu-common.nix
@@ -22,7 +22,7 @@ rec {
         else throw "Unknown QEMU serial device for system '${pkgs.stdenv.hostPlatform.system}'";
 
   qemuBinary = qemuPkg: {
-    x86_64-linux = "${qemuPkg}/bin/qemu-kvm -cpu max";
+    x86_64-linux = "${qemuPkg}/bin/qemu-kvm -cpu qemu64";
     armv7l-linux = "${qemuPkg}/bin/qemu-system-arm -enable-kvm -machine virt -cpu host";
     aarch64-linux = "${qemuPkg}/bin/qemu-system-aarch64 -enable-kvm -machine virt,gic-version=host -cpu host";
     powerpc64le-linux = "${qemuPkg}/bin/qemu-system-ppc64 -machine powernv";

It's similar to what's reported in #141596 (comment)

@alyssais
Copy link
Member

@mschwaig has anybody made a PR for that? LGTM — I doubt we have many CPU-bound NixOS tests, and that's all qemu-common is used for AFAICT.

@mschwaig
Copy link
Member Author

It looks like runInLinuxVM and make-disk-image.nix use qemu-common.nix. Probably not CPU-bound but still I think access to KVM could make a significant difference there for performance there and for NixOS tests. I don't think a smaller set of available instructions would matter.

Would switching to qemu64 disable KVM acceleration or just reduce what instructions are accelerated/supported?

I am not aware of another PR that changes the emulated CPU.

If it just reduces the set of accelerated or supported instructions, I can add a commit for that here and I think we should merge this PR. Otherwise I'm not sure how to proceed.

@mschwaig
Copy link
Member Author

In the future I would also like to test this with your qemu-6.2.0 branch @alyssais (#146526) to see if 6.2 will fix qemu getting stuck, so we could eventually revert to -cpu max in a future PR, but somehow I cannot fetch that branch from your fork right now.

@mschwaig
Copy link
Member Author

mschwaig commented Nov 22, 2021

Working on this made me think it would make sense for a script that gets invoked before qemu to not only emit warnings but actually determine with which -cpu setting qemu gets invoked. For builds outside a VM the supported instruction has an impact on the output anyways, and -max already makes a machine-dependent determination of the emulated CPU based on the host CPU. If we made that determination ourselves we could

  • write to the log what CPU was emulated,
  • account for other factors, like access to /dev/kvm and
  • tell qemu more specifically what to emulate.

The XML for describing CPUs and host-model concept from libvirt (as seen in this presentation) look interesting for that.

@alyssais
Copy link
Member

alyssais commented Nov 22, 2021 via email

@alyssais
Copy link
Member

alyssais commented Nov 22, 2021 via email

@mschwaig
Copy link
Member Author

That's very strange. Does git fetch https://github.com/NixOS/nixpkgs pull/146526/head work?

Yes, that worked. Your commit from qemu-6.2.0 (ded05e005c2f359d79cc88401c98699f0f92ecbd) prevents VMs from getting stuck on bootup when /dev/kvm has the wrong permissions, it seems qemu 6.2.0 fixes that issue.

Since the release date for qemu will be December 7 at the earliest I will add a commit here to change change -cpu to qemu64 which can then be reverted as part of #146526.

That workaround should fix #141596 as well and hopefully qemu 6.2.0 fixes the actual problem there as well.

@github-actions github-actions bot added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Nov 22, 2021
Copy link
Member

@alyssais alyssais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important change here is fixing the definition of WARNCOL, but I think everything else I've pointed out is also worth considering.

@alyssais alyssais mentioned this pull request Nov 23, 2021
14 tasks
@alyssais
Copy link
Member

Looks good on a first glance, but I'm too tired to say for sure tonight and will have a closer look tomorrow. Could you squash all the warning stuff into a single commit, and add appropriate "Fixes:" lines to your commit messages please?

KVM should only be considered abailable if /dev/kvm exists and
is read-writable by the user that is trying to launch it.

The previous check for existance only had the consequence that
on some Linux distributions running VMs with Nix's QEMU only worked
if KVM was NOT installed.

fixes NixOS#124371
@mschwaig mschwaig force-pushed the fix-check-for-kvm-device branch 2 times, most recently from 9b75f9d to c9d7163 Compare November 25, 2021 12:41
@mschwaig
Copy link
Member Author

Looks good on a first glance, but I'm too tired to say for sure tonight and will have a closer look tomorrow. Could you squash all the warning stuff into a single commit, and add appropriate "Fixes:" lines to your commit messages please?

I have reordered and squashed the commits so there is one commit per topic, added which issues they fix and moved the emitted warning from stdout to stderr.

Copy link
Member

@alyssais alyssais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I've now tested this and am happy with it apart from this small terminology issue.

This fixes the qemu-kvm wrapper we add for convenience
silently not using KVM, when the system would support it
by at least leaving an indication in the log that the build ran
slower because it ran without KVM.
The flag -cpu max leaves QEMU 6.1.0 stuck on some systems,
for example when /dev/kvm is not read-writable.
This does not happen with -cpu qemu64.

Getting stuck like that is a regression in 6.1.0 not yet present in 6.0.0
and should be fixed with 6.2.0 according to early testing with rc1.

We should consider reverting this change when we merge QEMU 6.2.0.
See NixOS#146526.

fixes NixOS#141596
@mschwaig mschwaig force-pushed the fix-check-for-kvm-device branch from c9d7163 to abbe8cb Compare November 30, 2021 10:25
@mschwaig
Copy link
Member Author

@alyssais No worries, I have updated the terminology. I think it's better now.

@ofborg ofborg bot requested a review from alyssais November 30, 2021 10:37
@alyssais alyssais merged commit af180d5 into NixOS:master Nov 30, 2021
@github-actions
Copy link
Contributor

Backport failed for release-21.05, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin release-21.05
git worktree add -d .worktree/backport-125948-to-release-21.05 origin/release-21.05
cd .worktree/backport-125948-to-release-21.05
git checkout -b backport-125948-to-release-21.05
ancref=$(git merge-base 716815ce2a1fcb135843c7441648a59d62fb6eb6 abbe8cbc4843c213947abef70eb11f10ebea96f1)
git cherry-pick -x $ancref..abbe8cbc4843c213947abef70eb11f10ebea96f1

@github-actions
Copy link
Contributor

Backport failed for release-21.05, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin release-21.05
git worktree add -d .worktree/backport-125948-to-release-21.05 origin/release-21.05
cd .worktree/backport-125948-to-release-21.05
git checkout -b backport-125948-to-release-21.05
ancref=$(git merge-base 716815ce2a1fcb135843c7441648a59d62fb6eb6 abbe8cbc4843c213947abef70eb11f10ebea96f1)
git cherry-pick -x $ancref..abbe8cbc4843c213947abef70eb11f10ebea96f1

@github-actions
Copy link
Contributor

Successfully created backport PR #148015 for release-21.11.

@stephank
Copy link
Contributor

stephank commented Dec 2, 2021

These changes broke the Darwin build, but simply because there's no libexec: https://hydra.nixos.org/build/159925938/nixlog/1

cp: cannot create regular file '/nix/store/60s7gdxrrlyn8dm7d73v6r439v798kkc-qemu-6.1.0/libexec/emit-kvm-warnings': No such file or directory

However, I wonder what qemu-kvm should do at all on Darwin. If our tests depend on it, and we want to maybe support running tests on Darwin in the future, the wrapper should stay and do something useful, I think?

I believe -enable-kvm is the same as -accel kvm? Maybe the wrapper can do -accel hvf on Darwin? Regardless, the warning should be disabled on Darwin. Telling users to install KVM makes no sense.

@Mic92 Mic92 mentioned this pull request Dec 2, 2021
13 tasks
@Mic92
Copy link
Member

Mic92 commented Dec 2, 2021

These changes broke the Darwin build, but simply because there's no libexec: https://hydra.nixos.org/build/159925938/nixlog/1

cp: cannot create regular file '/nix/store/60s7gdxrrlyn8dm7d73v6r439v798kkc-qemu-6.1.0/libexec/emit-kvm-warnings': No such file or directory

However, I wonder what qemu-kvm should do at all on Darwin. If our tests depend on it, and we want to maybe support running tests on Darwin in the future, the wrapper should stay and do something useful, I think?

I believe -enable-kvm is the same as -accel kvm? Maybe the wrapper can do -accel hvf on Darwin? Regardless, the warning should be disabled on Darwin. Telling users to install KVM makes no sense.

see #148251

@alyssais
Copy link
Member

alyssais commented Dec 2, 2021

Maybe the wrapper can do -accel hvf on Darwin? Regardless, the warning should be disabled on Darwin. Telling users to install KVM makes no sense.

This all sounds good. I'd be happy with a PR for any or all of these things.

@mschwaig
Copy link
Member Author

mschwaig commented Dec 2, 2021

I have opened #148305 which should remove the warning on other platforms, but was not able to test this change on Darwin myself.

I don't know how we would need to adapt the wrapper and warning be useful on for Darwin, but it sounds like a nice idea.

mschwaig added a commit to mschwaig/nixpkgs that referenced this pull request Jan 6, 2022
PR NixOS#125948 introduced a warning when KVM is not available,
which should only be emitted on Linux.
Similarly the -enable-kvm flag itself also never
needs to be added on platforms other than Linux.
@mschwaig mschwaig deleted the fix-check-for-kvm-device branch March 23, 2024 17:53
@mschwaig mschwaig restored the fix-check-for-kvm-device branch March 23, 2024 17:53
@mschwaig mschwaig deleted the fix-check-for-kvm-device branch March 23, 2024 17:56
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/runnixostest-is-faster-when-interactive/69220/4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants