nixos/test-driver: add support for nspawn containers #478109

Merged
Ma27 merged 37 commits into NixOS:staging-nixos from applicative-systems:nixos-test-containers
Mar 18, 2026

Conversation

@kmein
Member

@kmein kmein commented Jan 8, 2026

Motivation

Current NixOS integration tests rely heavily on QEMU, which can be slow and resource-intensive. This PR introduces systemd-nspawn as a lightweight container backend, significantly reducing test latency and enabling hardware passthrough scenarios that are difficult to achieve in VMs.

Key advantages of container tests

  • Faster boot times and lower overhead
    ~25% improvement in test execution speed on an Intel(R) Core(TM) Ultra 9 285HX (benchmark of 24 machines running GNU hello):

    nix-build test-backends-benchmark.nix -A hello-nspawn 27.76s user 1.50s system 155% cpu 18.803 total
    nix-build test-backends-benchmark.nix -A hello-qemu 36.80s user 1.92s system 140% cpu 27.548 total
  • Container tests can be run in cheap VMs instead of bare-metal machines.
  • Containers allow direct bind-mounting of host device nodes. This enables integration testing for CUDA code within the NixOS test framework.
  • The implementation (see below) lays the groundwork for other machine backends (other container infrastructures, bare-metal etc.)
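
The relative savings can be recomputed from the two timing lines quoted in the benchmark. This is a minimal sketch that uses only the figures from this description; the helper name `reduction` is made up for illustration.

```python
# Figures copied from the two benchmark lines in the description (seconds).
nspawn = {"user": 27.76, "total": 18.803}
qemu = {"user": 36.80, "total": 27.548}


def reduction(before: float, after: float) -> float:
    """Relative saving of `after` compared to `before`, in percent."""
    return (before - after) / before * 100


user_saving = reduction(qemu["user"], nspawn["user"])
wall_saving = reduction(qemu["total"], nspawn["total"])

print(f"user CPU time: {user_saving:.1f}% lower")
print(f"wall-clock time: {wall_saving:.1f}% lower")
```

On these numbers, the roughly 25% figure corresponds to user CPU time, while wall-clock time drops by a little under a third.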

Try it out (2 simple steps!)

  1. Configure the Nix daemon to allow running systemd-nspawn:

     nix.settings.auto-allocate-uids = true;
     nix.settings.experimental-features = [ "auto-allocate-uids" "cgroups" ];
     nix.settings.extra-system-features = [ "uid-range" ];
     nix.settings.sandbox-paths = [ "/dev/net" ]; # to make nspawn↔qemu networking work

  2. Run a container test, either:
     • nix-build -A nixosTests.nixos-test-driver.containers: basic startup and inter-VLAN isolation (VMs and containers in parallel).
     • nix-build -A nixosTests.test-containers-bittorrent: complex networking (NAT/UPnP) involving multiple containers.

Implementation details

  • Refactor the Python test-driver to move QEMU-specific logic into a QemuMachine class (inheriting from an abstract BaseMachine class).
  • Introduce an NspawnMachine class that replicates as much functionality of QemuMachine as possible using systemd-nspawn containers.
  • Use the existing QEMU networking options (virtualisation.vlans) for containers.
  • Bridge QEMU's VLANs to the containers, enabling VM↔container networking.
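
To make the refactoring described in this list more concrete, here is a minimal, self-contained sketch of the class layout. Only the class names BaseMachine, QemuMachine and NspawnMachine come from this PR; every method name, constructor argument and command line below is a hypothetical stand-in, not the driver's actual API.

```python
from abc import ABC, abstractmethod


class BaseMachine(ABC):
    """Backend-independent behaviour shared by all test machines."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def start_command(self) -> list[str]:
        """Return the argv that boots this machine (hypothetical hook)."""

    def describe(self) -> str:
        # Shared logic lives in the base class and only calls the hook.
        return f"{self.name}: {' '.join(self.start_command())}"


class QemuMachine(BaseMachine):
    """Holds the QEMU-specific parts."""

    def start_command(self) -> list[str]:
        # Placeholder script path, for illustration only.
        return [f"./run-{self.name}-vm"]


class NspawnMachine(BaseMachine):
    """Mirrors QemuMachine, but boots a systemd-nspawn container."""

    def start_command(self) -> list[str]:
        # Illustrative argv, not what the driver actually runs.
        return ["systemd-nspawn", "--boot", f"--machine={self.name}"]


# A test can then treat both backends uniformly.
machines: list[BaseMachine] = [QemuMachine("vm1"), NspawnMachine("c1")]
for m in machines:
    print(m.describe())
```

The point of the abstract base class in this sketch is that shared code only talks to the hook, so a further backend would need just another subclass.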

Debugging

To debug a failing container test, introduce enableDebugHook and sshBackdoor like so:

# nixos/tests/test-containers-backdoor.nix
{
  name = "containers-backdoor";

  containers.machine = { };

  sshBackdoor.enable = true;
  enableDebugHook = true;

  testScript = ''
    start_all()
    machine.succeed("false") # this will fail
  '';
}

The test will then print an ssh command on startup:

machine:  ssh -o User=root -o ProxyCommand="socat - UNIX-CLIENT:/run/systemd/nspawn/unix-export/machine/ssh" bash

Upon failure, the test will print a command to attach to it, e.g.

!!! Breakpoint reached, run 'sudo /nix/store/hb1v3cz5bd6qk8arhxy6bii0wcilg8wh-attach/bin/attach 6793584

Run this to get a shell inside the sandbox, where you can run the ssh command above to enter the container.

Note: Due to the nature of systemd-nspawn, interactive execution of the tests requires root privileges: sudo $(nix -L build --print-out-paths .#nixosTests.test-containers.driverInteractive)/bin/nixos-test-driver --interactive

Limitations of containers

  • You cannot test kernel-specific changes (e.g. kernel modules) without also having them active on the host.
  • Containers running in the Nix sandbox cannot run setuid wrappers (like sudo—though you can use runuser instead).
  • Container tests do not support graphical applications (and taking screenshots of them).
  • Containers running in the Nix sandbox don't allow many of the systemd hardening options (ProtectSystem= etc.) used by NixOS modules such as services.transmission among many others.
  • Containers running in the sandbox have limited access to /dev, making it necessary to pass in needed paths, e.g. --option sandbox-paths /dev/net for VPN tests that create /dev/net/tun.

Credits and history

Based on the testing infrastructure from @clan-lol.
The heavy lifting for integrating this into nixpkgs was done by @jfly.

Things done

  • Built on platform:
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • Tested, as applicable:
  • Ran nixpkgs-review on this PR. See nixpkgs-review usage.
  • Tested basic functionality of all binary files, usually in ./result/bin/.
  • Nixpkgs Release Notes
    • Package update: when the change is major or breaking.
  • NixOS Release Notes
    • Module addition: when adding a new NixOS module.
    • Module update: when the change is significant.
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other READMEs.

Add a 👍 reaction to pull requests you find important.

Copilot AI review requested due to automatic review settings January 8, 2026 15:41
@kmein kmein force-pushed the nixos-test-containers branch from 1a58317 to 81245e7 Compare January 8, 2026 15:45
@kmein kmein changed the base branch from master to staging January 8, 2026 15:49
@nixpkgs-ci nixpkgs-ci bot closed this Jan 8, 2026
@nixpkgs-ci nixpkgs-ci bot reopened this Jan 8, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces support for systemd-nspawn containers as a lightweight alternative to QEMU VMs in the NixOS test framework. The implementation refactors the Python test driver to use an abstract BaseMachine class with separate QemuMachine and NspawnMachine implementations, enabling tests to run both VMs and containers in parallel with shared networking infrastructure.

Key changes:

  • Refactored test driver from monolithic Machine class to abstract BaseMachine with specialized QemuMachine and NspawnMachine subclasses
  • Created new guest-networking-options.nix module to share VLAN configuration between QEMU VMs and nspawn containers
  • Added run-nspawn Python package for managing container lifecycle, networking, and process execution via nsenter

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| nixos/tests/test-containers.nix | New test demonstrating basic container startup and VLAN isolation |
| nixos/tests/test-containers-bittorrent.nix | Complex networking test with NAT/UPnP using multiple containers |
| nixos/tests/all-tests.nix | Registers new container tests |
| nixos/modules/virtualisation/qemu-vm.nix | Extracts networking options to shared module |
| nixos/modules/virtualisation/guest-networking-options.nix | New shared networking configuration for VMs and containers |
| nixos/modules/virtualisation/nspawn-container/default.nix | Container profile module with systemd-nspawn configuration |
| nixos/modules/virtualisation/nspawn-container/run-nspawn/ | Python package for container lifecycle management |
| nixos/modules/testing/test-instrumentation.nix | Disables backdoor for containers (not compatible) |
| nixos/lib/testing/nodes.nix | Adds container support alongside nodes with separate defaults |
| nixos/lib/testing/network.nix | Refactors networking to support both VMs and containers |
| nixos/lib/testing/driver.nix | Updates driver build to pass container scripts separately |
| nixos/lib/testing/run.nix | Adds uid-range requirement for container tests |
| nixos/lib/testing/testScript.nix | Minor variable rename for clarity |
| nixos/lib/testing/nixos-test-base.nix | Removes qemu-vm import (now conditional) |
| nixos/lib/test-driver/src/test_driver/machine/__init__.py | Major refactoring: BaseMachine, QemuMachine, NspawnMachine classes |
| nixos/lib/test-driver/src/test_driver/driver.py | Updates driver to handle both VM and container machines |
| nixos/lib/test-driver/src/test_driver/__init__.py | Adds CLI arguments for container support |
| nixos/lib/test-driver/default.nix | Adds systemd and util-linux dependencies |
| nixos/lib/test-script-prepend.py | Updates type hints for new machine classes |
| nixos/lib/testing-python.nix | Adds containers parameter |

@nixpkgs-ci nixpkgs-ci bot requested review from RaitoBezarius and tfc January 8, 2026 15:54
@nixpkgs-ci nixpkgs-ci bot added the 10.rebuild-linux: 1-10, 10.rebuild-darwin: 0, 10.rebuild-nixos-tests, 6.topic: nixos, 8.has: module (update), and 6.topic: testing labels Jan 8, 2026
@tfc tfc requested a review from Ma27 January 8, 2026 16:53
@tfc
Contributor

tfc commented Jan 12, 2026

@ofborg test nat.firewall networking.scripted.link installer.simpleProvided installer.separateBootFat networking.scripted.virtual printing keymap.azerty keymap.dvorak-programmer boot-stage1 installer.swraid keymap.neo nfs4.simple i3wm udisks2 networking.networkd.loopback containers-ip ecryptfs login installer.simpleLabels php.httpd zfs.installer predictable-interface-names.unpredictable predictable-interface-names.unpredictableNetworkd mutableUsers

@kmein kmein force-pushed the nixos-test-containers branch 4 times, most recently from 227fa66 to ccf61f9 Compare January 13, 2026 15:17
@kmein kmein force-pushed the nixos-test-containers branch from ccf61f9 to 0722a1f Compare January 14, 2026 08:35
@nixpkgs-ci nixpkgs-ci bot added the 2.status: merge conflict label Jan 17, 2026
Member

@Ma27 Ma27 left a comment

Mostly low-hanging fruits. Hopefully I'll have spoons to do the rest tomorrow or next week :)

# (n-daemon)[417]: transmission.service: Failed to create destination mount point node '/run/transmission/run/host/.os-release-stage/', ignoring: Read-only file system
# (n-daemon)[417]: transmission.service: Failed to mount /run/systemd/propagate/.os-release-stage to /run/transmission/run/host/.os-release-stage/: No such file or directory
# (n-daemon)[417]: transmission.service: Failed to set up mount namespacing: /run/host/.os-release-stage/: No such file or directory
# (n-daemon)[417]: transmission.service: Failed at step NAMESPACE spawning /nix/store/zfksw9bllp95pl45d1nxmpd2lks42bkj-transmission-4.0.6/bin/transmission-daemon: No such file or directory
Member

Have you traced down what settings lead to that problem?
At this point I'm unsure if I consider this a potential bug in the driver, something that needs to be fixed or something that we can just accept (if so, I think it's worth leaving a more detailed rationale here).

Member Author

This setting specifically:

Disabling it manually lets transmission start successfully.

Member

Maybe one of our systemd folks has an opinion on that? I'm not sure if we're missing something here or if just turning the option for that test-case off is OK cc @ElvishJerricco @nikstur

Member Author

Do I understand you right in that you would prefer a test that uses the upstream services.transmission module (with the option turned off if necessary) to a test that uses aria2?

Member

It's not about the service itself, I'm just wondering if we're holding something wrong here. And finally, I wouldn't want to lift arbitrary hardening once we get to the point that this feature of the test framework is being used inside nixpkgs.

maybe cc @NixOS/systemd reaches more people who could weigh in.

@kmein kmein force-pushed the nixos-test-containers branch 2 times, most recently from 9f082bb to abd52f4 Compare January 19, 2026 08:45
github-actions[bot]

This comment was marked as outdated.

@kmein kmein force-pushed the nixos-test-containers branch 2 times, most recently from 80b4cb2 to e02998f Compare January 19, 2026 08:52
visible = "shallow";
description = ''
An attribute set of NixOS configuration modules.
An attribute set of NixOS configuration modules representing QEMU vms that can be started during a test.
Member

This isn't something that needs to happen in this PR but perhaps worth thinking about early:

Once the early kinks of the container backend are ironed out, we probably want to have containers become the default backend. For that we'd need to explicitly declare the tests which rely on hardware virtualisation as such.

I'd prefer if "nodes" was kept as an abstract option to declare multiple NixOS systems without needing to declare the technical detail of how they're implemented. I'd also prefer if the majority of nixosTests that have no requirement for this technical detail didn't have to explicitly declare it either and would keep using "whatever is the default"; with merely the meaning of the default being changed in the future.

For that we'd need the distinction between "nixos systems in virtual machines" and "nixos systems". What do you think of:

  1. Adding an option specifically for declaring VMs; mirroring the containers one
  2. (Later) Migrating all tests that require VM-level virtualisation to use said option to explicitly declare that fact
  3. (Later still) Switching the default implementation for nodes to containers once they're production-ready and sufficient time has passed to allow downstream consumers to migrate to the VM option where needed

Contributor

Hey @Atemu, this was our exact thought! It seems like this method will gain broader consensus easily. :-)

Member

I think I agree here, yes. I especially cannot come up with an alternate generic name for nodes that's reasonably good and would save the migration effort.

But this PR is a beast already (not a surprise given what had to be done!), so I have a strong opinion on doing this in a follow-up to get this one out sooner than later.

Member Author

Thanks @Atemu for sketching out this actionable roadmap. This is indeed what we had in mind; the amount of CPU time we'd save with a more lightweight default almost argues for itself, so of course we're going to follow up on this first step.

However, I am with @Ma27 in that this should wait until this PR and #479968 (the docs) have been merged.

Member

@Ma27 Ma27 left a comment

So the two things missing that I'm seeing is @KiaraGrouwstra's note on containers missing as _module.arg and a comment needing an update. Rest is good from my view, hence I'm approving.

Thanks a lot for the work and bearing with me as reviewer!

@kmein while you're at it, you may want to consider retargeting this to staging-nixos since this PR would be qualified to go through that branch instead (this is way faster than having this go through all of staging).
cc @K900 @zowoq as a heads-up since it's usually you merging this into master.

};
baseQemuOS = baseOS.extendModules {
modules = [
../../modules/virtualisation/qemu-vm.nix
Member

This import only applies to QEMU. It's correct that containers are ignoring virtualisation.memorySize, but the option doesn't exist there (unless you'd explicitly defined it, of course):

       error: The option `containers.c1.virtualisation.memorySize' does not exist. Definition values:
       - In `/home/ma27/Projects/nixpkgs-hack/nixos/tests/nixos-test-driver/containers.nix': 1024

@nixpkgs-ci nixpkgs-ci bot added the 12.approvals: 1 label Mar 15, 2026
@kmein kmein force-pushed the nixos-test-containers branch from 2f28ffa to d589103 Compare March 16, 2026 12:36
Resolves an issue where nodes on shared secondary VLANs could not reach
each other if their primary IPs were on isolated networks.
@kmein kmein force-pushed the nixos-test-containers branch from d589103 to 6631316 Compare March 16, 2026 12:56
@kmein kmein changed the base branch from staging to staging-nixos March 16, 2026 12:56
@nixpkgs-ci nixpkgs-ci bot closed this Mar 16, 2026
@nixpkgs-ci nixpkgs-ci bot reopened this Mar 16, 2026
@kmein kmein force-pushed the nixos-test-containers branch from d19dde7 to c22da90 Compare March 16, 2026 15:35
@kmein
Member Author

kmein commented Mar 16, 2026

@Ma27 I've checked the final items off the list and retargeted to staging-nixos.
Thank you for the approval, and a huge thanks for your patient, thorough, and thoughtful guidance throughout this entire process! I've learned a lot along the way. ❤️

Also, thank you @jfly for the head start that brought the finish line into sight—and for your continued input here as well.

@kmein kmein force-pushed the nixos-test-containers branch 2 times, most recently from 4487447 to 324dec8 Compare March 16, 2026 16:59
# note that this is a hacky solution to the fact that
# the container's "real" /run is clobbered by a tmpfs (see below)
Path("/host/run").mkdir(parents=True)
subprocess.run(["mount", "--bind", "/run", "/host/run"], check=True)
Member

Considering that this got pushed after fixing up discussions and being at a point where this PR is good to go, is this intended to be on this branch?

Can't this be put into some directory in the container's /run? I don't see how a tmpfs mounted in there is keeping us from doing that.
Also, does /run/opengl-driver even exist in a sandbox? If this is only thrown in via sandbox-paths this strongly looks like a hack for a hack. I'd really prefer to do this in a follow-up and submit the chunk that's good to go already.

Member Author

That’s a fair point—I agree that it might feel like a hack. I've gone ahead and removed that commit from this PR so we can keep this focused on the verified changes. I'll look into a more proper implementation in a separate follow-up.

This should be good to go now!

Member

@Eveeifyeve Eveeifyeve left a comment

I approve this pr getting merged, with the follow-up pr with the stuff mentioned from @Mic92 on the horizon.

@nixpkgs-ci nixpkgs-ci bot added the 12.approvals: 2 label and removed the 12.approvals: 1 label Mar 16, 2026
@nixpkgs-ci nixpkgs-ci bot removed the 12.approvals: 2 label Mar 17, 2026
Co-authored-by: cinereal <cinereal@riseup.net>
@KiaraGrouwstra
Contributor

CI reports job canceled?

@vcunat
Member

vcunat commented Mar 19, 2026

Bisection is telling me that 23f1e63 broke manual build, e.g. nix build -f nixos/release.nix manual.x86_64-linux
https://github.com/NixOS/nixpkgs/actions/runs/23282868559/job/67699910130?pr=497493

@kmein
Member Author

kmein commented Mar 19, 2026

@vcunat Terribly sorry for that! Neither me nor the CI caught it. Didn't build the manuals because my commits didn't touch the manuals. There is a corresponding docs PR in the works at #479968 which should fix the failures. I'll rebase it ASAP.

@vcunat
Member

vcunat commented Mar 19, 2026

Another channel blocker. 799cafc broke nix build -f . nixosTests.allDrivers.firefox (Hydra log: https://hydra.nixos.org/build/324392409/nixlog/1)

@vcunat
Member

vcunat commented Mar 20, 2026

Ah, it is not a channel blocker. I forgot that we don't block on Firefox tests (anymore) but only on the builds.

@trofi
Contributor

trofi commented Mar 20, 2026

Bisect says 23f1e63 broke eval of at least snipe-it.tests as:

$ nix-instantiate -A snipe-it.tests
error:
       … while evaluating the attribute 'drvPath'
         at lib/customisation.nix:445:7:
          444|     // {
          445|       drvPath =
             |       ^
          446|         assert condition;

       … while calling the 'derivationStrict' builtin
         at «nix-internal»/derivation-internal.nix:37:12:
           36|
           37|   strict = derivationStrict drvAttrs;
             |            ^
           38|

       … while evaluating the option `testScriptString':

       … while evaluating definitions from `nixos/lib/testing/testScript.nix':

       … while evaluating definitions from `nixos/tests/web-apps/snipe-it.nix':

       (stack trace truncated; use '--show-trace' to show the full, detailed trace)

       error: function 'testScript' called with unexpected argument 'containers'
       at nixos/tests/web-apps/snipe-it.nix:47:5:
           46|   testScript =
           47|     { nodes }:
             |     ^
           48|     let
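
The trace suggests that the test's function declares a closed argument set (`{ nodes }:`) and is now also handed `containers`; in Nix, an argument set only tolerates extra attributes when it ends in `...`. As a rough Python analogy (not Nix, and not the test framework's code; all function names and argument values here are hypothetical), a keyword-only function without a catch-all fails in the same way:

```python
def strict_test_script(*, nodes):
    """Like `{ nodes }:` in Nix: accepts exactly one named argument."""
    return f"nodes={sorted(nodes)}"


def open_test_script(*, nodes, **_ignored):
    """Like `{ nodes, ... }:` in Nix: extra named arguments are tolerated."""
    return f"nodes={sorted(nodes)}"


# Hypothetical arguments a caller might pass once containers exist.
args = {"nodes": {"machine": {}}, "containers": {}}

try:
    strict_test_script(**args)
    strict_result = "ok"
except TypeError as err:
    strict_result = f"rejected: {err}"

open_result = open_test_script(**args)

print(strict_result)
print(open_result)
```

The strict variant is rejected because of the unexpected `containers` argument, while the open variant simply ignores it.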

@Mic92
Member

Mic92 commented Mar 20, 2026

@trofi @vcunat #501599

Labels

6.topic: nixos, 6.topic: testing, 8.has: module (update), 10.rebuild-darwin: 0, 10.rebuild-linux: 1-10, 10.rebuild-nixos-tests, 12.approvals: 3+
