@@ -77,52 +77,34 @@ An SSH-based backdoor to log into machines can be enabled with
}
```

::: {.warning}
Make sure to only enable the backdoor for interactive tests
(i.e. by using `interactive.sshBackdoor.enable`)! This is the only
supported configuration.

Running a test in a sandbox with this will fail because `/dev/vhost-vsock` isn't available
in the sandbox.
:::

This creates a [vsock socket](https://man7.org/linux/man-pages/man7/vsock.7.html)
for each VM to log in with SSH. This configures root login with an empty password.

When the VMs get started interactively with the test-driver, it's possible to
connect to `machine` with
On the host side, a UNIX domain socket is used with
[vhost-device-vsock](https://github.com/rust-vmm/vhost-device/blob/main/vhost-device-vsock/README.md).
That way, it's not necessary to assign system-wide unique vsock numbers.

```
$ ssh vsock/3 -o User=root
$ ssh vsock-mux//tmp/path/to/host -o User=root
```

The socket numbers correspond to the node number of the test VM, but start
at three instead of one because that's the lowest possible
vsock number. The exact SSH commands are also printed out when starting
`nixos-test-driver`.

On non-NixOS systems you'll probably need to enable
the SSH config from {manpage}`systemd-ssh-proxy(1)` yourself.

If starting VM fails with an error like
The socket paths are printed when starting the test driver:

```
qemu-system-x86_64: -device vhost-vsock-pci,guest-cid=3: vhost-vsock: unable to set guest cid: Address already in use
Note: this requires systemd-ssh-proxy(1) to be enabled (default on NixOS 25.05 and newer).
machine: ssh -o User=root vsock-mux//tmp/tmpg1rp9nti/machine_host.socket
```

it means that the vsock numbers for the VMs are already in use. This can happen
if another interactive test with SSH backdoor enabled is running on the machine.
On non-NixOS systems you'll probably need to enable
the SSH config from {manpage}`systemd-ssh-proxy(1)` yourself.

In that case, you need to assign another range of vsock numbers. You can pick another
offset with
During a test-run, it's possible to print the SSH commands again by running

```nix
{
sshBackdoor = {
enable = true;
vsockOffset = 23542;
};
}
```
In [2]: dump_machine_ssh()
SSH backdoor enabled, the machines can be accessed like this:
Note: this requires systemd-ssh-proxy(1) to be enabled (default on NixOS 25.05 and newer).
machine: ssh -o User=root vsock-mux//tmp/tmpg1rp9nti/machine_host.socket
```
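To illustrate how per-VM sockets remove the need for system-wide unique vsock numbers, here is a minimal Python sketch of how a `vhost-device-vsock` command line could be assembled. It is modeled on the `VHostDeviceVsock` class added to `driver.py` in this change; the function name `build_vhost_args` and the directory `/tmp/example` are hypothetical and exist only for illustration.

```python
from pathlib import Path


def build_vhost_args(tmp_dir: Path, machines: list[str]) -> list[str]:
    """Build a vhost-device-vsock command line with one --vm entry per machine.

    Hypothetical helper: it mirrors the argument format used in driver.py,
    but is not part of the test driver itself.
    """
    args = ["vhost-device-vsock"]
    # CIDs start at 3, the lowest vsock number usable for a guest.
    for cid, machine in enumerate(machines, start=3):
        guest = tmp_dir / f"{machine}_guest.socket"  # socket handed to QEMU
        host = tmp_dir / f"{machine}_host.socket"  # socket used by ssh on the host
        args += ["--vm", f"guest-cid={cid},socket={guest},uds-path={host}"]
    return args


print(build_vhost_args(Path("/tmp/example"), ["machine"]))
```

Since the driver creates these sockets inside its own temporary directory, the CIDs presumably only need to be unique within a single test run, which is why the `vsockOffset` option can be dropped.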

## Port forwarding to NixOS test VMs {#sec-nixos-test-port-forwarding}
14 changes: 3 additions & 11 deletions nixos/doc/manual/development/writing-nixos-tests.section.md
@@ -376,24 +376,16 @@ This can be done when something in the test fails, e.g.
}
```

For the AF_VSOCK feature to work, `/dev/vhost-vsock` is needed in the sandbox
which can be done with e.g.

```
nix-build -A nixosTests.foo --option sandbox-paths /dev/vhost-vsock
```

This will halt the test execution on a test-failure and print instructions
on how to enter the sandbox shell of the VM test. Inside, one can log into
e.g. `machine` with

```
ssh -F ./ssh_config vsock/3
ssh -F ./ssh_config -o User=root vsock-mux//tmp/.../machine_host.socket
```

As described in [](#sec-nixos-test-ssh-access), the numbers for vsock start at
`3` instead of `1`. So the first VM in the network (sorted alphabetically) can
be accessed with `vsock/3`.
The socket paths are printed at the beginning of the test. See
[](#sec-nixos-test-ssh-access) for more context.
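
The printed list of socket paths can be pictured with the following minimal Python sketch. It follows the formatting logic of `dump_machine_ssh` in this change but omits the terminal styling; the function name `format_ssh_commands` and the example paths are hypothetical. Inside the sandbox shell, `-F ./ssh_config` would additionally be passed, as shown above.

```python
from pathlib import Path


def format_ssh_commands(host_sockets: dict[str, Path]) -> list[str]:
    """Return one aligned `ssh` command line per machine.

    Hypothetical helper: `host_sockets` maps a machine name to the
    host-side UNIX socket that vhost-device-vsock created for it.
    """
    if not host_sockets:
        return []
    longest_name = max(len(name) for name in host_sockets)
    lines = []
    for name, host in host_sockets.items():
        # Pad after the colon so that all commands start in the same column.
        spaces = " " * (longest_name - len(name) + 2)
        lines.append(f"  {name}:{spaces}ssh -o User=root vsock-mux/{host}")
    return lines


for line in format_ssh_commands(
    {
        "client": Path("/tmp/example/client_host.socket"),
        "server": Path("/tmp/example/server_host.socket"),
    }
):
    print(line)
```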

Alternatively, it's possible to explicitly set a breakpoint with
`debug.breakpoint()`. This also has the benefit that one can step through
3 changes: 0 additions & 3 deletions nixos/doc/manual/redirects.json
@@ -1984,9 +1984,6 @@
"test-opt-sshBackdoor.enable": [
"index.html#test-opt-sshBackdoor.enable"
],
"test-opt-sshBackdoor.vsockOffset": [
"index.html#test-opt-sshBackdoor.vsockOffset"
],
"test-opt-enableDebugHook": [
"index.html#test-opt-enableDebugHook"
],
2 changes: 2 additions & 0 deletions nixos/lib/test-driver/default.nix
@@ -11,6 +11,7 @@
ruff,
tesseract4,
vde2,
vhost-device-vsock,
extraPythonPackages ? (_: [ ]),
nixosTests,
}:
@@ -43,6 +44,7 @@ python3Packages.buildPythonApplication {
qemu_pkg
socat
vde2
vhost-device-vsock
]
++ lib.optionals enableOCR [
imagemagick_light
9 changes: 5 additions & 4 deletions nixos/lib/test-driver/src/test_driver/__init__.py
@@ -115,9 +115,9 @@ def main() -> None:
type=Path,
)
arg_parser.add_argument(
"--dump-vsocks",
"--enable-ssh-backdoor",
help="indicates that the interactive SSH backdoor is active and dumps information about it on start",
type=int,
action="store_true",
)

args = arg_parser.parse_args()
@@ -147,9 +147,10 @@ def main() -> None:
args.keep_vm_state,
args.global_timeout,
debug=debugger,
enable_ssh_backdoor=args.enable_ssh_backdoor,
) as driver:
if offset := args.dump_vsocks:
driver.dump_machine_ssh(offset)
if args.enable_ssh_backdoor:
driver.dump_machine_ssh()
if args.interactive:
history_dir = os.getcwd()
history_path = os.path.join(history_dir, ".nixos-test-history")
124 changes: 101 additions & 23 deletions nixos/lib/test-driver/src/test_driver/driver.py
@@ -1,12 +1,14 @@
import os
import re
import signal
import subprocess
import sys
import tempfile
import threading
import traceback
from collections.abc import Callable, Iterator
from contextlib import AbstractContextManager, contextmanager
from dataclasses import dataclass
from pathlib import Path
from typing import Any
from unittest import TestCase
@@ -57,18 +59,62 @@ def pythonize_name(name: str) -> str:
return re.sub(r"^[^A-Za-z_]|[^A-Za-z0-9_]", "_", name)


@dataclass
class VsockPair:
guest: Path
host: Path
cid: int


class VHostDeviceVsock:
def __init__(self, tmp_dir: Path, machines: Iterator[str]):
self.temp_dir_handle = tempfile.TemporaryDirectory(dir=tmp_dir)
self.temp_dir = Path(self.temp_dir_handle.name)
self.sockets = {
machine: VsockPair(
self.temp_dir / f"{machine}_guest.socket",
self.temp_dir / f"{machine}_host.socket",
cid,
)
for cid, machine in enumerate(machines, start=3)
}

self.vhost_proc = subprocess.Popen(
(
"vhost-device-vsock",
*(
arg
for vsock_pair in self.sockets.values()
for arg in (
"--vm",
f"guest-cid={vsock_pair.cid},socket={vsock_pair.guest},uds-path={vsock_pair.host}",
)
),
)
)

def __del__(self) -> None:
self.vhost_proc.kill()
self.temp_dir_handle.cleanup()


class Driver:
"""A handle to the driver that sets up the environment
and runs the tests"""

tests: str
vlans: list[VLan]
machines: list[Machine]
vlans: list[VLan] = []
machines: list[Machine] = []
polling_conditions: list[PollingCondition]
global_timeout: int
race_timer: threading.Timer
start_scripts: list[NixStartScript]
vlan_ids: list[int]
keep_vm_state: bool
logger: AbstractLogger
debug: DebugAbstract
vhost_vsock: VHostDeviceVsock | None = None
enable_ssh_backdoor: bool

def __init__(
self,
@@ -80,40 +126,51 @@ def __init__(
keep_vm_state: bool = False,
global_timeout: int = 24 * 60 * 60 * 7,
debug: DebugAbstract = DebugNop(),
enable_ssh_backdoor: bool = False,
):
self.tests = tests
self.out_dir = out_dir
self.global_timeout = global_timeout
self.race_timer = threading.Timer(global_timeout, self.terminate_test)
self.logger = logger
self.debug = debug
self.vlan_ids = list(set(vlans))
self.polling_conditions = []
self.keep_vm_state = keep_vm_state
self.global_timeout = global_timeout
self.start_scripts = list(map(NixStartScript, start_scripts))
self.enable_ssh_backdoor = enable_ssh_backdoor

def __enter__(self) -> "Driver":
self.race_timer = threading.Timer(self.global_timeout, self.terminate_test)
tmp_dir = get_tmp_dir()

with self.logger.nested("start all VLans"):
vlans = list(set(vlans))
self.vlans = [VLan(nr, tmp_dir, self.logger) for nr in vlans]

def cmd(scripts: list[str]) -> Iterator[NixStartScript]:
for s in scripts:
yield NixStartScript(s)
self.vlans = [VLan(nr, tmp_dir, self.logger) for nr in self.vlan_ids]

self.polling_conditions = []
if self.enable_ssh_backdoor:
with self.logger.nested("start vhost-device-vsock"):
self.vhost_vsock = VHostDeviceVsock(
tmp_dir, (cmd.machine_name for cmd in self.start_scripts)
)

self.machines = [
Machine(
start_command=cmd,
keep_vm_state=keep_vm_state,
keep_vm_state=self.keep_vm_state,
name=cmd.machine_name,
tmp_dir=tmp_dir,
callbacks=[self.check_polling_conditions],
out_dir=self.out_dir,
logger=self.logger,
vsock_guest=(
self.vhost_vsock.sockets[cmd.machine_name].guest
if self.vhost_vsock is not None
else None
),
)
for cmd in cmd(start_scripts)
for cmd in self.start_scripts
]

def __enter__(self) -> "Driver":
return self

def __exit__(self, *_: Any) -> None:
@@ -131,6 +188,14 @@ def __exit__(self, *_: Any) -> None:
except Exception as e:
self.logger.error(f"Error during cleanup of vlan{vlan.nr}: {e}")

if self.enable_ssh_backdoor:
try:
del self.vhost_vsock
except Exception as e:
self.logger.error(
f"Error during cleanup of vhost-device-vsock process: {e}"
)

def subtest(self, name: str) -> Iterator[None]:
"""Group logs under a given test name"""
with self.logger.subtest(name):
@@ -164,6 +229,7 @@ def subtest(name: str) -> Iterator[None]:
Machine=Machine, # for typing
t=AssertionTester(),
debug=self.debug,
dump_machine_ssh=self.dump_machine_ssh,
)
machine_symbols = {pythonize_name(m.name): m for m in self.machines}
# If there's exactly one machine, make it available under the name
@@ -183,18 +249,26 @@ def subtest(name: str) -> Iterator[None]:
)
return {**general_symbols, **machine_symbols, **vlan_symbols}

def dump_machine_ssh(self, offset: int) -> None:
print("SSH backdoor enabled, the machines can be accessed like this:")
print(
f"{Style.BRIGHT}Note:{Style.RESET_ALL} this requires {Style.BRIGHT}systemd-ssh-proxy(1){Style.RESET_ALL} to be enabled (default on NixOS 25.05 and newer)."
)
names = [machine.name for machine in self.machines]
longest_name = len(max(names, key=len))
for num, name in enumerate(names, start=offset + 1):
spaces = " " * (longest_name - len(name) + 2)
def dump_machine_ssh(self) -> None:
if not self.enable_ssh_backdoor:
return

assert self.vhost_vsock is not None

if self.machines:
print("SSH backdoor enabled, the machines can be accessed like this:")
print(
f" {name}:{spaces}{Style.BRIGHT}ssh -o User=root vsock/{num}{Style.RESET_ALL}"
f"{Style.BRIGHT}Note:{Style.RESET_ALL} this requires {Style.BRIGHT}systemd-ssh-proxy(1){Style.RESET_ALL} to be enabled (default on NixOS 25.05 and newer)."
)
names = [machine.name for machine in self.machines]
longest_name = len(max(names, key=len))
for name in names:
spaces = " " * (longest_name - len(name) + 2)
print(
f" {name}:{spaces}{Style.BRIGHT}ssh -o User=root vsock-mux/{self.vhost_vsock.sockets[name].host}{Style.RESET_ALL}"
)
else:
print("SSH backdoor enabled, but no machines defined")

def test_script(self) -> None:
"""Run the test script"""
@@ -285,6 +359,10 @@ def create_machine(

cmd = NixStartScript(start_command)
name = name or cmd.machine_name
if self.enable_ssh_backdoor:
self.logger.warning(
f"create_machine({name}): not enabling SSH backdoor, this is not supported for VMs created with create_machine!"
)

return Machine(
tmp_dir=tmp_dir,