
Improve test VM workflow.#2

Merged
plietar merged 3 commits into main from fetch-vm-secrets
Sep 27, 2024

Conversation

Contributor

@plietar plietar commented Sep 26, 2024

The VM that is started by `nix run .#start-vm` now uses GitHub authentication, bringing it closer to the real thing. For this to work it needs a GitHub client ID and secret, which it fetches from Vault.

On the host machine, before starting the VM, a vault token is obtained using the `vault login` command. The token is passed to the VM as a firmware parameter, allowing it to be used inside the VM to fetch the secrets.

The file layout is tidied up a bit, and the VM tests are improved. These still use basic authentication, for now at least. I've added a GitHub Actions workflow to run the test.

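A minimal dry-run sketch of the host-side flow described above (the fw_cfg key name `opt/vault-token` is my assumption, not necessarily the PR's actual key):

```shell
# Hypothetical sketch, not the PR's actual script; prints the QEMU invocation
# instead of booting. In reality the token would come from `vault print token`
# after a `vault login` on the host.
TOKEN="s.example-token"
qemu_args="-machine accel=kvm -fw_cfg name=opt/vault-token,string=$TOKEN"
echo "qemu-system-x86_64 $qemu_args"
```

Inside the guest, an fw_cfg blob like this shows up under `/sys/firmware/qemu_fw_cfg/by_name/`, which is how the VM can read the token back.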
@plietar plietar requested a review from M-Kusumgar September 26, 2024 10:18

@M-Kusumgar M-Kusumgar left a comment


Looks good to me. Quick question though: the kipling instance worked, however for some reason the priority pathogens instance kept giving me an error fetching auth config. Is this expected? I've also left a couple of other questions.

Comment on lines +11 to +12
# The `virtualisation.vmVariant` setting we to import VM-specific settings
# doesn't for the test VMs.
Contributor


i cannot decipher this comment XD

Contributor Author


Wow where did all my verbs go

Contributor


So this is a bit like Vagrantfiles?

Comment on lines +25 to +28
# It's surprisingly easy to run qemu without hardware acceleration and not
# notice it, which makes the VM so slow the tests tend to fail. This forces
# KVM acceleration and will fail to start if missing.
virtualisation.qemu.options = [ "-machine" "accel=kvm" ];
Contributor


Okay, I've been reading up about KVM and QEMU. I get the general gist but couldn't find exactly how KVM acceleration works. Is it because QEMU simulates a full machine, and KVM hardware acceleration brings that abstraction closer to the metal of the actual machine the VM is running on?


@plietar plietar Sep 26, 2024


I'm not an expert, but here's my attempt: QEMU is a general virtual machine framework. It has multiple backends, mostly TCG and KVM. Regardless of the backend, it needs to emulate lots of peripherals, e.g. the network card and file storage.

TCG is an actual emulator, implemented as a JIT. It reads instructions from the guest machine and translates them into host machine instructions. It also needs to emulate a whole bunch of low-level stuff (e.g. memory management, interrupts, ...). Emulating stuff this way is super slow. Fine if you want to emulate a 90s video game console on a modern fast CPU, less fine if you want to emulate a modern fast CPU on a modern fast CPU.

KVM is the Linux API for hardware-accelerated VMs. It's the Linux equivalent of Hyper-V, I guess. It uses whatever the underlying CPU's acceleration is; on Intel that's Intel VT. In that mode, guest instructions run directly on the host CPU, which has a special VM mode that keeps guests properly isolated from the host and from each other. The performance of this is close to native (there's usually a small overhead, but not much). It's what everyone does these days (either via Hyper-V or KVM); cloud VMs would just be too slow without hardware acceleration.

Because KVM needs special hardware access, some Linux distros, including Ubuntu, restrict the permissions a little bit. Not all of them though; from my reading it seems some have /dev/kvm as mode 666, so writable by everyone.

I spent way too long figuring out why the tests were slow but nix run .#start-vm was fast. Turns out it was because the former runs inside the nix sandbox and doesn't have my permissions. This line makes it so that the tests fail with an obvious error, instead of timing out because they are way too slow.
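For context, a quick way to tell which failure mode you're in (a hypothetical helper, not part of the PR) is to check whether the current user can actually open /dev/kvm, which is what `-machine accel=kvm` ultimately needs:

```shell
# Hypothetical helper, not part of the PR: classify KVM availability for the
# current user. QEMU's "-machine accel=kvm" needs to open /dev/kvm read-write,
# so "no-access" is exactly the restricted-permissions case described above.
check_kvm() {
  dev="${1:-/dev/kvm}"
  if [ ! -e "$dev" ]; then
    echo "no-kvm"      # node absent: module not loaded or no hardware support
  elif [ -r "$dev" ] && [ -w "$dev" ]; then
    echo "ok"          # accel=kvm should work for this user
  else
    echo "no-access"   # node exists, but permissions/ACLs deny this user
  fi
}
check_kvm
```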

nativeBuildInputs = [ perl ];
installPhase = ''
  find $GRADLE_USER_HOME/caches/modules-2 -type f -regex '.*\.\(jar\|pom\|module\)' \
    | LC_ALL=C sort \
Contributor


Not just this line, but I have no clue what you're doing here. This is definitely beside the point of this PR, but could you leave a comment or something just briefly explaining this?

Contributor Author


Ahah yes, this line is dark magic stolen from occurrences I found in nixpkgs. See for example
https://github.com/NixOS/nixpkgs/blob/8121f3559a98259a8e767dedf4eaf3939442c54d/pkgs/applications/file-managers/mucommander/default.nix#L39-L48

So Nix builds are either sandboxed with no network, or they have network access but must produce output matching an exact, pre-declared hash. If we wanted to build a package that fetches external dependencies using stuff like gradle, cargo or npm, we'd have to set the hash of the output of the whole build, which is tedious since that hash would change any time our source changes just a tiny bit, or we change the build process a little.

The compromise in Nix is to split the build process in two: fetch the dependencies outside the sandbox with a fixed hash, then run the actual build in the sandbox without having to set the output hash. This is fine for sensible package managers, but gradle is not one of them. It doesn't really have a proper way of just fetching the dependencies.

What we do here is build packit-api twice. After the first build, we throw away the build output but keep the cache in $GRADLE_USER_HOME/caches/modules-2, and do some preprocessing to make the cache have the same file layout as Maven does. We set a hash for this (sources.gradleDepsHash). Then on the second build we replace the maven references with the output of the first build (it's what the gradleInit below does).
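The two-phase shape described above looks roughly like this (a hypothetical Nix sketch with made-up attribute names, not the PR's actual derivation):

```nix
# Hypothetical sketch, not the PR's actual code. Phase 1 is a fixed-output
# derivation: it may use the network, but its output must match outputHash.
gradleDeps = stdenv.mkDerivation {
  name = "packit-api-gradle-deps";
  # ... run the gradle build once, throw the build output away, and keep
  # $GRADLE_USER_HOME/caches/modules-2 reshaped into a Maven-style layout ...
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = sources.gradleDepsHash;  # the pinned hash mentioned above
};
# Phase 2: the real build runs in the normal no-network sandbox, with its
# Maven repository references rewritten to point at gradleDeps (the
# gradleInit step), so no output hash is needed.
```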

Now, the actual sort I added is because the same artifact might appear multiple times in the cache, with slightly different contents, e.g. one copy published to Maven Central and one to the Gradle plugin repository. Absolute madness, but here we are. Thankfully, from what I could tell, the differences were very minor.

We can only have a single copy in our fake Maven repo. find's output order isn't deterministic or stable, so we weren't always copying the same file; I was getting different results on my machine compared to CI. The sort keeps that deterministic.

LC_ALL=C is slang for "set the locale to naive English". Sorting technically depends on the configured locale, so this helps keep the result predictable. It probably doesn't matter here, since everything is ASCII and I doubt Nix lets the host system locale leak through.
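The locale point is easy to demonstrate with a toy input (nothing to do with the actual jar paths):

```shell
# In the C locale, sort compares raw byte values, so uppercase letters
# (A = 0x41) come before all lowercase ones (a = 0x61); locale-aware
# collation may interleave cases instead, and can differ between machines.
printf 'b\nA\na\n' | LC_ALL=C sort
# prints: A, a, b (one per line)
```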

I hate all of this, but it's the best I could find for now. nixpkgs has actually improved this quite a bit in the current master, but we need to wait until November for a stable release: NixOS/nixpkgs#272380

Contributor


that was actually fucked up

Comment on lines +105 to +107
sudo tee /etc/udev/rules.d/50-nixbld-kvm.rules <<EOF
KERNEL=="kvm", RUN+="/bin/setfacl -m g:nixbld:rw $env{DEVNAME}"
EOF
Contributor


Sorry for the questions, just trying to work out how this is equivalent. The /bin/setfacl -m g:nixbld:rw bit is fine, and then we want /dev/kvm; $env{DEVNAME} gives the /dev bit, but how does the /kvm come into play? You've done KERNEL=="kvm", but I couldn't find anything related to KERNEL in the setfacl man page.

Contributor Author


This is a udev rule, which is how modern (last 10-15 years) Linux distros manage devices in /dev. /dev is now a temporary in-memory filesystem, and any changes to it don't persist across reboots.

The rule says: for a device whose kernel name matches KERNEL=="kvm", run the following command, with DEVNAME set to /dev/kvm. This is more of a udev thing than a setfacl one.
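Annotated, the rule from the diff reads like this (same rule as above, just with comments added):

```
# Match keys (==) select devices; assignment keys (+=) queue actions.
# KERNEL=="kvm"   -> applies to the device whose kernel name is "kvm"
# RUN+="..."      -> command udev runs when the device node appears
# $env{DEVNAME}   -> substituted by udev with the node path, i.e. /dev/kvm
KERNEL=="kvm", RUN+="/bin/setfacl -m g:nixbld:rw $env{DEVNAME}"
```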

plietar and others added 2 commits September 26, 2024 16:16
Co-authored-by: M-Kusumgar <98405247+M-Kusumgar@users.noreply.github.com>

@M-Kusumgar M-Kusumgar left a comment


Looks good to me, thanks for all the explanations!

@plietar plietar merged commit aaa782c into main Sep 27, 2024
@plietar plietar deleted the fetch-vm-secrets branch September 27, 2024 00:07