cmd: check-host-config rewritten in Go (HMS-9805) by lzap · Pull Request #2043 · osbuild/images

lzap · 2025-11-26T10:04:40Z

Yeah. Cozy winter evening happened.

I wrote the whole framework, implemented two checks, and then asked AI to implement the rest. Only the checkers in the check package were generated (and reviewed by me); all the rest is my own code. Let me know how you like it. I intentionally did not make any changes to the behavior: this is pure refactoring which will open the door to more features, like using YAML to find which tests are missing.

Also, the output is just plain text, but it could now be gotest format or something else that CI understands and can parse for nicer results. But one step after another; I would rather keep it simple for now.

Now, I really wanted this to be fast, so everything runs in parallel, but logs from all goroutines are collected and presented sequentially so the output is not all messed up. I heavily use what is called context function mocking, where the platform functions (Exec, Exists, Grep) can be overridden with a mock via a special context value. The key type can be unexported, making it impossible to mess around with the functions from other packages.

The hard part now is to integrate this into the CICD. We want to build the binaries for each PR, upload them somewhere, and take it from there. For now, I am settling for simply running "go build" for each test and then copying the binaries over to the image the same way as we did for the script. If we cache GOCACHE, this might not be a bad idea at all.

You can safely run this locally, I have reviewed and tested everything:

$ go run ./cmd/check-host-config -config test/configs/all-customizations.json 
[main           ] Loading build config from test/configs/all-customizations.json
[files          ] Checking existence of file: /etc/systemd/system/custom.service
[files          ] Checking if file exists: /etc/systemd/system/custom.service
[files          ] File does not exist: /etc/systemd/system/custom.service
[hostname       ] Executing: hostname
[hostname       ] Command succeeded: hostname
[hostname       ] Comparing 'dev.home.lan' with expected hostname 'my-host'
[cacerts        ] Parsing CA cert 1
[cacerts        ] Extracting serial from CA cert 1: 27894af897dd2423607045716438a725f28a6d0b
[cacerts        ] Extracting CN from CA cert 1: Test CA for osbuild
[cacerts        ] Checking CA cert 1 anchor file serial '27894af897dd2423607045716438a725f28a6d0b'
[cacerts        ] Checking if file exists: /etc/pki/ca-trust/source/anchors/27894af897dd2423607045716438a725f28a6d0b.pem
[cacerts        ] File does not exist: /etc/pki/ca-trust/source/anchors/27894af897dd2423607045716438a725f28a6d0b.pem
[srv-disabled   ] Checking disabled service: bluetooth.service
[srv-disabled   ] Executing: systemctl is-enabled bluetooth.service
[srv-disabled   ] Command succeeded: systemctl
[srv-masked     ] Executing: systemctl list-unit-files --state=masked
[srv-masked     ] Command succeeded: systemctl
[srv-masked     ] Checking masked service: nfs-server
[users          ] Checking user: user1
[users          ] Executing: id user1
[users          ] Command failed: id (exit code: exit status 1)
[srv-enabled    ] Checking enabled service: sshd.service
[srv-enabled    ] Executing: systemctl is-enabled sshd.service
[srv-enabled    ] Command succeeded: systemctl
[srv-enabled    ] Service was enabled service=sshd.service state=enabled
[srv-enabled    ] Checking enabled service: custom.service
[srv-enabled    ] Executing: systemctl is-enabled custom.service
[srv-enabled    ] Command failed: systemctl (exit code: exit status 4)
[fw-srv-disabled] Checking disabled firewall service: telnet
[fw-srv-disabled] Executing: sudo firewall-cmd --query-service=telnet
[fw-srv-disabled] Command failed: sudo (exit code: exit status 1)
[fw-srv-disabled] Firewall service was disabled service=telnet state=no
[fw-srv-enabled ] Checking enabled firewall service: ftp
[fw-srv-enabled ] Executing: sudo firewall-cmd --query-service=ftp
[fw-srv-enabled ] Command failed: sudo (exit code: exit status 1)
[fw-ports       ] Checking enabled firewall port: 1337/udp
[fw-ports       ] Executing: sudo firewall-cmd --query-port=1337/udp
[fw-ports       ] Command failed: sudo (exit code: exit status 1)
Results:
✅ Firewall Services Disabled Check
⚠️  Hostname Check (warn: hostname does not match, got dev.home.lan expected my-host)
❌ Files Check (fail: file does not exist: /etc/systemd/system/custom.service)
❌ CA Certs Check (fail: file missing for cert 1 at /etc/pki/ca-trust/source/anchors/27894af897dd2423607045716438a725f28a6d0b.pem)
❌ Services Disabled Check (fail: service is not disabled: bluetooth.service state: enabled)
❌ Services Masked Check (fail: service is not masked: nfs-server)
❌ Users Check (fail: user does not exist: user1)
❌ Services Enabled Check (fail: service is not enabled: custom.service error: exit status 4)
❌ Firewall Services Enabled Check (fail: firewall service is not enabled: ftp error: exit status 1)
❌ Firewall Ports Check (fail: firewall port is not enabled: 1337:udp error: exit status 1)

The result code indicates which check failed first, starting from 10 (1-9 are reserved). The goroutines can fail in random order, so the code must not be used for any workarounds; it is purely informative.
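
A rough sketch of how such a "first failure wins" code could be recorded across goroutines. The firstFailure type and the mapping of 10 plus the check's position in the list are assumptions for illustration; the text above only states that codes start at 10.

```go
package main

import (
	"fmt"
	"sync"
)

// firstCheckCode is the lowest exit code used for check failures; the text
// above says 1-9 are reserved.
const firstCheckCode = 10

// firstFailure remembers which check reported a failure first.
// (Hypothetical helper, not taken from the PR.)
type firstFailure struct {
	once  sync.Once
	index int
	set   bool
}

// Record stores index only for the first caller; later calls are ignored.
func (f *firstFailure) Record(index int) {
	f.once.Do(func() {
		f.index = index
		f.set = true
	})
}

// ExitCode must be called after all checks have finished. It returns 0 when
// nothing failed, otherwise firstCheckCode plus the recorded index.
func (f *firstFailure) ExitCode() int {
	if !f.set {
		return 0
	}
	return firstCheckCode + f.index
}

func main() {
	var ff firstFailure
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if i != 0 { // pretend checks 1 and 2 fail
				ff.Record(i)
			}
		}(i)
	}
	wg.Wait()
	// Which of the two failing checks wins depends on scheduling.
	fmt.Println("exit code:", ff.ExitCode())
}
```

The scheduling dependence in main is exactly why the code is only informative.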

achilleas-k

I'd like to see this split into multiple commits. The changes outside of cmd/check-host-config/ can be separate from the introduction of the new tool.

The introduction of the tool itself could also be multiple commits, like introducing the main parts and then the individual checks, but I don't feel as strongly about this (one commit for the whole thing isn't too bad).

More generally: I like the check interface. Each little module is nicely defined and clear. But looking at everything going on in main() I have to wonder what it's all for. There seems to be a lot going on for something that should mostly be "call Run() on a list of checks and collect the results". Maybe it could be broken down a bit into functions so it's easier to read if it's all necessary.

cmd/check-host-config/main.go

cmd/check-host-config/cos/os.go

achilleas-k · 2025-12-03T16:42:25Z

cmd/check-host-config/cos/os.go

+	if f := ExecFunc(ctx); f != nil {
+		return f(name, arg...)
+	}


It's not very clear to me why this (and the other overrides like it) is needed. Do we use it anywhere, or is it just for mocking?

Only for thread-safe mocking, yes.

See, I consider good mocking absolutely essential in this case, where CICD is very expensive. I want to be able to easily create as many reliable tests as I need, to improve the chance that there are no bugs before I push and wait 3 hours for my results. Context mocking is not very common, but it is a pretty useful approach.

Btw, this rewrite alone revealed two issues, in the modularity and OpenSCAP checks, which were not entirely correct. So it already paid off.

I added a package.go with further explanation:

// Package mockos provides mockable OS functions for testing. Use this package in
// host checks to allow mocking of OS interactions during unit tests. The package
// provides WithXXXFunc functions to set mock implementations in a context.Context,
// and corresponding XXXFunc functions to retrieve them. If no mock is set, the real
// OS functions are used.
package mockos

lzap · 2025-12-04T07:38:42Z

The introduction of the tool itself could also be multiple commits, like introducing the main parts and then the individual checks, but I don't feel as strongly about this (one commit for the whole thing isn't too bad).

While generally I agree with isolated commits, here it does not make too much sense as the commit makes sense as a whole. I do not like creating "artificial history" when it was not how the code was really written. Can do this if you insist.

But looking at everything going on in main() I have to wonder what it's all for.

Yeah, I see two confusing parts. Argument parsing is a bit unreadable. I was aiming for both because I thought it would be good to stay backward compatible in case someone uses this directly. But now that I think about it, let's break it. I am settling on Go flags because these are shorter to parse than argument + environment variable.
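
As an illustration of the flags-only approach, a minimal sketch using the standard flag package. Only the -config flag name comes from the example invocation in the description; the options struct, the parseArgs helper, and treating -config as required are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// options holds the parsed command line. (Hypothetical struct.)
type options struct {
	configPath string
}

// parseArgs parses args (without the program name) using a private FlagSet,
// which keeps the function easy to call from unit tests.
func parseArgs(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("check-host-config", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // errors are returned instead of printed
	fs.StringVar(&opts.configPath, "config", "", "path to the build config JSON")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	if opts.configPath == "" {
		return opts, fmt.Errorf("-config is required")
	}
	return opts, nil
}

func main() {
	// A real main would pass os.Args[1:]; a fixed slice keeps the sketch
	// runnable without arguments.
	opts, err := parseArgs([]string{"-config", "test/configs/all-customizations.json"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Loading build config from", opts.configPath)
}
```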

The special goroutine which collects logs is not strictly necessary, but I really want the output to be readable. Because all checks are executed in parallel, the output would be totally unreadable without it. We really, really should do something about unreadable logs on CICD and we have to start somewhere. This check binary will very often be the one that fails and I want clear output with as much logging as possible. This is why logging is quite elaborate and is collected and ordered correctly.

Maybe it could be broken down a bit into functions so it's easier to read if it's all necessary.

Absolutely, good idea. I moved check slice into checks.go and extracted code into functions.

achilleas-k · 2026-01-09T17:21:15Z

This will make reviews harder

For me at least, a single final commit that changes some config files and modifies all the manifest checksums doesn't affect the reviewing much. As long as it's isolated from the rest of the changes, I can review the PR as if it's not there by viewing the rest of the commits individually (or in ranges, now that GH web UI supports it). So don't worry about that.

achilleas-k

The check always succeeds.

cmd/check-host-config/main.go

achilleas-k · 2026-01-09T18:20:29Z

You mentioned at some point (can't remember where or when) that you had a hard time rebasing and rewriting this because you had to wait for CI jobs that can take hours.

Just ftr, for anyone reading this, it's possible to quickly test it locally by:

1. Building an image: ./test/scripts/build-image fedora-43 generic-qcow2 ./test/configs/all-customizations.json
2. pip install . (in a venv preferably) to make vmtest available.
3. Running the boot test on the image: ./test/scripts/boot-image fedora-43 generic-qcow2 ./test/configs/all-customizations.json

Step 3 takes about 15 secs to run on my machine, making it very easy to iterate locally on the code.

lzap · 2026-01-10T08:13:54Z

Damn, yeah, nice catch. I am bumping the seed once again then. Rebased.

achilleas-k

Thanks again! Nice work. I went through everything carefully again and have a few nitpicks and minor comments. Approving anyway, since they can be handled in follow-ups.

cmd/check-host-config/main.go

cmd/check-host-config/wait.go

cmd/check-host-config/main.go

achilleas-k · 2026-01-12T15:54:15Z

cmd/check-host-config/wait.go

+// still in the activating state. It calls systemctl list-units to get the list.
+// This is only used in case of timeout to help with debugging.
+func listBadUnits() string {
+	stdout, _, _, err := check.Exec("systemctl", "list-units", "--state=activating,failed", "--plain", "--no-legend", "--no-pager")


systemctl list-units supports --output=json, which would be cleaner here. Though, I don't know what version introduced the feature, so maybe it's not available on RHEL 8.10.

Yeah that is RHEL9+ only.

achilleas-k · 2026-01-12T16:03:04Z

cmd/check-host-config/check/files.go

+	// Note that this test only checks for the existence of the filesystem
+	// customizations target path, not the content. For the simple case when
+	// "data" is provided we could check, but for the "uri" case we do not
+	// know the content as the file usually comes from the host. The
+	// existing testing framework makes the content check difficult, so we
+	// settle for this for now. There is an alternative approach in
+	// https://github.com/osbuild/images/pull/1157/commits/7784f3dc6b435fa03951263e48ea7cfca84c2ebd
+	// that may eventually be considered. It is more direct and runs
+	// locally, but it is different from the existing paradigm so it
+	// needs further discussion.


We could check permissions and ownership as well. And check data when it's embedded in the blueprint. Let's do it in a follow-up though.

Yes but I aim for no changes in this PR, except fixing obvious bugs. There is much more we can do now pretty easily.

lzap · 2026-01-12T17:42:06Z

Rebased and squashed all your comments, thanks. The only exception is listing unit tests, but this is not a big deal as it is only used in case of error. Any other new functionality I would like to do as follow-ups.

lzap · 2026-01-13T10:49:36Z

Only resolved manifest conflicts after auto-merge was enabled, can I get acks again @achilleas-k @supakeen thanks :-)

lzap added the WIP Work in progress. Don't run Gitlab CI. label Nov 26, 2025

lzap force-pushed the check-host-config branch from 78a731d to c58b3e5 Compare November 26, 2025 10:18

lzap removed the WIP Work in progress. Don't run Gitlab CI. label Nov 26, 2025

lzap force-pushed the check-host-config branch 2 times, most recently from 9c447e4 to f385256 Compare November 26, 2025 16:49

lzap changed the title ~~cmd: check-host-config rewritten in Go~~ cmd: check-host-config rewritten in Go (HMS-9805) Nov 26, 2025

lzap force-pushed the check-host-config branch 3 times, most recently from 26f944b to c8c2cae Compare November 27, 2025 09:59

lzap mentioned this pull request Nov 27, 2025

Transfer host check via removable device #2047

Closed

lzap force-pushed the check-host-config branch from c8c2cae to 274a1ba Compare November 27, 2025 11:43

lzap mentioned this pull request Nov 27, 2025

test: enable openscap testing in qemu #2048

Merged

lzap force-pushed the check-host-config branch 11 times, most recently from 70673da to 3b8f8cb Compare December 2, 2025 17:10

lzap mentioned this pull request Dec 3, 2025

Modularity check does not work #2061

Open

lzap force-pushed the check-host-config branch 4 times, most recently from f507ee9 to 0eb7945 Compare December 3, 2025 14:29

achilleas-k reviewed Dec 3, 2025

achilleas-k requested changes Jan 9, 2026

lzap force-pushed the check-host-config branch 4 times, most recently from 1000881 to adcb6fa Compare January 11, 2026 16:34

achilleas-k previously approved these changes Jan 12, 2026

lzap dismissed achilleas-k’s stale review via 66a7684 January 12, 2026 17:40

lzap force-pushed the check-host-config branch from adcb6fa to 66a7684 Compare January 12, 2026 17:40

supakeen previously approved these changes Jan 12, 2026

supakeen enabled auto-merge January 12, 2026 18:56

lzap added 5 commits January 13, 2026 11:48

gitignore: add build directory and check binary

1d6f797

cmd/check-host-config: rewrite of check-host-config script

eddbaf1

The check script is rewritten in Go to be more modular and easier to test. No changes have been made to the checks themselves. Comments were carried over from the script.

scripts/check-host-config: remove check-host-config

0919d75

scripts/boot-image: update script to use the new implementation

2ba132c

Switch over to the Go implementation in the boot-image script.

config-list: drop jq and add sssd to masked services

0545d0b

The package "jq" is no longer needed. Drop it from configs. The sssd service is masked on RHEL 8.8+ because it does not start, see the discussion in: https://access.redhat.com/solutions/7017538

lzap dismissed supakeen’s stale review via 0545d0b January 13, 2026 10:48

lzap force-pushed the check-host-config branch from 66a7684 to 0545d0b Compare January 13, 2026 10:48

supakeen approved these changes Jan 13, 2026

achilleas-k approved these changes Jan 13, 2026

This was referenced Jan 13, 2026

Exclude grub and vdo packages on Azure CVM (RHEL 9 and 10) [HMS-9849, HMS-9850] #2130

Merged

Update snapshots to 20260112 #2128

Closed

supakeen added this pull request to the merge queue Jan 13, 2026

Merged via the queue into osbuild:main with commit 27eddcd Jan 13, 2026
25 checks passed

lzap deleted the check-host-config branch January 13, 2026 15:43
