cmd: check-host-config rewritten in Go (HMS-9805)#2043
cmd: check-host-config rewritten in Go (HMS-9805)#2043supakeen merged 5 commits intoosbuild:mainfrom
Conversation
78a731d to
c58b3e5
Compare
9c447e4 to
f385256
Compare
26f944b to
c8c2cae
Compare
c8c2cae to
274a1ba
Compare
70673da to
3b8f8cb
Compare
f507ee9 to
0eb7945
Compare
achilleas-k
left a comment
There was a problem hiding this comment.
I'd like to see this split into multiple commits. The changes outside of cmd/check-host-config/ can be separate from the introduction of the new tool.
The introduction of the tool itself could also be multiple commits, like introducing the main parts and then the individual checks, but I don't feel as strongly about this (one commit for the whole thing isn't too bad).
More generally: I like the check interface. Each little module is nicely defined and clear. But looking at everything going on in main() I have to wonder what it's all for. There seems to be a lot going on for something that should mostly be "call Run() on a list of checks and collect the results". Maybe it could be broken down a bit into functions so its easier to read if it's all necessary.
cmd/check-host-config/cos/os.go
Outdated
| if f := ExecFunc(ctx); f != nil { | ||
| return f(name, arg...) | ||
| } |
There was a problem hiding this comment.
It's not very clear to me why this (and the other overrides like it) is needed. Do we use it anywhere, or is it just for mocking?
There was a problem hiding this comment.
Only for thread-safe mocking, yes.
See, I value a good mocking absolutely essential in this case where CICD is very expensive, I want to be able to easily create as many reliable tests as I need in order to improve chance there are no bugs before I push and wait 3 hours until I get my results. The context mocking is not very common, but pretty useful approach.
Btw this rewrite alone revealed two issues both in modularity and OpenSCAP when the checks were not entirely correct. So it already paid off.
There was a problem hiding this comment.
I added a package.go with further explanation:
// Package mockos provides mockable OS functions for testing. Use this package in
// host checks to allow mocking of OS interactions during unit tests. The package
// provides WithXXXFunc functions to set mock implementations in a context.Context,
// and corresponding XXXFunc functions to retrieve them. If no mock is set, the real
// OS functions are used.
package mockos
While generally I agree with isolated commits, here it does not make too much sense as the commit makes sense as a whole. I do not like creating "artificial history" when it was not how the code was really written. Can do this if you insist.
Yeah, I see two confusing parts. Argument parsing is a big unreadable, I was aiming for both because I thought it is good to be backward compatible if someone uses this directly. But now that I think about it, let's break it. I am settling with Go flags because these are shorter to parse than argument + environmental variable. The special goroutine which collects logs is not necessary, but I really want this to be readable. Because all checks are executed in parallel, the output would be totally unreadable without this. We really, really should do something with unreadable logs on CICD and we have to start somewhere. This check binary will be very often the one that fails and I want a clear output with as much logging as possible. This is why logging is quite elaborated and is being collected and ordered correctly.
Absolutely, good idea. I moved check slice into |
For me at least, a single final commit that changes some config files and modifies all the manifest checksums doesn't affect the reviewing much. As long as it's isolated from the rest of the changes, I can review the PR as if it's not there by viewing the rest of the commits individually (or in ranges, now that GH web UI supports it). So don't worry about that. |
achilleas-k
left a comment
There was a problem hiding this comment.
The check always succeeds.
|
You mentioned at some point (can't remember where or when) that you had a hard time rebasing and rewriting this because you had to wait for CI jobs that can take hours. Just ftr, for anyone reading this, it's possible to quickly test it locally by:
Step 3 takes about 15 secs to run on my machine, making it very easy to iterate locally on the code. |
|
Damn, yeah, nice catch. I am bumping the seed once again then. Rebased. |
1000881 to
adcb6fa
Compare
achilleas-k
left a comment
There was a problem hiding this comment.
Thanks again! Nice work. I went through everything carefully again and have a few nitpicks and minor comments. Approving anyway, since they can be handled in follow-ups.
| // still in the activating state. It calls systemctl list-units to get the list. | ||
| // This is only used in case of timeout to help with debugging. | ||
| func listBadUnits() string { | ||
| stdout, _, _, err := check.Exec("systemctl", "list-units", "--state=activating,failed", "--plain", "--no-legend", "--no-pager") |
There was a problem hiding this comment.
systemctl list-units supports --output=json, which would be cleaner here. Though, I don't know what version introduced the feature, so maybe it's not available on RHEL 8.10.
There was a problem hiding this comment.
Yeah that is RHEL9+ only.
| // Note that this test only checks for the existance of the filesystem | ||
| // customizatons target path not the content. For the simple case when | ||
| // "data" is provided we could check but for the "uri" case we do not | ||
| // know the content as the file usually comes from the host. The | ||
| // existing testing framework makes the content check difficult, so we | ||
| // settle for this for now. There is an alternative approach in | ||
| // https://github.com/osbuild/images/pull/1157/commits/7784f3dc6b435fa03951263e48ea7cfca84c2ebd | ||
| // that may eventually be considered that is more direct and runs | ||
| // runs locally but different from the existing paradigm so it | ||
| // needs further discussion. |
There was a problem hiding this comment.
We could check permissions and ownership as well. And check data when it's embedded in the blueprint. Let's do it in a follow-up though.
There was a problem hiding this comment.
Yes but I aim for no changes in this PR, except fixing obvious bugs. There is much more we can do now pretty easily.
adcb6fa to
66a7684
Compare
|
Rebased and squashed all your comments, thanks. The only exception being listing unit tests but this is not a big deal it is only in case of error. Any other new functionality I would like to do as a followups. |
The check script is rewritten in Go to be more modular and easier to test. No changes have been made to the checks themselves. Comments were carried over from the script.
Switch over to the Go implementation in the boot-image script.
The package "jq" is no longer needed. Drop it from configs. The sssd service is masked on RHEL 8.8+ because it does not start, see the discussion in: https://access.redhat.com/solutions/7017538
66a7684 to
0545d0b
Compare
|
Only resolved manifest conflicts after auto-merge was enabled, can I get acks again @achilleas-k @supakeen thanks :-) |
Yeah. Cozy winter evening happened.
I wrote the whole framework and implemented two checks and then asked AI to implement the rest. Only checkers in the
checkpackage were generated and reviewed by me, all the rest is my own code. Let me know how you like it. I intentionally did not make any changes to the behavior, this is pure refactoring which will open doors to more features like using YAML to find which tests are missing etc.Also the output is just pure text, but it could be maybe
gotestnow or something that CI understands and can parse for nicer results. But one step after another, I would rather keep it simple for now.Now, I really wanted this to be fast so everything runs in parallel, but logs from all goroutines are collected and presented in a nicer way sequentially so the output is not all messed up. I heavily use what is called context function mocking where the platform functions (Exec, Exists, Grep) can be overriden with a mock via special context value. The key type can be not exported making it impossible to mess around with the functions from other packages.
The hard part is now to integrate this in the CICD. We want to build the binaries for each PR and then upload them somewhere and take it from here. For now, I am settling up with simply running "go build" for each test and then copying the binaries over to the image the same way as we did for the script. If we cache GOCACHE maybe this might not be a bad idea at all.
You can safely run this locally, I have reviewed and tested everything:
The result codes indicate a test which failed as the first one, starting from 10 (1-9 are reserved). These goroutines can fail in random order, so it must not be used for any workarounds it is purely informative.