-
Notifications
You must be signed in to change notification settings - Fork 462
Support installing extensions and rework OS update #1941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support installing extensions and rework OS update #1941
Conversation
|
Note that with this PR we don't need to do |
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use writeFileAtomically
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: errors start lowercase (glog uppercase)
|
/retest |
pkg/daemon/rpm-ostree.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to re-introduce openshift/pivot#51
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went skopeo route to save pulling container image as it is required by podman inspect. Is skopeo inspect more expensive than doing podman pull containerImg && podman inspect ?
I was also wondering if we can just use https://github.com/openshift/machine-config-operator/blob/master/pkg/daemon/rpm-ostree.go#L257,L280 to get checksum and rebase. It works but not sure if we hit issue in some situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we actually have the containers/image stack vendored into the MCO for other reasons, so we could just directly do that from our process. Another alternative looks like oc image info -o json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could try to figure out if there's a way to get skopeo to not try to enumerate all the tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking at skopeo code, it doesn't seem like we can ask skopeo to skip looking for tags(have also asked on skopeo irc channel to get a second opinion)
Not very comfortable with us maintaining the code to inspect images, feels like it is better to use already tested tool oc image info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing with skopeo team it appears skipping to fetch RepoTags is not possible right now. With inspiration from skopeo inspect code, wrote an independent program which uses containers/image stack(already vendored in MCO) https://github.com/sinnykumari/container-inspect/blob/master/container_inspect.go and doesn't fetch RepoTags . If this looks ok I will integrate this in MCO, otherwise we can go for oc image info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed changes to use containers/image stack directly for inspecting image.
pkg/daemon/rpm-ostree.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we missing a call to the extract function here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully I understood your question correctly.
We are now not mounting container image. https://github.com/openshift/machine-config-operator/pull/1941/files#diff-06961b075f1753956d802ba954d2cfb5R251 is where we extract OS container content in /run/mco-machine-os-content/ and use same for rpm-ostree rebase.
kikisdeliveryservice
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing bc the if stmt right before this is also checking for mcDiff.osUpdate. Why not nest this inside the above or not possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel moving this with earlier section is going to be tricky and may lead to un-necessary bugs. Both section are doing different things and right order is really important here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually if is not needed here as we also compare OSImageURL in updateOS()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lower case
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ this comment!
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this file has so many helper funcs it would be easier to move them into their own file for readability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true. will look at moving helper function in a separate file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that this is one of the things which we can discuss collectively to cleanup and refactor MCO code.
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like this should be 1 func call that evals this if inside and then calls extract, addExtensions & updateOS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, there is scope of moving OS update/layering section into a function.
26b51d2 to
0ae69c8
Compare
|
Thank you all for reviewing! |
pkg/daemon/daemon.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should move this comment down to the dn.finalizeAndReboot now I suppose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm, we will return back here also when updateOS() fails?
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why gpgcheck=0 here? Shouldn't all the RPMs we ship via extensions be signed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, good catch. Don't know why I kept it like that, most likely copy paste error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had an offline discussion, but putting a comment here for future reference:
It is possible that some packages may not be signed, especially when dealing with nightly images. We are anyway shipping extensions packages in machine-os-content(which is source of truth for us). So it should be ok to install them with gpgcheck off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that it's ok to turn off signature verification. It seems like it leaves open a pretty significant security hole that attackers could exploit to run untrusted code on the host. While we check the machine-os-content image signature, we don't have any checks on the integrity of the files in the extensions directory ensuring that the RPMs that get installed are same ones shipped in the machine-os-image without alteration. There are likely other security gaps that could be enabled by installing unverified RPMs.
At the very least, I'm pretty sure many customers in highly-regulated industries require every package to be signed and validated to run on a host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we check the machine-os-content image signature, we don't have any checks on the integrity of the files in the extensions directory ensuring that the RPMs that get installed are same ones shipped in the machine-os-image without alteration.
We do, because we pull-by-digest and the container stack verifies that the digest matches.
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor typo: repoContent
6cdcc06 to
a714530
Compare
|
/retest |
a714530 to
990e561
Compare
|
/test cluster-bootimages |
2123350 to
33c3ce8
Compare
|
/retest |
|
/test cluster-bootimages |
|
Tests seems to be flaky |
mtrmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Reading only the image inspect part.) ACK, just trivial nits.
pkg/daemon/container-inspect.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn’t the file be named image-inspect.go?
pkg/daemon/container-inspect.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With docker.ParseReference it is no longer very URL-like; and I would expect this function to handle the "//" prefix instead of callers having to care. Ideally, this should “convert an image reference”, or “create an ImageSource from an image reference”.
pkg/daemon/container-inspect.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely non-blocking: newDockerImageSource maybe; the “parse” name was used in context of a fairly general CLI that is not applicable in this call tree.
pkg/daemon/container-inspect.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller can pass this down instead of having two copies that can get out of sync.
go.mod
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, container-runtime-config (and openshift/runtime-utils) still use the old version. That should probably be updated to decrease duplication… but it’s also rather out of scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, we can clean this up separately.
pkg/daemon/rpm-ostree.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned elsewhere, I’d expect the "//" syntax to be hidden in the implementation.
Thanks for the review, pushed changes addressing the feedback. |
|
/test cluster-bootimages |
26c0487 to
966d8b8
Compare
Vendor latest containers/image v5.5.1 which contains utilities needed to inspect container image in MCO.
Also rename pkg/daemon/container-inspect.go to pkg/daemon/image-inspect.go to fit naming better with what it does.
Also, explicitly delete the extracted OS image content once done
0cd99b9 to
5b44766
Compare
|
Rebased to get e2e-gcp-op fixes from #1969 included |
|
needs lgtm blessing :) |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, runcom, sinnykumari The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
e2e-aws is failing consistently today across various PRs. |
The PR is already approved by the cutoff, we usually don't override tests just to merge, they usually end up merging over the weekend. This PR hasn't passed this test since its initial run on Tuesday so I think we should just let the bot retest so it can pass. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
2 similar comments
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@sinnykumari: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
With the introduction of extensions and reworking of OS updates in openshift#1941, we now run `oc/podman` directly in the MCD to fetch the OS image. Thus we need to inject the proxy env vars into the container via the daemonset definition, otherwise image pulls will fail. Signed-off-by: Yu Qi Zhang <[email protected]>
Earlier to perform OS update we were pulling OS image
and mounting the content using podman. We were performing
OS update in host context because of selinux constraints
on mounted container.
Also, for rt-kernel switch we were pulling OS image again.
With coreos extensions support, we require extensions rpm
which is available in os container.
We tried different approach to solve problems
like minimizing container image pull, host/container
context switch, selinux permission on mounted
container and rpm-ostreed behavior.
See #1850
Finally, using
oc image extractand skopeo to inspect containerimage solves our problems. With this we are also getting
rid of using mco-pivot systemd service to rebase OS.