[MCO-126] Have capability to consume new format base OS image #2939
Conversation
[mcbs] Fast forward to master
This is *mainly* to validate that openshift/release#24225 worked. But, this code may be useful as a sanity check going forward. See also coreos/rpm-ostree#3251 (I also may try to expose e.g. `ex-container` as a feature flag that we can query instead of version-parsing)
[mcbs] Validate rpm-ostree version is new enough
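A version gate like the one this commit describes could be sketched as below. The cutoff values and the parsing scheme are illustrative assumptions, not the PR's actual check (the discussion above also floats querying an `ex-container` feature flag instead of version-parsing):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsContainerImages reports whether an rpm-ostree version string such
// as "2021.14" meets a minimum cutoff. Both the cutoff and the format
// handling here are assumptions for illustration.
func supportsContainerImages(version string) (bool, error) {
	const minYear, minRelease = 2021, 14 // assumed minimum, not authoritative
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 2)
	if len(parts) != 2 {
		return false, fmt.Errorf("unexpected rpm-ostree version %q", version)
	}
	year, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	// Tolerate release suffixes like "14-1" by taking the leading number.
	release, err := strconv.Atoi(strings.SplitN(parts[1], "-", 2)[0])
	if err != nil {
		return false, err
	}
	return year > minYear || (year == minYear && release >= minRelease), nil
}

func main() {
	ok, err := supportsContainerImages("2021.14")
	fmt.Println(ok, err)
}
```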
- IsCoreOSVariant and compareOSImageURL have already been called (or equivalent checks have been performed) in every case where updateOS is called. Since updateOS no longer requires any members of Daemon, it can be made a helper function instead of a method on Daemon.
- Call IsCoreOSVariant once in applyOSChanges instead of in every helper function.
- The comment on the function says it's probably unnecessary, and it adds unnecessary complexity to logic that must be maintained in two separate OS update paths (one in update() and one in checkStateOnFirstRun()).
- Certain helper methods should only be called on CoreOS, and it is more reliable to type-check this than to rely on method preconditions.
daemon.go/update.go: various cleanup surrounding OS updates
[layering] update mcbs branch with master
Add an e2e test that:
1. creates an image stream and pushes a build to that image stream
2. uses that build with rpm-ostree rebase
3. successfully reboots into that image

Closes https://issues.redhat.com/browse/MCO-127
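The rebase step of that flow could be constructed roughly as below. The `--experimental` flag and the `ostree-unverified-registry:` transport prefix are assumptions based on rpm-ostree's container support at the time, and the image reference is a placeholder:

```go
package main

import "fmt"

// rebaseArgs sketches the rpm-ostree invocation the e2e flow would run to
// rebase onto an in-cluster built image. Flag and transport prefix are
// assumed, not verified against this PR.
func rebaseArgs(imageRef string) []string {
	return []string{
		"rebase",
		"--experimental",
		fmt.Sprintf("ostree-unverified-registry:%s", imageRef),
	}
}

func main() {
	// Hypothetical in-cluster image reference.
	fmt.Println(rebaseArgs("registry.example/ns/os:latest"))
}
```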
Otherwise e2e tests fail with "panic: test timed out after 1h30m0s"
Create PoC for booting from in-cluster built image
In preparation of CoreOS Layering, refactor our various locations of OS updates into 1 function. Left some investigations for later inline.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: yuqi-zhang. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Do you think we could figure out whether we can drop the bootstrapping and pivot cases now? I think that would allow clearer code.
I think that would also make it easier to add some type safety for layered vs non-layered, similar to 7c5a1e6.
```go
if err = addExtensionsRepo(osImageContentDir); err != nil {
	return
}
```
Is there a reason you moved this here? It's unneeded by the pivot and bootstrap paths, correct? And do they clean it up? I only see the cleanup in applyOsChanges. +1 for removing pivot and bootstrap if possible 😉
I just put it here as a placeholder since we no longer ship the extensions repo in the base OS. I can always move it into a separate function and add a check.
The other paths don't properly do a full OS update so I thought it wasn't really a big issue either way, since if it isn't used, it is just a no-op
The "bootstrap" path isn't really bootstrapping either, so I think that behaviour was technically wrong. Again, I think it isn't used, so I can try removing it
cgwalters left a comment:
Nice, thanks so much for starting this!
Thinking about this a bit more, perhaps instead of inspecting the container dynamically, it makes sense to use a new field in MachineConfig (osContainer instead of osImageURL?) and in the configmap.
Then we know from quite early on which path we'll take. We can make everything more "type safe" too...something like:
```go
type OSUpdateSource struct {
	old *string
	new *docker.ImageReference
}
```
(Taking the opportunity to use a proper type for a container image reference instead of string)
In Rust of course this would be a nice enum
```rust
enum OSUpdateSource {
    Old(String),
    New(ImageReference),
}
```
which means the (IMO invalid) states of "no source" and "both sources" are omitted.
Though...I just wrote that but I am thinking actually in order to "ratchet" this into place, we probably actually need to have a bit of time (hopefully not long) where we ship both format containers. And so we may need to at least transiently represent and handle the "both" case.
OTOH, an advantage of dynamic inspection is we wouldn't need that ratchet, we could just do the swap, then delete the code in the MCO doing the inspection and handling the old format after.
So...dunno. I am good either way I guess.
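Since Go can't make the "no source" / "both sources" states unrepresentable the way the Rust enum can, the invariant from the struct/enum sketch above would have to be checked at runtime. A minimal illustration, with plain strings standing in for a proper image-reference type:

```go
package main

import (
	"errors"
	"fmt"
)

// OSUpdateSource approximates the struct sketched in the comment above.
type OSUpdateSource struct {
	Old *string // legacy osImageURL
	New *string // new-format container image reference
}

// Validate rejects the two states the comment calls invalid. (If both
// formats must ship transiently, this check would be relaxed.)
func (s OSUpdateSource) Validate() error {
	if s.Old == nil && s.New == nil {
		return errors.New("no OS update source set")
	}
	if s.Old != nil && s.New != nil {
		return errors.New("both legacy and new OS update sources set")
	}
	return nil
}

func main() {
	url := "quay.io/example/os:latest" // hypothetical reference
	fmt.Println(OSUpdateSource{New: &url}.Validate())
	fmt.Println(OSUpdateSource{}.Validate())
}
```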
This could definitely work and potentially makes the transition smoother (or at least gives us the flexibility to fall back). I just did it the current way in accordance with the card, and it was mostly meant as a start for in-cluster MCD testing; by no means final. Just thinking through what the other format would look like: the MC would potentially have 3 fields for now?
Wanted to move towards 3 anyways, so I can draft up a separate PR (or on top of this one) to see what that might look like.
Just to make sure I have this right:
Hmm, interesting. I think some of the current strawman designs call for e.g. reworking the node controller and annotations to use the image as the source of truth. OTOH, maybe it is actually lower risk to basically just retain the node controller and pools etc. as-is, and just add a new field. What confuses me a bit is that it seems like we'd be making things a bit circular, because the MC would be both an input to generating the image and be changed when an image is output? Maybe we can address that by moving the …
```go
func ExtractAndUpdateOS(newOSImageURL string) (osImageContentDir string, err error) {
	if isLayeredUpdate(newOSImageURL) {
		// support new flow here
		// Not sure if these commands are enough?
```
Yep that should be it!
That said... there is a whole interesting discussion here, because what we could actually do is rebase once to `docker-registry.default.svc:5000/machine-config-operator/coreos-layered:latest` and then just run `rpm-ostree upgrade` to pull the new image.
OTOH, most everything inside the platform uses explicit `@sha256` digests for total predictability, so doing a rebase each time here is fine.
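The digest-pinned vs floating-tag distinction above can be checked with a small helper. This is a minimal string check for illustration only; real code would use a proper image-reference parser:

```go
package main

import (
	"fmt"
	"strings"
)

// isDigestPinned reports whether an image reference is pinned to a digest
// (name@sha256:...) rather than a floating tag such as :latest.
func isDigestPinned(ref string) bool {
	i := strings.LastIndex(ref, "@")
	return i != -1 && strings.HasPrefix(ref[i+1:], "sha256:")
}

func main() {
	fmt.Println(isDigestPinned("docker-registry.default.svc:5000/machine-config-operator/coreos-layered:latest"))
	fmt.Println(isDigestPinned("quay.io/example/os@sha256:abc123"))
}
```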
Looks like …
Right; this is just handling old format vs new format.
This though is a much more nuanced part, and gets into exactly how we do the handoff between the rendered config, the image, and the MCD. The way I was thinking of this: if the MCD uses an image as the source of truth, then the MCD stops reading MachineConfig from the API; instead it just pulls out …
Another way to say this is that it's having the MCD handle the stuff not handled by …
But... we could perhaps say let's try to get the MCD entirely out of that business, and we should have a way to ship kernel arguments in the container image and for the ostree stack to handle that. (It makes sense, needs design though.)
OK, so the build didn't complete, but we don't know why. I can't find it in the must-gather, though.
Funnily enough, we had a quick chat today that went through basically the same thought process. Will be trying to draw some diagrams of what this flow could look like. I am definitely not against changing this to a higher-level field; this just made it such that we don't yet alter the regular workflow. Will be looking at tests after revisiting based on discussions.
Just for safety, adding a hold here :)
/hold
OK sorry, I had lost track of this one. I think we can probably proceed with this?
I think we might need to drop and/or combine some of this in favor of what's on @jkyros's branch. But consolidating the update paths is probably still helpful.
@yuqi-zhang: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@yuqi-zhang: The following tests failed, say …
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Sorry, I also lost track of this. I think this is probably outdated at this point. If there are any helpful parts of this we'd like to extract out specifically, I can look at doing so. Otherwise I think we should close this in favour of John's updated work.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting … If this issue is safe to close now, please do so with …
/lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting … If this issue is safe to close now, please do so with …
/lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting …
/close
@openshift-bot: Closed this PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
An implementation of https://issues.redhat.com/browse/MCO-126. This should allow dual support for the existing in-cluster OS image and new OCI-format base OS images.
To test, scale down the CVO, then:

```
oc -n openshift-machine-config-operator edit configmap/machine-config-osimageurl
```

And switch to Colin's base image in openshift/os#657 (comment):

```yaml
osImageURL: registry.ci.openshift.org/coreos/walters-rhcos-ostreecontainer@sha256:3f57a0b046c023f837ae1c6d00f28e44a2a3c6201df556630698da29c942b2c8
```

And the MCPs should update to something like: …
Somewhat WIP status due to some uncertainty; comments inline.