-
Notifications
You must be signed in to change notification settings - Fork 427
Dynamically discover release image OS metadata #205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamically discover release image OS metadata #205
Conversation
|
/test e2e-aws |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ironcladlou The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| { | ||
| Name: "read-coreos-metadata", | ||
| Image: image, | ||
| Command: []string{"/usr/bin/cat", "/release-manifests/0000_50_installer_coreos-bootimages.yaml"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this invoked in a shell where we can use wildcards? And maybe regardless, might be worth PRing a comment to the installer source like "renaming this manifest will break a hard-coded path in the HyperShift contro-plane operator: $LINK". Or can we make this lookup robust to installer-side renames by loading manifests and then iterating over them to find the one we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of things we use which definitely need more thought:
- /release-manifests/image-references
- /release-manifests/0000_50_installer_coreos-bootimages.yaml
In a given release payload, some questions about those files:
- Are they considered part of the payload "contract" somehow, e.g. by nature of having a well-known location?
- Are their contents guaranteed to be serializations of some public API types? For example, image-references is a serialized
ImageStream, but bootimages is a serializedConfigMapwhich embeds a JSON document whose schema I'm not aware of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/release-manifests/image-references is payload contract, e.g. here and here is the CVO requiring that file to exist. The CVO requires manifests to exist in release-manifests/, uses their filenames for ordering, but otherwise isn't opinionated about what they are called, and does not require any manifest-filename consistency between releases.
It's possible that the installer might drop their boot-image ConfigMap, but seems unlikely, and even less likely if you let them know that you're consuming it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it makes sense for the OS metadata bits to become part of the contract? Otherwise it's not clear to me how providers would piece together the compatible payload/AMI (for example) versions from the outside in order to specify the AMI explicitly on the HostedCluster. @derekwaynecarr, thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@staebler let’s discuss the above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract is to call openshift-install coreos print-stream-json. There is no contract for extracting the CoreOS image stream from the release payload directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main questions for me:
- Where is the published schema for the coreos metadata JSON?
- How should that metadata (conforming to the schema) be made available to users given a release payload image spec (e.g. is there an API call users can make to get the CoreOS metadata for that release, is the the JSON encoded in the image metadata itself, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would ask the same questions regarding the ImageStream for component images embedded in the payload. In that case the schema is clear (it's an ImageStream which is public versioned schema), but the "how do users get it" is outstanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oc adm release info -o json $RELEASE_IMAGE_PULLSPEC will include the image references. I'm fine porting that logic to library-go, for other folks who want to vendor it.
87aa5f6 to
9a18cd1
Compare
|
/test e2e-aws |
| ami, err := r.ImageProvider.Image(hcluster) | ||
| if err != nil { | ||
| return ctrl.Result{}, fmt.Errorf("failed to obtain AMI: %w", err) | ||
| var ami string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ami lookup is dictated by nodePool.Spec.Release.Image which also dictates the MCS version used for this nodePool, the MCS dictates the ign payload and therefore the nodePool version. So the ami could even be a status.Field in our MCS resource. This way we'd capture and expose atomically anything that manages nodePool versioning all together in the MCS logic. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense — although this is platform specific, and there's currently no struct type in MCS.Status to express that sort of thing. So maybe something like:
status:
version: ...
platform:
type: aws
aws:
ami: ...Or something... the info is already embedded in NodePool via NodePoolPlatform, so not sure if it makes sense to embed a NodePoolPlatform in MCS.Status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added some TODOs around some possible gaps in the NodePool API here (AWS region, architecture).
I think we should move the status API additions to a followup if that's okay
| if err != nil { | ||
| return ctrl.Result{}, fmt.Errorf("failed to obtain AMI: %w", err) | ||
| var ami string | ||
| if arch, foundArch := releaseImage.StreamMetadata.Architectures["x86_64"]; !foundArch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this new logic deserves its own unit testable function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do have some tests in progress I can add to the PR, yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did extract some of the logic into a function, but it's so small it doesn't seem worth unit testing at the moment — I do think the reconciliation code itself deserves tests but that would be a much larger refactor (there aren't any tests at all for this controller currently, and I think it needs some restructuring to make that manageable), so I'm not tackling it in this PR
| var ami string | ||
| if arch, foundArch := releaseImage.StreamMetadata.Architectures["x86_64"]; !foundArch { | ||
| return ctrl.Result{}, fmt.Errorf("couldn't find OS metadata for architecture %q", "x64_64") | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we signal all this new errors in a nodePool condition as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like it would be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I incorporated a new AMIDiscoveryFailed condition to surface this failure
|
/test e2e-aws |
Before this commit, CoreOS metadata was hard-coded in this Git repo and required manual maintenance to map OCP release versions to compatible AMIs. Now, the metadata is discovered automatically as part of the general release image metadata lookup, as the CoreOS metadata is part of the payload as of openshift/installer#4760. After this commit, HyperShift will only be compatible with OCP release images which embed the CoreOS metadata.
9a18cd1 to
bcd7de4
Compare
|
/test e2e-aws |
|
/lgtm |
Before this commit, CoreOS metadata was hard-coded in this Git repo and required
manual maintenance to map OCP release versions to compatible AMIs. Now, the
metadata is discovered automatically as part of the general release image
metadata lookup, as the CoreOS metadata is part of the payload as of
openshift/installer#4760.
After this commit, HyperShift will only be compatible with OCP release images
which embed the CoreOS metadata.