Skip to content

Conversation

@ironcladlou
Copy link
Contributor

Before this commit, CoreOS metadata was hard-coded in this Git repo and required
manual maintenance to map OCP release versions to compatible AMIs. Now, the
metadata is discovered automatically as part of the general release image
metadata lookup, as the CoreOS metadata is part of the payload as of
openshift/installer#4760.

After this commit, HyperShift will only be compatible with OCP release images
which embed the CoreOS metadata.

@ironcladlou
Copy link
Contributor Author

/test e2e-aws

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 4, 2021
{
Name: "read-coreos-metadata",
Image: image,
Command: []string{"/usr/bin/cat", "/release-manifests/0000_50_installer_coreos-bootimages.yaml"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this invoked in a shell where we can use wildcards? And maybe regardless, might be worth PRing a comment to the installer source like "renaming this manifest will break a hard-coded path in the HyperShift contro-plane operator: $LINK". Or can we make this lookup robust to installer-side renames by loading manifests and then iterating over them to find the one we want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of things we use which definitely need more thought:

  • /release-manifests/image-references
  • /release-manifests/0000_50_installer_coreos-bootimages.yaml

In a given release payload, some questions about those files:

  • Are they considered part of the payload "contract" somehow, e.g. by nature of having a well-known location?
  • Are their contents guaranteed to be serializations of some public API types? For example, image-references is a serialized ImageStream, but bootimages is a serialized ConfigMap which embeds a JSON document whose schema I'm not aware of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/release-manifests/image-references is payload contract, e.g. here and here is the CVO requiring that file to exist. The CVO requires manifests to exist in release-manifests/, uses their filenames for ordering, but otherwise isn't opinionated about what they are called, and does not require any manifest-filename consistency between releases.

It's possible that the installer might drop their boot-image ConfigMap, but seems unlikely, and even less likely if you let them know that you're consuming it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes sense for the OS metadata bits to become part of the contract? Otherwise it's not clear to me how providers would piece together the compatible payload/AMI (for example) versions from the outside in order to specify the AMI explicitly on the HostedCluster. @derekwaynecarr, thoughts here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staebler let’s discuss the above.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contract is to call openshift-install coreos print-stream-json. There is no contract for extracting the CoreOS image stream from the release payload directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main questions for me:

  1. Where is the published schema for the coreos metadata JSON?
  2. How should that metadata (conforming to the schema) be made available to users given a release payload image spec (e.g. is there an API call users can make to get the CoreOS metadata for that release, is the the JSON encoded in the image metadata itself, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would ask the same questions regarding the ImageStream for component images embedded in the payload. In that case the schema is clear (it's an ImageStream which is public versioned schema), but the "how do users get it" is outstanding

Copy link
Member

@wking wking May 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oc adm release info -o json $RELEASE_IMAGE_PULLSPEC will include the image references. I'm fine porting that logic to library-go, for other folks who want to vendor it.

@ironcladlou ironcladlou force-pushed the discover-aws-images branch from 87aa5f6 to 9a18cd1 Compare May 4, 2021 19:13
@ironcladlou
Copy link
Contributor Author

/test e2e-aws

ami, err := r.ImageProvider.Image(hcluster)
if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to obtain AMI: %w", err)
var ami string
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enxebre @sjenning should we be using the value from spec if provided and only discovering if no value was given?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

Copy link
Member

@enxebre enxebre May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ami lookup is dictated by nodePool.Spec.Release.Image which also dictates the MCS version used for this nodePool, the MCS dictates the ign payload and therefore the nodePool version. So the ami could even be a status.Field in our MCS resource. This way we'd capture and expose atomically anything that manages nodePool versioning all together in the MCS logic. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense — although this is platform specific, and there's currently no struct type in MCS.Status to express that sort of thing. So maybe something like:

status:
  version: ...
  platform:
    type: aws
    aws:
      ami: ...

Or something... the info is already embedded in NodePool via NodePoolPlatform, so not sure if it makes sense to embed a NodePoolPlatform in MCS.Status?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added some TODOs around some possible gaps in the NodePool API here (AWS region, architecture).

I think we should move the status API additions to a followup if that's okay

if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to obtain AMI: %w", err)
var ami string
if arch, foundArch := releaseImage.StreamMetadata.Architectures["x86_64"]; !foundArch {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this new logic deserves its own unit testable function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have some tests in progress I can add to the PR, yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did extract some of the logic into a function, but it's so small it doesn't seem worth unit testing at the moment — I do think the reconciliation code itself deserves tests but that would be a much larger refactor (there aren't any tests at all for this controller currently, and I think it needs some restructuring to make that manageable), so I'm not tackling it in this PR

var ami string
if arch, foundArch := releaseImage.StreamMetadata.Architectures["x86_64"]; !foundArch {
return ctrl.Result{}, fmt.Errorf("couldn't find OS metadata for architecture %q", "x64_64")
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we signal all this new errors in a nodePool condition as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like it would be useful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I incorporated a new AMIDiscoveryFailed condition to surface this failure

@ironcladlou
Copy link
Contributor Author

/test e2e-aws

Before this commit, CoreOS metadata was hard-coded in this Git repo and required
manual maintenance to map OCP release versions to compatible AMIs. Now, the
metadata is discovered automatically as part of the general release image
metadata lookup, as the CoreOS metadata is part of the payload as of
openshift/installer#4760.

After this commit, HyperShift will only be compatible with OCP release images
which embed the CoreOS metadata.
@ironcladlou ironcladlou force-pushed the discover-aws-images branch from 9a18cd1 to bcd7de4 Compare May 5, 2021 13:40
@ironcladlou
Copy link
Contributor Author

/test e2e-aws

@sjenning
Copy link
Contributor

sjenning commented May 5, 2021

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 5, 2021
@openshift-merge-robot openshift-merge-robot merged commit fec57f3 into openshift:main May 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants