-
Notifications
You must be signed in to change notification settings - Fork 69
Start image customization controller #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start image customization controller #208
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sadasu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
provisioning/image_customization.go
Outdated
| container := corev1.Container{ | ||
| Name: "image-customization-controller", | ||
| Image: images.ImageCustomizationController, | ||
| Command: []string{"/image-customization-controller"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably set these two args https://github.com/openshift/image-customization-controller/blob/main/cmd/controller/main.go#L114-L117
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently passing in the default as the images-bind-addr. How does the user get to override this value? As an input parameter to CBO (can be set once) or as a config in Provisioning CR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
images-publish-addr should be set to the URL of the Service that points at this. That's what appears in the URLs that the controller puts into the PreprovisioningImage Status.
images-bind-addr can be left at the default, as long as the Service points at the default port.
669bd25 to
d6e92e5
Compare
provisioning/image_customization.go
Outdated
| container := corev1.Container{ | ||
| Name: "image-customization-controller", | ||
| Image: images.ImageCustomizationController, | ||
| Command: []string{"/image-customization-controller"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
images-publish-addr should be set to the URL of the Service that points at this. That's what appears in the URLs that the controller puts into the PreprovisioningImage Status.
images-bind-addr can be left at the default, as long as the Service points at the default port.
provisioning/image_customization.go
Outdated
| } | ||
| } else { | ||
| hostIP, err := getPodHostIP(info.Client.CoreV1(), info.Namespace) | ||
| if err == nil && hostIP != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we let this error bubble up so that we will retry, instead of just setting a blank URL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry will automatically happen during the next reconcile loop. We might get an error when the metal3 pod is not available temporarily. So, my understanding is that retrying immediately is not going to help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we get err != nil it's because we failed to do Pods.List(), so that could just be a dropped network connection. No reason to think we'll be triggered to reconcile again.
For the empty hostIP, I agree we could wait for the reconcile. I do wonder whether it's worth the expense of doing a deploy that will definitely fail (this env var is required). I think I will change the controller to make it optional and report the errors in the CRD, rather than having the controller refuse to start and all the CRDs frozen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still to be resolved.
openshift/image-customization-controller#28 is the change to improve the error reporting, so we can continue to handle hostIP == "" like this, but we need to return err in the case where err != nil. It might be easier to pass the ironic IP in to createImageCustomizationContainer() instead of calling out from inside it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made updates. Hopefully this is what you are looking for.
|
Forgot to mention, we also need to add the |
| }, | ||
| buildSSHKeyEnvVar(info.SSHKey), | ||
| pullSecret, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in openshift/installer#5425 (comment), it looks like we'll need to mount the file /etc/containers/registries.conf (could be as a (read-only) hostPath volume) somewhere in the container, and set the env var REGISTRIES_CONF_PATH to point to it (added in openshift/image-customization-controller#20).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope what I have now is what you are looking for.
719994f to
95c6940
Compare
Commit 95c6940 should take care of this. |
aa36a1d to
b08c2b1
Compare
zaneb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove createInitContainerConfigureCoreOSIPA() now, as that is just adding a similar Ignition file (with no network config) as is built by the image builder. Also we have to get rid of it so that openshift/ironic-image#243 can have an effect.
provisioning/image_customization.go
Outdated
| } | ||
| } else { | ||
| hostIP, err := getPodHostIP(info.Client.CoreV1(), info.Namespace) | ||
| if err == nil && hostIP != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we get err != nil it's because we failed to do Pods.List(), so that could just be a dropped network connection. No reason to think we'll be triggered to reconcile again.
For the empty hostIP, I agree we could wait for the reconcile. I do wonder whether it's worth the expense of doing a deploy that will definitely fail (this env var is required). I think I will change the controller to make it optional and report the errors in the CRD, rather than having the controller refuse to start and all the CRDs frozen.
|
/retest |
|
The CRD fix merged but we are still waiting on a CI build. |
|
/retest |
e9937a2 to
fa099bf
Compare
0f9b2a3 to
d4d43d7
Compare
| fmt.Sprintf("http://%s.%s.svc.cluster.local/", | ||
| imageCustomizationService, info.Namespace)}, | ||
| SecurityContext: &corev1.SecurityContext{ | ||
| Privileged: pointer.BoolPtr(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was worth a comment I thought, because we will eventually want to get rid of it when we give i-c-c its own volume.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
provisioning/image_customization.go
Outdated
| } | ||
| } else { | ||
| hostIP, err := getPodHostIP(info.Client.CoreV1(), info.Namespace) | ||
| if err == nil && hostIP != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still to be resolved.
openshift/image-customization-controller#28 is the change to improve the error reporting, so we can continue to handle hostIP == "" like this, but we need to return err in the case where err != nil. It might be easier to pass the ironic IP in to createImageCustomizationContainer() instead of calling out from inside it.
The machine-os-images container supplies the latest ISO and PXE images for pre-provisioning from within the release payload. Use this instead of downloading preprovisioning images or using the RHEL IPA. Extract the images to the same location (and with the same names) in the image cache volume that we were using previously for IPA images. This change is transparent to the tasks using the images. Continue to run the second-level image cache service and download the Provisioning OS QCOW (if specified). Existing clusters installed prior to 4.10 will contain MachineSets that still rely on the QCOW for provisioning new Machines.
d4d43d7 to
d0a0439
Compare
provisioning/image_customization.go
Outdated
| }, | ||
| { | ||
| Name: ironicBaseUrl, | ||
| Value: "https://" + hostForIP(ironicIP), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we said we were going to pass an empty string if the metal3 host IP couldn't be found?
provisioning/image_customization.go
Outdated
| } | ||
| } | ||
|
|
||
| func hostForIP(ipAddr string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is now only being called once, and we probably want some logic to skip adding the "http://" if it's empty, maybe we should just inline this logic in createImageCustomizationContainer.
d0a0439 to
39a83e3
Compare
|
/lgtm |
Make sure to report any errors that occur while determining Ironic's IP address that needs to be passed to the image-customization-controller.
39a83e3 to
85621b1
Compare
|
/lgtm |
|
/retest-required |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
2 similar comments
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
Tests are blocked by openshift/cluster-cloud-controller-manager-operator#156 - this should be able to merge once that fix is available. |
|
/test e2e-metal-ipi |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
2 similar comments
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
When the Provisioning Network is not being used (ProvisioningNetwork=Disabled or ProvisioningNetwork=Enabled and VirtualMediaViaExternalNetwork = true), the image customization controller deployment uses the host IP of the master on which metal3 deployment is running as the IP address of Ironic.
When the metal3 deployment moves to a different master, we need to make sure that the metal3-image-customization deployment uses this new host IP as Ironic's IP address.
CBO already watches for changes in the Deployment() resource and when the metal3 deployment moves, a reconcile loop would be triggered, Ironic's IP will once again be evaluated and the metal3-image-customization deployment will be updated with this new IP.
Co-Authored-By: Zane Bitter [email protected]