handle externally provisioned hosts without image settings #609
dhellmann wants to merge 2 commits into metal3-io:master
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dhellmann.
Branch updated from 145a01a to 9740000.
```go
	hsm.NextState = metal3v1alpha1.StateRegistering
	return actionComplete{}
}
if hsm.Host.Spec.Image != nil {
```
If the credentials are wrong and the image exists, this code will constantly retry registration, won't it?
Yes, that's correct. We have another PR up to introduce similar retry behavior for other failures.
/test govet
Branch updated from 9740000 to 47df824.
/test-integration
Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
Update the IronicProvisioner to deal with externally provisioned hosts that have no image settings by recording an error message of our own instead of leaking the Ironic error. When the host does have image settings, ensure they are sent to Ironic before telling it to adopt the host, in case they were not present when the host was registered (this is similar to what we do for provisioning).

Addresses metal3-io#608

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
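For readers following the review, here is a minimal Go sketch of the flow this commit message describes. Every name in it (`Host`, `Image`, `ProvisionResult`, `ironicClient`, `adoptHost`, `fakeIronic`) is a hypothetical stand-in, not the actual IronicProvisioner API. It only illustrates the ordering: with no image, record our own error message and make no Ironic call; with an image, push the image settings first and only then request adoption.

```go
package main

import "fmt"

// Image is a hypothetical stand-in for the host's image settings
// (roughly what hsm.Host.Spec.Image points at).
type Image struct {
	URL      string
	Checksum string
}

// Host is a hypothetical, trimmed-down host; Image is nil when the user
// has not provided image settings.
type Host struct {
	Name  string
	Image *Image
}

// ProvisionResult carries the error message recorded on the host, if any.
type ProvisionResult struct {
	ErrorMessage string
}

// ironicClient is a hypothetical interface over the two Ironic operations
// this flow needs; it is not a real client library API.
type ironicClient interface {
	UpdateImage(node string, img Image) error
	Adopt(node string) error
}

const missingImageMsg = "Image details missing for externally provisioned server."

// adoptHost records our own error when image settings are missing, so the
// user never sees Ironic's field names. Otherwise it sends the image
// settings to Ironic first (they may have been absent at registration time)
// and only then asks Ironic to adopt the host.
func adoptHost(c ironicClient, h Host) (ProvisionResult, error) {
	if h.Image == nil {
		return ProvisionResult{ErrorMessage: missingImageMsg}, nil
	}
	if err := c.UpdateImage(h.Name, *h.Image); err != nil {
		return ProvisionResult{}, fmt.Errorf("updating image settings: %w", err)
	}
	if err := c.Adopt(h.Name); err != nil {
		return ProvisionResult{}, fmt.Errorf("adopting host: %w", err)
	}
	return ProvisionResult{}, nil
}

// fakeIronic records the calls it receives so the ordering is visible.
type fakeIronic struct{ calls []string }

func (f *fakeIronic) UpdateImage(node string, img Image) error {
	f.calls = append(f.calls, "update:"+img.URL)
	return nil
}

func (f *fakeIronic) Adopt(node string) error {
	f.calls = append(f.calls, "adopt")
	return nil
}

func main() {
	noImage := &fakeIronic{}
	res, _ := adoptHost(noImage, Host{Name: "worker-0"})
	fmt.Println(res.ErrorMessage, len(noImage.calls))

	withImage := &fakeIronic{}
	img := &Image{URL: "http://example.com/host.qcow2"}
	_, _ = adoptHost(withImage, Host{Name: "worker-1", Image: img})
	fmt.Println(withImage.calls) // [update:http://example.com/host.qcow2 adopt]
}
```

Keeping the "no image" branch free of any Ironic call is what prevents Ironic's own error text from reaching the host status.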
Branch updated from 47df824 to 5652bb3.
Rebased to resolve a merge conflict.
/test-integration
/cc @zaneb
```go
if hsm.Host.Spec.Image != nil {
	info.log.Info("Image is set; will retry registration")
	hsm.NextState = metal3v1alpha1.StateRegistering
	return actionComplete{}
```
This requeues with no delay. Since handleRegistration also requeues with no delay after a failure, if the credentials are wrong we will be cycling the status constantly in a tight loop.
I think we need to merge #610 and return actionFailed here instead of actionComplete. With that patch, if we continue to return actionFailed, we will retry with an appropriate backoff.
```go
// error message here avoids exposing the error message
// from Ironic that talks about fields in Ironic with
// names the user may not recognize.
result.ErrorMessage = "Image details missing for externally provisioned server."
```
Is this actually an error? Can't we consider getting to the Manageable state a success? It means we got past Verifying, which is the important thing.
We can't go any further without the image. So unless we add a new state to the BMH state machine for hosts without images, we have to do something here to keep it from progressing and ending up stuck in a failure when adoption doesn't work.
Maybe we do need another state?
We're in the ExternallyProvisioned state... what further stuff do we have to do? Getting to Manageable means we can control the power, doesn't it?
No, we have to adopt the host (see line 921).
I've just noticed that we allow the Host to freely switch between the ExternallyProvisioned and Ready states without deprovisioning it first. That's likely a bug.
We have to adopt the host if we are in the Provisioned state. And we might want to adopt the host if we know the image so that Host can transition directly from ExternallyProvisioned->Provisioned without cleaning/inspection (although we don't support that today... we just go from ExternallyProvisioned->Ready without cleaning lol). But I'm not aware of any reason now or in the future that we would have to adopt an externally-provisioned host when we haven't been told what image (if any) is expected to be running on it.
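The argument above can be read as a small policy table. This Go sketch encodes it under stated assumptions: the state names are a simplified subset of the BMH states mentioned in this thread, and the three-valued `adoptionPolicy` type and `policyFor` function are hypothetical. It reflects the position in this comment, not the behavior of the PR, which still blocks on a missing image because adoption may be what makes Ironic monitor the host's power state.

```go
package main

import "fmt"

// ProvisioningState is a hypothetical subset of the BMH states discussed here.
type ProvisioningState string

const (
	StateProvisioned           ProvisioningState = "provisioned"
	StateExternallyProvisioned ProvisioningState = "externally provisioned"
	StateReady                 ProvisioningState = "ready"
)

// adoptionPolicy says whether a host in a given situation must, may, or
// need not be adopted in Ironic.
type adoptionPolicy int

const (
	adoptNotNeeded adoptionPolicy = iota
	adoptOptional
	adoptRequired
)

func (p adoptionPolicy) String() string {
	switch p {
	case adoptRequired:
		return "required"
	case adoptOptional:
		return "optional"
	default:
		return "not needed"
	}
}

// policyFor encodes the comment's reasoning: provisioned hosts must be
// adopted; externally provisioned hosts may be adopted when the image is
// known (enabling a later direct move to Provisioned); otherwise there is
// no reason to adopt.
func policyFor(state ProvisioningState, hasImage bool) adoptionPolicy {
	switch state {
	case StateProvisioned:
		return adoptRequired
	case StateExternallyProvisioned:
		if hasImage {
			return adoptOptional
		}
		return adoptNotNeeded
	default:
		return adoptNotNeeded
	}
}

func main() {
	fmt.Println(policyFor(StateProvisioned, true))            // required
	fmt.Println(policyFor(StateExternallyProvisioned, true))  // optional
	fmt.Println(policyFor(StateExternallyProvisioned, false)) // not needed
	fmt.Println(policyFor(StateReady, false))                 // not needed
}
```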
> I've just noticed that we allow the Host to freely switch between the ExternallyProvisioned and Ready states without deprovisioning it first. That's likely a bug.
Fun. Maybe? I could go either way.
> We have to adopt the host if we are in the Provisioned state. And we might want to adopt the host if we know the image so that Host can transition directly from ExternallyProvisioned->Provisioned without cleaning/inspection (although we don't support that today... we just go from ExternallyProvisioned->Ready without cleaning lol). But I'm not aware of any reason now or in the future that we would have to adopt an externally-provisioned host when we haven't been told what image (if any) is expected to be running on it.
According to @dtantsur or @juliakreger when I was writing that code, Ironic only monitors the power state of hosts that are adopted. Maybe I'm misremembering that, though? In any case, adoption today requires the image, and that's why this fix is preventing adoption until the image is set.
> According to @dtantsur or @juliakreger when I was writing that code, Ironic only monitors the power state of hosts that are adopted.
Ah, I hadn't heard that. It's possible. Adopting moves it to the active state (i.e. provisioned), so that shouldn't be required per se, but it may be that we need to get to the available state before we can manage the power. If you don't have an image, the way to get there is via cleaning, which we don't want to do on an externally provisioned host (until it goes back to Ready!).
If that's the case, then I can see why we would treat this as an error, since it means we can't change the power state while externally provisioned.
It's tempting to just pass some bogus image, but there's no way to replace it with the real one if we get it later without dropping the Ironic DB.
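The reasoning in this comment can be sketched as reachability over a simplified Ironic provisioning state graph. In this Go sketch, intermediate states (verifying, cleaning, adopting, deploying) are collapsed into edges, automated cleaning is assumed to be enabled so that `provide` passes through cleaning, and the `needsImage` flags follow this thread's statement that adoption currently requires image details. The `transition` type and `reachable` function are hypothetical helpers. Under those assumptions, a manageable node with no image cannot reach `active` without cleaning.

```go
package main

import (
	"fmt"
	"sort"
)

// transition is one simplified Ironic provisioning action.
type transition struct {
	from, verb, to string
	needsImage     bool // adoption/deployment need image details (per the thread)
	cleans         bool // passes through cleaning, unwanted for an externally provisioned host
}

var transitions = []transition{
	{from: "enroll", verb: "manage", to: "manageable"},
	{from: "manageable", verb: "inspect", to: "manageable"},
	{from: "manageable", verb: "provide", to: "available", cleans: true},
	{from: "manageable", verb: "adopt", to: "active", needsImage: true},
	{from: "available", verb: "deploy", to: "active", needsImage: true},
}

// reachable does a breadth-first search from start, skipping any action
// that would clean the host or that needs an image we do not have.
func reachable(start string, haveImage bool) []string {
	seen := map[string]bool{start: true}
	queue := []string{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, t := range transitions {
			if t.from != cur || t.cleans || (t.needsImage && !haveImage) {
				continue
			}
			if !seen[t.to] {
				seen[t.to] = true
				queue = append(queue, t.to)
			}
		}
	}
	out := make([]string, 0, len(seen))
	for s := range seen {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(reachable("manageable", false)) // [manageable]
	fmt.Println(reachable("manageable", true))  // [active manageable]
}
```

This also shows why a bogus image is tempting: it is the only edge in this simplified graph that leads to `active` without cleaning.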
/hold
We think #610 may fix this issue, and it will certainly change what we need to do in this PR.
@dhellmann #610 has merged; what needs to happen with this PR?
I need to rework it. I want to merge #650 before doing anything else with this repo, though.
@dhellmann: The following tests failed.
Update the IronicProvisioner to deal with externally provisioned hosts that
have no image settings by recording an error message of our own.
When the host does have image settings, ensure they are sent to Ironic
before telling it to adopt the host, in case they were not present when the
host was registered (this is similar to what we do for provisioning).
Addresses #608