Improve communications between controller and provisioner#761
Improve communications between controller and provisioner#761metal3-io-bot merged 15 commits intometal3-io:masterfrom
Conversation
|
/test-integration |
51a7930 to
5663434
Compare
|
/test-integration |
|
/cc @dhellmann |
Add a set of functions that lend semantic meaning to the various Result values that can be returned - some of which result in identical Result structures being returned to the caller. This is the first step toward improving the provisioner API to allow the controller more insight into how it should respond to events in the provisioner. The six possible events are: * A conflict that should be retried with a delay * A change to the Host Status that requires it to be written back to the k8s API * Waiting for an ongoing process * Successful completion of the current operation * A failure of the current operation * A transient error (e.g. network connection failure)
Denote instances where we are returning due to a 409 Conflict in Ironic by calling the retryAfterDelay() function to get a Result. Note that in some cases this introduces a delay where previously there was none. Delaying after a 409 response is probably always the right thing to do.
5663434 to
96d691c
Compare
|
/test-integration |
andfasano
left a comment
There was a problem hiding this comment.
Apart a couple of minor points/questions, changes look good to me.
I think this approach is much more neat and readable - as the semantic functions express better the intention in the code, and definitely helps in promoting a better separation of the host update logic outside of the Provisioner.
| provisionerFactory := func(host metal3iov1alpha1.BareMetalHost, bmcCreds bmc.Credentials, publish provisioner.EventPublisher) (provisioner.Provisioner, error) { | ||
| isUnmanaged := host.Spec.ExternallyProvisioned && !host.HasBMCDetails() | ||
|
|
||
| hostCopy := host.DeepCopy() |
There was a problem hiding this comment.
What's the requirement for this copy?
There was a problem hiding this comment.
It's enforcing that no provisioner ever gets to make changes to the Host, since it only gets a copy and the controller never sees the copy.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andfasano, zaneb The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Denote instances where we are returning due to data in the Host's Status structure being updated by calling the hostUpdated() function to get a Result.
The provisioner interface currently requires provisioners to write to the host's Status in order to set the ID. This is undesirable. Return the provisioner's ID for the host from ValidateManagementAccess() and write it to the Status in the controller.
The provisioner interface currently requires provisioners to write to the host's Status in order to set the power status. This is undesirable. Return the power status (if known) for the host from UpdateHardwareState(), rather than a Result structure from which only the Dirty flag was used. Write the updated power state to the Status in the controller.
The fixtureProvisioner object gets recreated for each Reconcile() call, so it cannot be used to store state that is used to make different things happen on each reconciliation. Some of this state was instead being stored in the host Status, which could potentially confound some of the tests: only the controller should modify the status, and it is the thing we are trying to test. Instead, create a Fixture structure that contains persistent state for a particular Host and the provisioner Factory to create a provisioner that has access to that state. Stop writing to the Host's Status from the fixture provisioner.
There is no longer any reason for provisioners to modify the BareMetalHost object itself; they should communicate any changes explicitly through return values from the Provisioner API. To prevent this starting to happen again in future, pass only a copy of the BareMetalHost in the provisioner factory, so that no changes made by the provisioner will ever make their way back to the controller nor be written to the k8s API.
This function is now called on basically every reconcile, so don't log information about things that haven't changed.
When the provisioner could write to the Host's Status subresource and expect it to be persisted, we had to always treat a Dirty flag returned from the provisioner as indicating that the Status was dirty and required a write. (Oddly this meant the Dirty flag was treated with almost the inverse of its intended meaning: when it was returned true the Status was rarely actually dirty, while when it was false the action was complete which always requires a write.) Now that provisioners are no longer allowed to write to the Status, we only need to request a write when the controller has updated the Status. Add a new actionUpdate to cover the couple of cases that are not already covered by actionComplete, and those where ClearError() returns dirty. The actionUpdate type comprises actionContinue so that type checks for the latter handle it correctly. Make actionContinue return false from the Dirty() method, meaning that it will not cause a write to the k8s API. The actionContinueNoWrite type is removed, since it would now be identical to actionContinue.
All of the action<StateName>() methods of the reconciler are the handlers used by the host state machine to carry out the action in the current provisioning state, except for actionRegistering(). actionRegistering() is called in every state (by the state machine's ensureRegistered() method) prior to calling the action method for the current state. Rename actionRegistering() to registerHost() to eliminate the implication that it is an action specific to the Registering state. Simplify the return handling by returning nil when validating management access succeeded and there were no changes required to the Status, since this is what we want to return from ensureRegistered() (indicating no need to return early before executing the state's action method). This will occur both in general operation when there is no re-registration required, and also potentially at the conclusion of a re-registration where there was no credentials change and errors had already been cleared (by a previous invocation that returned actionUpdate). At the conclusion of a registration where changes need to be written to the Status, e.g. by clearing a previous error state or updating the known good credentials, actionComplete is returned. This is later translated to actionUpdate by ensureRegistered(). This enables that scenario to be distinguished from that of a change needing to be written while registration is still in progress, which results in actionUpdate. This means there is no longer a need in ensureRegistered() to guess when the registerHost() method might have resulted in a change to the Status.
96d691c to
98d7131
Compare
|
/test-integration |
|
/lgtm |
Use semantic functions (
retryAfterDelay(),operationContinuing(),operationComplete(),operationFailed(),transientError()) to generate return values from ironic provisioner. This is a first step toward changing theprovisioner.Resulttype to include richer information (e.g. to be able to distinguish between a request to retry after a delay and the operation simply continuing).Stop writing to the BareMetalHost Status subresource in the provisioner, and require all changes to be communicated back to the controller via return values from the Provisioner interface.
Only update the BareMetalHost Status subresource when something has changed, instead of on ~every reconcile.