Skip to content

Implement an Unmanaged (discovered) state#563

Closed
zaneb wants to merge 4 commits intometal3-io:masterfrom
zaneb:unmanaged-state
Closed

Implement an Unmanaged (discovered) state#563
zaneb wants to merge 4 commits intometal3-io:masterfrom
zaneb:unmanaged-state

Conversation

@zaneb
Copy link
Copy Markdown
Member

@zaneb zaneb commented Jun 25, 2020

If no BMC credentials are provided, move the host to a new Unmanaged state and change the operational status to discovered, rather than remaining without a provisioning state and in an operational error.

This has always been envisaged in the state machine diagram (with the name Discovered rather than Unmanaged), but was not previously implemented.

zaneb added 2 commits June 24, 2020 16:21
This state isn't actually implemented in the code, so before we
implement it rename it to "Unmanaged". This reflects the fact that what
this state really means is that the Host has been created without
attempting to provide credentials. That may have transpired through a
discovery process or some other mechanism.
Bring the code into line with the state machine documentation, which
says that once the Host is reconciled it will immediately move into one
of the states (registering or unmanaged) and not remain with an empty
provisioning state. Since the Unmanaged state has not been implemented
yet, move unconditionally to Registering.

Previously, we waited until the the BMC credentials had at least once
been deemed worthy of trying (present, correctly formatted) before the
Host would move to the Registering state.
@metal3-io-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 25, 2020
@zaneb zaneb force-pushed the unmanaged-state branch from fc4e828 to 3eda2d9 Compare June 25, 2020 02:26
@zaneb
Copy link
Copy Markdown
Member Author

zaneb commented Jun 25, 2020

/test-integration

@zaneb zaneb requested a review from dhellmann June 25, 2020 02:34
bmcCreds = &bmc.Credentials{}
default:
bmcCreds, bmcCredsSecret, err = r.buildAndValidateBMCCredentials(request, host)
if err != nil || bmcCreds == nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have a default username without a password? does it make sense to check if there are no credentials or if user or password are empty?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what buildAndValidateBMCCredentials() does.

@maelk
Copy link
Copy Markdown
Member

maelk commented Jun 25, 2020

/test-integration


A Discovered host is missing either the BMC address or credentials
secret name, and does not have enough information to access the BMC
An Unmanaged host is missing both the BMC address and credentials
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be both? Could it be either?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working assumption:

  • If you didn't provide either, you meant to create an unmanaged host
  • If you provided one but not the other, linked to a missing credential secret, or linked to an incorrectly formatted secret, you screwed up and should see big error messages in the UI.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, it's academic anyway because both are required fields within the BMC struct. So you can only specify both or neither.

Comment thread pkg/controller/baremetalhost/baremetalhost_controller.go

A Discovered host is missing either the BMC address or credentials
secret name, and does not have enough information to access the BMC
An Unmanaged host is missing both the BMC address and credentials
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working assumption:

  • If you didn't provide either, you meant to create an unmanaged host
  • If you provided one but not the other, linked to a missing credential secret, or linked to an incorrectly formatted secret, you screwed up and should see big error messages in the UI.

Comment thread pkg/controller/baremetalhost/baremetalhost_controller.go
Comment thread pkg/controller/baremetalhost/host_state_machine.go Outdated
zaneb added 2 commits June 25, 2020 14:19
Newly-created Hosts enter the Unmanaged state if no BMC details are
provided in the resource Spec.

As soon as details are provided - regardless of whether they are
complete or correctly formatter - the Host will return to the
Registering state. Even if all of the details are later removed, the
Host will not return to the Unmanaged state.

This is in line with how this state is documented in the state machine
diagram, but it was never previously implemented.
When a host is discovered or otherwise created without any BMC data, it now
goes into an Unmanaged state, in which buildAndValidateBMCCredentials()
is not called.

Once a host leaves the unmanaged state (i.e. the BMC address and/or
credentials have been supplied at least once), any time the BMC address
and/or credentials are missing it is due to a configuration error, not
to having newly-discovered a host.

Since the unmanaged state handles newly-discovered hosts, in the error
handling for buildAndValidateBMCCredentials we can now flag an error if
any of the data is missing. This prevents transitions like a Host going
back into a "discovered" operational status despite having been
provisioned already before the credentials were removed. That should be,
and now is, treated as an error.
@zaneb zaneb force-pushed the unmanaged-state branch from 3eda2d9 to 345e294 Compare June 25, 2020 18:20
@metal3-io-bot metal3-io-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 25, 2020
@zaneb
Copy link
Copy Markdown
Member Author

zaneb commented Jun 25, 2020

/test-integration


// HasBMCDetails returns true if the BMC details are set
func (host *BareMetalHost) HasBMCDetails() bool {
return host.Spec.BMC.Address != "" || host.Spec.BMC.CredentialsName != ""
Copy link
Copy Markdown
Member

@n1r1 n1r1 Jun 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this assumes that host.spec.BMC exists. is this a valid assumption?

@n1r1
Copy link
Copy Markdown
Member

n1r1 commented Jun 28, 2020

In some way, this is a breaking change, and api.md should be updated to reflect that.
Few things that caught my eyes there - the online flag, which will trigger power change only if there are BMC credentials (or if not in unmanaged state);
The triggering-provisioning section should relate to the new state;
and probably some other fields like poweredOn, etc.

We also need to consider the consequences on other components that already have some assumptions on the existing API

@beekhof
Copy link
Copy Markdown

beekhof commented Jun 30, 2020

Have we considered a webhook at all?
That seems like a useful addition to what is being proposed here.

We could use one to, for example, look to see if the BMH is unmanaged and reject any changes to spec.Online or prevent anything from setting the reboot annotation. That way the code can assume that some of these problematic states can't occur, even from the oc command line.

And the caller gets immediate and useful feedback

@dhellmann
Copy link
Copy Markdown
Member

In some way, this is a breaking change, and api.md should be updated to reflect that.
Few things that caught my eyes there - the online flag, which will trigger power change only if there are BMC credentials (or if not in unmanaged state);
The triggering-provisioning section should relate to the new state;
and probably some other fields like poweredOn, etc.

I will propose a follow-up to add some comments about requiring BMC credentials for those features.

We also need to consider the consequences on other components that already have some assumptions on the existing API

@kirankt this is the comment I mentioned this morning.

@n1r1 are you thinking of something specific? The cluster-api-provider-* clients?

@dhellmann
Copy link
Copy Markdown
Member

I have verified locally that this PR works.

Another approach we could take is to allow hosts to be in the externally provisioned state without any BMC credentials. We wouldn't need any new states for that, but we would rearrange the existing states and hosts would stay in the "None" state if they weren't externally provisioned and didn't have credentials. Or maybe they would go Created->Registering->Error?

If we did that, we would have to add a status field to reflect whether the host could be managed or not, based on whether the BMC credentials are present and valid.

I'm not sure that approach gives us a better user experience, over adding the new steady state early. And it would complicate the rest of the controller code, because we would have to check any time we wanted to do something that required power control to see if we could.

We should also consider how this applies to the discovery proposal. As it stands, this change doesn't allow a host to move from Unmanaged to Provisioned without adding the BMC details, but that's a goal of that other proposal.

@dhellmann
Copy link
Copy Markdown
Member

Zane's PR needed to be rebased and I don't have access to his GitHub repo to push there, so I opened #569 instead.

@dhellmann dhellmann closed this Jun 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants