Add detached annotation proposal #168

Merged
metal3-io-bot merged 1 commit into metal3-io:master from
hardys:maintenance_mode
May 12, 2021

Conversation

@hardys hardys commented Mar 11, 2021

First draft of a proposal to enable control of maintenance mode via the BMH API.
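For context, the shape described in this first draft is roughly the following (an illustrative sketch only; the host name is hypothetical, and the final merged proposal uses an annotation rather than this spec field):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0          # illustrative host name
spec:
  maintenance: true       # proposed boolean flag (first-draft shape)
```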

@metal3-io-bot metal3-io-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 11, 2021
hardys commented Mar 11, 2021

/cc @dhellmann @dtantsur @avishayt

@metal3-io-bot (Contributor)

@hardys: GitHub didn't allow me to request PR reviews from the following users: avishayt.

Note that only metal3-io members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @dhellmann @dtantsur @avishayt

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

> TODO - we don't expose any reason here, is it enough to have
> a standard reason string e.g. "disabled by BaremetalHost maintenance field"?
Member:

I assume the caller knows why the host is in maintenance mode. We could ask for a string reason, but I'm not sure what we would do with it. A UI or something could always use an annotation to store a message. Unless maybe we would show the reason as part of the default kubectl output for hosts?

Member Author (hardys):

The underlying Ironic API does accept a reason, but I guess the user isn't exposed to that - I was thinking maybe generating an event with the reason could be useful, but it could also just be a generic event saying the host got put into maintenance mode (which simplifies the API as it's just a boolean flag)

Member:

Ah, yeah, I wasn't really concerned with the value in Ironic so much as what we ask for or show on the host resource. Let's leave the reason out for now, under the theory that it's easier to add the field later if we decide we need it.

> TODO - what happens if a BMH is deleted with `maintenance: true` - do we
> skip deprovisioning or should that be an independent annotation/field?
Member:

We should look at what Ironic does and see if we want that to be our default behavior.

@dtantsur thoughts?

Member Author (hardys):

From local testing I'm fairly sure that when you've set the maintenance flag you can just delete the node without any deprovisioning, so aligning with that behavior seems reasonable to me.

Member:

That matches what I remember.

@hardys hardys force-pushed the maintenance_mode branch from 67f25c0 to 0e1b364 Compare March 11, 2021 16:23

> ## Motivation
>
> In a multi-tier deployment where one cluster deploys another, the "parent"

I think in general - transfer ownership between clusters, right?

Member:

There have been some discussions about moving hosts between clusters as part of auto-scaling, but there are some different implications when doing that (we would absolutely need to deprovision those hosts as part of the transfer, to avoid leaking data from one cluster to another, for example). So, I think for this proposal let's stay focused on the multi-tier deployment case.

@kashifest (Member) left a comment:

+1 for this. This will be really helpful. I think also as a non-goal it should be made clear that when ironic is putting a node on maintenance mode, we won't want to sync that back to BMH resource, right or would we?
/cc @maelk

@metal3-io-bot metal3-io-bot requested a review from maelk March 12, 2021 06:53

hardys commented Mar 12, 2021

+1 for this. This will be really helpful. I think also as a non-goal it should be made clear that when ironic is putting a node on maintenance mode, we won't want to sync that back to BMH resource, right or would we?

This is a good point - I was thinking that the BMO sync would remain active (since reconciliation is not paused), but since maintenance mode flag in Ironic pauses all periodic tasks inside Ironic, it's unlikely anything in the BMH will change from Ironic->BMH while in this state.

Another option (mentioned in the alternatives) would be to modify the behavior of the baremetalhost.metal3.io/paused annotation to set the Ironic host to maintenance mode (or even remove it from Ironic completely), but just exposing the maintenance mode flag seems a little cleaner, and this approach would prevent some desirable workflows (in particular the ability to delete a BMH in maintenance mode, skipping deprovisioning)
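For reference, the existing annotation under discussion is applied like this (an illustrative manifest; the host name is hypothetical, and only the annotation key comes from the existing API):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0                          # illustrative host name
  annotations:
    baremetalhost.metal3.io/paused: ""    # presence alone pauses reconciliation
```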

Comment thread design/baremetal-operator/maintenance-mode.md Outdated
Comment thread design/baremetal-operator/maintenance-mode.md Outdated
hardys commented Mar 15, 2021

Ok so to summarize, we have three options which could satisfy the underlying requirement:

Add spec.maintenance flag

This would set the Ironic maintenance mode flag, as described in the current draft proposal

Advantages:

  • Simple to implement
  • Well understood behavior based on the underlying Ironic API

Disadvantages:

  • Node still exists in Ironic, which could be confusing, and may not be desirable for the multi-tier use-case (where in reality it's not "maintenance" but a potentially permanent change in ownership of the physical server management)
  • In the case where the parent cluster deploys $many child clusters, this solution might not scale as well as some of the others?

Modify baremetalhost.metal3.io/paused behavior

In this case we'd modify the current "paused" behavior, such that:

  • While paused, the host is removed from Ironic
  • It is possible to delete a BMH in paused state (without any deprovisioning actions)

Advantages:

  • Should be fairly simple to implement
  • Avoids adding a net-new API

Disadvantages:

  • Modifies existing API behavior (are we aware of any users of this other than the openshift installer atm?)

Add new "unmanaged" annotation

In this case we'd add a new annotation which either behaves as described above for the modified-pause approach, or we add logic to every state which skips management operations.

Advantages:

  • Avoids changing existing API behavior

Disadvantages:

  • Potentially more complex to implement (in the "every state" case)
  • Could be confusing since it's similar in behavior to the existing "paused" annotation

Currently my preference is to go with modifying the behavior of the paused annotation - what do others think about that?

@celebdor

Currently my preference is to go with modifying the behavior of the paused annotation - what do others think about that?

I also think that's a good way to go.

Can you expand on Add new "unmanaged" annotation's disadvantage of "Potentially more complex to implement (in the "every state" case)"

hardys commented Mar 15, 2021

Can you expand on Add new "unmanaged" annotation's disadvantage of "Potentially more complex to implement (in the "every state" case)"

With the paused annotation approach, we just exit from Reconcile() early.

If we introduced some new annotation which didn't do that, it would likely mean updating many/all states in the controller to look for the new annotation and do nothing when it's set. Just exiting the Reconcile early (but adding some new logic to deal with the removal from Ironic, and to enable deletion) seems like a simpler and less error-prone approach to me.

@dhellmann (Member):

+1 for this. This will be really helpful. I think also as a non-goal it should be made clear that when ironic is putting a node on maintenance mode, we won't want to sync that back to BMH resource, right or would we?
/cc @maelk

This is an interesting point. Does Ironic set this flag on hosts by itself under some conditions, @dtantsur?

@dhellmann (Member):

Modify baremetalhost.metal3.io/paused behavior

In this case we'd modify the current "paused" behavior, such that:

  • While paused, the host is removed from Ironic
  • It is possible to delete a BMH in paused state (without any deprovisioning actions)

Advantages:

  • Should be fairly simple to implement
  • Avoids adding a net-new API

Disadvantages:

  • Modifies existing API behavior (are we aware of any users of this other than the openshift installer atm?)

I believe the feature was added to make it easier for upstream to safely pivot and copy CR content from the provisioning cluster to the new cluster. I don't think the change in behavior will conflict with that use case, but we should confirm that. @kashifest, @fmuyassarov, @maelk what do you think?

Currently my preference is to go with modifying the behavior of the paused annotation - what do others think about that?

+1 for extending the behavior of the annotation we already have.

@andfasano (Member):

Currently my preference is to go with modifying the behavior of the paused annotation - what do others think about that?

👍 for that. I think that the potential ambiguity of a new - similar but slightly different - annotation is worse than modifying the existing behavior of baremetalhost.metal3.io/paused.

The only part that I guess still requires some more clarification is the "unpausing" step: what is the expected behavior when the annotation is removed, considering that the underlying Ironic node has been previously deleted?

hardys commented Mar 15, 2021

The only part that I guess still requires some more clarification is the "unpausing" step: what is the expected behavior when the annotation is removed, considering that the underlying Ironic node has been previously deleted?

I'm thinking it's essentially the same process as recovering from a pod reschedule where the Ironic DB is lost - we'd create the Ironic host again and resume management of the host?

Provided we're confident that process is robust, I think we should be OK to delete the Ironic host, but we'll need to check there aren't unhandled corner cases.

hardys commented Mar 16, 2021

+1 for this. This will be really helpful. I think also as a non-goal it should be made clear that when ironic is putting a node on maintenance mode, we won't want to sync that back to BMH resource, right or would we?
/cc @maelk

This is an interesting point. Does Ironic set this flag on hosts by itself under some conditions, @dtantsur?

Oh, I think I misinterpreted the question from @kashifest in my previous reply, apologies!

Yes, in some circumstances Ironic does put a host into maintenance mode - IIRC it happens when there are issues connecting to the BMC, but @dtantsur can perhaps advise if there are other conditions we should be aware of.

Regardless, IMO that shouldn't be exposed directly via the BMH; it's an internal state which we might reflect via some error status or event, but not expose directly (if we added a maintenance flag, which it seems is not the likely direction for the requirement I'm trying to solve)

hardys commented Mar 16, 2021

Ok thanks all for the feedback, it seems that extending the paused annotation is the preferred approach, so I'll rework this proposal to reflect that.

@andfasano (Member):

The only part that I guess still requires some more clarification is the "unpausing" step: what is the expected behavior when the annotation is removed, considering that the underlying Ironic node has been previously deleted?

I'm thinking it's essentially the same process as recovering from a pod reschedule where the Ironic DB is lost - we'd create the Ironic host again and resume management of the host?

Provided we're confident that process is robust, I think we should be OK to delete the Ironic host, but we'll need to check there aren't unhandled corner cases.

Agreed, I think it would be useful to have this behavior written down in the proposal for clarity

@zaneb (Member) left a comment:

Reusing the paused annotation seems like the tidiest user interface.
My only worry would be that it makes the implementation more complex, since the pause annotation is checked very early but we will now need to make calls to the provisioner when it is present (an obvious place to put the code for e.g. an unmanaged annotation would be in the vicinity of ensureRegistered()).
A key point about the paused annotation is that it is used in pivoting to make sure the Status subresource is not written to (to make sure that it can be regenerated from the status annotation). So that precludes us ever including in the Status any indication that the Host is in maintenance mode and that all of the data (power state &c.) is meaningless.

Comment thread design/baremetal-operator/maintenance-mode.md Outdated
Comment thread design/baremetal-operator/maintenance-mode.md Outdated
@dhellmann (Member):

Reusing the paused annotation seems like the tidiest user interface.
My only worry would be that it makes the implementation more complex, since the pause annotation is checked very early but we will now need to make calls to the provisioner when it is present (an obvious place to put the code for e.g. an unmanaged annotation would be in the vicinity of ensureRegistered()).
A key point about the paused annotation is that it is used in pivoting to make sure the Status subresource is not written to (to make sure that it can be regenerated from the status annotation). So that precludes us ever including in the Status any indication that the Host is in maintenance mode and that all of the data (power state &c.) is meaningless.

These are good points. I especially hadn't considered the fact that we couldn't write to the status fields. I am starting to think that adding a new annotation, while expanding the UI, would make the implementation easier to follow and verify. Given the complexity of managing the host state, I would err on the side of simplifying the implementation in this instance.


hardys commented Mar 16, 2021

These are good points. I especially hadn't considered the fact that we couldn't write to the status fields

Do we need to write to the status fields? The only time I can think that might be necessary is if we encounter an unrecoverable error deleting the underlying host from Ironic (when first encountering the paused annotation), but we could generate an event without updating the host status in that case?

@dhellmann (Member):

These are good points. I especially hadn't considered the fact that we couldn't write to the status fields

Do we need to write to the status fields? The only time I can think that might be necessary is if we encounter an unrecoverable error deleting the underlying host from Ironic (when first encountering the paused annotation), but we could generate an event without updating the host status in that case?

The error handling relies on a counter in the status field to manage the retry interval backoff. We wouldn't be able to update that field. Maybe we don't need to? Talking to Ironic can result in errors, even if they are just 409s, though. If we can ensure that all of the pause handling happens outside of the state machine, we should be able to avoid the backoff logic and maybe that would be OK.

It would be useful to see if the amount of change in Reconcile() matches what @zaneb describes, too. The pause annotation happens very early. Between that point and where we have a Provisioner, we handle deletion, the status annotation, and the hardware annotation. I guess we're talking about allowing delete operations while paused? So maybe we only need to update the logic around the other annotations to skip them if we're paused. And then when we get to the point where we have a provisioner, if we're paused we would handle that instead of entering the state machine. We wouldn't be able to update the status fields on the host to record that we have handled it (by clearing the provisioning ID field), though, so every time the host is reconciled we would look at Ironic to verify that the host is not present. I guess once that's done, we don't need to keep reconciling the host, so it wouldn't be too expensive.

So, I think we could still do it with the pause annotation. It's going to complicate Reconcile(), but I don't think there's a way to do this that won't add some complexity there.


hardys commented Mar 17, 2021

Ok this was discussed today in the community meeting, in summary:

  • @maelk pointed out that the CAPI pivot relies on us not updating the status, and as mentioned above by @dhellmann that may not be ideal in the case where there is some error deleting the host from Ironic
  • @zaneb mentioned it may be simpler to implement if we go with a new annotation
  • Some other use-cases for maintenance mode were discussed, but I think for the multi-tier use-case we're agreed that removing the host from Ironic is probably preferable as it's simpler to implement and avoids potential corner-cases around adoption.

Accordingly I'll update this proposal to describe a new annotation and attempt to capture the feedback so far, thanks!

@metal3-io-bot metal3-io-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 17, 2021

hardys commented Mar 17, 2021

/retitle Add unmanaged mode proposal

@metal3-io-bot metal3-io-bot changed the title WIP: Add maintenance mode proposal Add unmanaged mode proposal Mar 17, 2021
@metal3-io-bot metal3-io-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2021
@hardys hardys force-pushed the maintenance_mode branch 2 times, most recently from d73befc to eaab0ae Compare March 18, 2021 12:03
Comment thread design/baremetal-operator/unmanaged-annotation.md Outdated
@dtantsur (Member) left a comment:

👍🏽

@metal3-io-bot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dtantsur, hardys

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


hardys commented Mar 23, 2021

WIP implementation ref metal3-io/baremetal-operator#827 - feedback welcome

Comment thread design/baremetal-operator/unmanaged-annotation.md Outdated
> triggering deprovisioning
> * Ensure it is possible to restart management of BMH resources (in the case
>   where they are not deleted from the parent cluster)
> * Avoid changing behavior of the existing `paused` annotation since
Member:

This sentence wasn't completed.

> include context from the system/user which applies the annotation.
>
> This annotation will be evaluated early in the `Reconcile()` loop, but after the
> `paused` `status` and `hardwaredetails` annotations are evaluated.
Member:

Why do we want to do it after those other annotations are handled? Does it actually matter, or is that just what we're doing?

I think we want to do it after the hardware details and status annotations are handled to avoid triggering inspection when the unmanaged annotation is set?

It seems like we want to handle it before the paused annotation, since unmanaged is a more comprehensive way to say "stop paying attention to this host" than paused is.

Would it be safer for this new annotation to only apply to hosts in a steady state? What happens if provisioning starts and this annotation is added, for example? Or deprovisioning?

Member:

It definitely must be handled after paused and status. I think in the case of hardwaredetails that might just be where we're planning on doing it rather than a strict requirement.

The question of what happens in a non-steady state is a good one. This might be a good reason to use the Unmanaged provisioning state as we discussed, since it means the host will always return to the Registering or Provisioned states afterwards. We might want to ignore the annotation if we are in the Deprovisioning state though, because AFAIR there's no way to distinguish whether the host was Provisioned or Deprovisioning before going to Unmanaged.

Member Author (hardys):

I think in the case of hardwaredetails that might just be where we're planning on doing it rather than a strict requirement

Yes it's not a strict requirement, but I was thinking there's not really any reason to prevent consuming hardwaredetails while in the unmanaged mode, thus what I proposed in metal3-io/baremetal-operator#827

The question of what happens in a non-steady state is a good one. This might be a good reason to use the Unmanaged provisioning state as we discussed, since it means the host will always return to the Registering or Provisioned states afterwards

Agreed I'd not really considered this case - previous feedback from @dhellmann was that we wouldn't want to add a new transition to the Unmanaged state, but it seems like the cleanest way to enforce that is to handle this mode via the state machine?

The other option is to only allow this annotation to function when in a specific steady state (e.g for the pivot use-case, only Provisioned would be enough)

Member:

It definitely must be handled after paused and status. I think in the case of hardwaredetails that might just be where we're planning on doing it rather than a strict requirement.

If we handle this after the paused annotation, that means pausing has higher precedence than marking it as unmanaged. I'm not sure that makes sense.

The question of what happens in a non-steady state is a good one. This might be a good reason to use the Unmanaged provisioning state as we discussed, since it means the host will always return to the Registering or Provisioned states afterwards

Agreed I'd not really considered this case - previous feedback from @dhellmann was that we wouldn't want to add a new transition to the Unmanaged state, but it seems like the cleanest way to enforce that is to handle this mode via the state machine?

The other option is to only allow this annotation to function when in a specific steady state (e.g for the pivot use-case, only Provisioned would be enough)

Can we handle it in actionManageSteadyState in the controller?

Member:

If we handle this after the paused annotation, that means pausing has higher precedence than marking it as unmanaged. I'm not sure that makes sense.

Pausing is meant to appear to stop reconciliation, so the only way it does make sense is if it has higher precedence than everything.

Can we handle it in actionManageSteadyState in the controller?

No, because it would get re-registered first.

Member:

Yes, it is really a requirement for pivoting as done with CAPI and CAPM3 that the pausing annotation is the first thing handled and has the highest precedence

@kashifest (Member):

@hardys if I understand correctly, this proposal then wouldn't sync BMH api with ironic's maintenance mode? I mean there wouldn't be any BMH spec field from which we would be able to set maintenance mode in ironic?

Comment thread design/baremetal-operator/unmanaged-annotation.md Outdated

hardys commented Mar 31, 2021

@hardys if I understand correctly, this proposal then wouldn't sync BMH api with ironic's maintenance mode? I mean there wouldn't be any BMH spec field from which we would be able to set maintenance mode in ironic?

@kashifest right, that was my initial proposal, but now the host is removed completely from Ironic ref metal3-io/baremetal-operator#827 - we could still add a spec.maintenance field, but based on the discussion here it doesn't sound like the best fit for the use-case I'm currently trying to solve.

@hardys hardys closed this Mar 31, 2021
@hardys hardys reopened this Mar 31, 2021
@hardys hardys force-pushed the maintenance_mode branch from dfbccf5 to f5b716f Compare April 1, 2021 16:24
@hardys hardys force-pushed the maintenance_mode branch from f5b716f to a82e5bb Compare April 1, 2021 16:25

hardys commented Apr 1, 2021

/retitle Add detached annotation proposal

@metal3-io-bot metal3-io-bot changed the title Add unmanaged mode proposal Add detached annotation proposal Apr 1, 2021

hardys commented Apr 1, 2021

Updated to align with metal3-io/baremetal-operator#827
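With the updated proposal, detaching a host then looks like this (an illustrative manifest; the host name is hypothetical, and the annotation key follows the retitled proposal):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0                            # illustrative host name
  annotations:
    baremetalhost.metal3.io/detached: ""    # host removed from Ironic, BMH left intact
```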


> When the `detached` annotation is set, we will check the `status.provisioning.ID`
> and if necessary delete the corresponding host from Ironic (without triggering
> deprovisioning)
@andfasano (Member), Apr 2, 2021:

Maybe it would help to add that while the annotation remains, BMO will ignore any event for that resource (except the annotation removal, which resumes operations)?

@andfasano (Member):

@hardys the current version LGTM, anything else missing?


hardys commented May 12, 2021

@hardys the current version LGTM, anything else missing?

@andfasano no I think this can merge now, particularly since the related BMO implementation landed some time ago ;)

@andfasano (Member):

Ok thanks

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2021
@metal3-io-bot metal3-io-bot merged commit 59d5125 into metal3-io:master May 12, 2021
