

@michaelgugino

Why we need this:

This commit adds the ability to prevent processing of the deletion of Machine objects when the annotation is present. This is particularly useful when an automated remediation mechanism is implemented: it serves as a way for administrators to indicate that they do not want a particular machine to be remediated, for whatever reason.
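
For illustration, a minimal sketch of the kind of check being described, assuming a hypothetical annotation key and helper (neither is taken from the actual diff):

```go
package machine

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// skipDeletionAnnotation is a hypothetical key used only for this sketch.
const skipDeletionAnnotation = "machine.openshift.io/skip-deletion"

// deletionBlocked reports whether the object carries the opt-out annotation.
func deletionBlocked(obj metav1.Object) bool {
	_, ok := obj.GetAnnotations()[skipDeletionAnnotation]
	return ok
}

// In the reconcile loop, the deletion path would then short-circuit, e.g.:
//
//	if !machine.ObjectMeta.DeletionTimestamp.IsZero() && deletionBlocked(machine) {
//		// Leave the finalizer in place; the machine stays in the
//		// Deleting phase until the annotation is removed.
//		return reconcile.Result{}, nil
//	}
```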

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 3, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign spangenberg
You can assign the PR to them by writing /assign @spangenberg in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

```go
func (r *ReconcileMachine) isDeleteAllowed(machine *machinev1.Machine) bool {
	if r.nodeName == "" || machine.Status.NodeRef == nil {
```

Why are we removing this logic, which ensures that the node on which this controller is running will not be deleted?

Please update the PR description as well to reflect this change. Currently the description seems to suggest that, additionally, an annotation is going to be supported.


I can definitely imagine scenarios where the MAO node would need to be deleted; however, machine-healthcheck has a special case for masters that prevents it from asking for their deletion.
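
For reference, the guard being discussed reads roughly as follows; only the first two lines appear in the diff context above, and the rest (the node lookup and UID comparison) is a reconstruction that should be verified against the actual controller source:

```go
// Reconstructed sketch; details beyond the quoted diff context are assumptions.
func (r *ReconcileMachine) isDeleteAllowed(machine *machinev1.Machine) bool {
	if r.nodeName == "" || machine.Status.NodeRef == nil {
		// Controller node unknown, or the machine has no node yet: allow deletion.
		return true
	}
	if machine.Status.NodeRef.Name != r.nodeName {
		// The machine is not the one hosting this controller: allow deletion.
		return true
	}
	// The machine hosts this controller; only allow deletion if its node
	// reference is stale. (Assumed imports: context, corev1, client, klog.)
	node := &corev1.Node{}
	if err := r.Client.Get(context.TODO(), client.ObjectKey{Name: machine.Status.NodeRef.Name}, node); err != nil {
		klog.Infof("unable to get node %q: %v", machine.Status.NodeRef.Name, err)
		return false
	}
	return node.UID != machine.Status.NodeRef.UID
}
```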

@vikaschoudhary16

This PR seems to be solving a problem similar to the one being discussed in openshift/machine-api-operator#333.

@openshift-ci-robot openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 10, 2019
@ingvagabund
Member

This will have the unfortunate effect of keeping a machine in the Deleting state indefinitely. Right now a machine is not deleted only if the machine is also hosting the machine controller. Once such a machine is marked for deletion, it's no big deal to reschedule the controller to another node and finish the deletion. However, for a machine that is never supposed to be deleted, it will be quite confusing for users to see a machine being deleted for x hours. Someone will definitely report a bug. The same holds for a machineset that has at least one machine annotated and is supposed to be deleted. Not to mention machine deployments that will never be able to finish rolling updates. Or disruption budgets with max 1 machine unavailable.

W.r.t. the machine health checker, this will help to filter out such machines so they never get deleted.

@enxebre
Member

enxebre commented Jul 11, 2019

This (the annotation, its creation/deletion/lifecycle, and what to do or not do when it is present) should be handled by the MHC/remediation/upper-level consumer, so they shape their business logic on top of the immutable create/update/delete operations, rather than introducing deviating behaviour here for when a machine is set for deletion and leaving the object permanently with a deletionTimestamp.
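
As a sketch of what handling it at that layer could look like (the annotation key and function are hypothetical, not an existing MHC API), the health checker would simply never ask for the deletion of annotated machines, so no deletionTimestamp is ever set on them:

```go
package healthcheck

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// excludeRemediationAnnotation is a hypothetical key used only for this sketch.
const excludeRemediationAnnotation = "machine.openshift.io/exclude-remediation"

// filterRemediationTargets drops machines an administrator has opted out of
// remediation, so no delete request is ever issued for them.
func filterRemediationTargets(unhealthy []metav1.Object) []metav1.Object {
	var targets []metav1.Object
	for _, m := range unhealthy {
		if _, optedOut := m.GetAnnotations()[excludeRemediationAnnotation]; optedOut {
			continue
		}
		targets = append(targets, m)
	}
	return targets
}
```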

@bison

bison commented Jul 11, 2019

This will have the unfortunate effect of keeping a machine in the Deleting state indefinitely.

Yeah, if we want this annotation, the correct place to handle it would be in an admission webhook so we can reject the delete outright.
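
A rough sketch of that approach, using controller-runtime's admission package; the annotation key, type name, and wiring are hypothetical and not part of this PR:

```go
package webhooks

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// preventDeletionAnnotation is a hypothetical key used only for this sketch.
const preventDeletionAnnotation = "machine.openshift.io/prevent-deletion"

// machineDeleteValidator rejects DELETE requests for annotated Machines.
type machineDeleteValidator struct{}

func (v *machineDeleteValidator) Handle(ctx context.Context, req admission.Request) admission.Response {
	if req.Operation != "DELETE" {
		return admission.Allowed("")
	}
	// On DELETE, the object being removed arrives in OldObject.
	var obj metav1.PartialObjectMetadata
	if err := json.Unmarshal(req.OldObject.Raw, &obj); err != nil {
		return admission.Errored(http.StatusBadRequest, err)
	}
	if _, blocked := obj.GetAnnotations()[preventDeletionAnnotation]; blocked {
		return admission.Denied(fmt.Sprintf("machine %s/%s carries %s; remove the annotation before deleting",
			obj.GetNamespace(), obj.GetName(), preventDeletionAnnotation))
	}
	return admission.Allowed("")
}
```

Registering this handler with the webhook server and the ValidatingWebhookConfiguration wiring are omitted here; the point is only that the delete is rejected up front rather than stalling on a finalizer.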

@michaelgugino
Author

This will have the unfortunate effect of keeping a machine in the Deleting state indefinitely. Right now a machine is not deleted only if the machine is also hosting the machine controller. Once such a machine is marked for deletion, it's no big deal to reschedule the controller to another node and finish the deletion. However, for a machine that is never supposed to be deleted, it will be quite confusing for users to see a machine being deleted for x hours. Someone will definitely report a bug. The same holds for a machineset that has at least one machine annotated and is supposed to be deleted. Not to mention machine deployments that will never be able to finish rolling updates. Or disruption budgets with max 1 machine unavailable.

W.r.t. the machine health checker, this will help to filter out such machines so they never get deleted.

@ingvagabund I think this is primarily opt-in, so there's not much reason for a bug to be filed if a user creates this annotation. Maybe machinesets need to be updated to account for this, but I don't think that's a blocker. We need to give users a way, on a per-machine basis, to prevent deletion. This prevention might be necessary for a variety of reasons, such as preserving an instance for security audits.

@bison I think this annotation could definitely be used in tandem with a validating webhook to avoid a machine persisting in Deleting forever (which we should have an alarm for, as I suggested previously).

As for the remediation layer, I don't want to rely on a third party respecting this annotation, because what if it doesn't? The user should always be right, and the buck stops with the machine-controller.

@openshift-ci-robot

@michaelgugino: The following test failed, say /retest to rerun them all:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/goimports | 920a788 | link | /test goimports

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot

/bugzilla refresh

@openshift-ci-robot

@openshift-bot: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.


In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot

/bugzilla refresh

@openshift-ci-robot

@openshift-bot: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.


In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@eparis
Member

eparis commented Oct 26, 2019

Is this still a PR we want to pursue, or should it just be closed after all this time?

@enxebre
Member

enxebre commented Oct 28, 2019

Please, let's write this up as an enhancement PR and reopen against master if it's still relevant.

@enxebre enxebre closed this Oct 28, 2019
@michaelgugino
Author

FYI: generic upstream discussion: kubernetes-sigs#1514
