
Bug 1917484: Don't adopt after clean failure during deprovisioning #121

Closed
zaneb wants to merge 1 commit into openshift:master from zaneb:openshift-4.7/deprov-failure-adopt


@zaneb (Member) commented Jan 27, 2021

During deprovisioning of a Host, 'deleting' (i.e. deprovisioning) the
node may succeed, meaning it does not go to the Error state, while the
automated cleaning that follows fails. In that case, the only way to
recover is to return the node to the manageable state.

Previously, once the node was in the manageable state, we would attempt
to adopt it so that it could be deprovisioned again. However, 'deleting'
the node clears its image information, so it cannot be adopted again.
(Adoption is still the right thing to do if the node has just been
re-registered because the Ironic database was recreated; in that case
the image information is present, since it is added during the initial
registration.)
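The adoption precondition described above can be sketched as a small predicate that distinguishes the two manageable-node scenarios. This is a minimal sketch, not the baremetal-operator's actual code: the `Node` struct and `canAdopt` helper are hypothetical, and it assumes (as in Ironic's `instance_info`) that the image is recorded under an `image_source` key.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down view of an Ironic node; it is not
// the baremetal-operator's or gophercloud's actual type.
type Node struct {
	ProvisionState string
	// InstanceInfo stands in for Ironic's instance_info field. This sketch
	// assumes the image is recorded under the "image_source" key.
	InstanceInfo map[string]string
}

// canAdopt is a hypothetical helper: a manageable node can only be
// adopted if it still carries its image information.
func canAdopt(n Node) bool {
	return n.ProvisionState == "manageable" && n.InstanceInfo["image_source"] != ""
}

func main() {
	// Re-registered after the Ironic database was recreated: the image
	// information was added at registration, so adoption is appropriate.
	reRegistered := Node{
		ProvisionState: "manageable",
		InstanceInfo:   map[string]string{"image_source": "http://example.com/image.qcow2"},
	}

	// Returned to manageable after cleaning failed: 'deleting' the node
	// cleared its image information, so adoption cannot succeed.
	afterCleanFailure := Node{ProvisionState: "manageable"}

	fmt.Println(canAdopt(reRegistered))      // true
	fmt.Println(canAdopt(afterCleanFailure)) // false
}
```

Reading from a nil map in Go returns the zero value, so a node with no instance info at all is treated the same as one whose `image_source` was cleared.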

To work around this, do not attempt adoption in the Deprovisioning
state if the node is manageable and its image data is not present.
Instead, handle the manageable state in Deprovision() by declaring
deprovisioning complete.
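The combination of the adoption guard and the new handling of the manageable state in Deprovision() can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `shouldAdopt`, `deprovision`, `reconcileDeprovisioning` and the returned action strings are hypothetical, and treating `available` as complete is an assumption of the sketch. Only the provision state names are Ironic's.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down view of an Ironic node.
type Node struct {
	ProvisionState string
	ImageSource    string // cleared by Ironic when the node is 'deleted'
}

// shouldAdopt is the guard: while deprovisioning, only adopt a manageable
// node if its image data is still present.
func shouldAdopt(n Node) bool {
	return n.ProvisionState == "manageable" && n.ImageSource != ""
}

// deprovision sketches per-state handling, with manageable treated as a
// terminal, successful state. The action names are illustrative only.
func deprovision(n Node) string {
	switch n.ProvisionState {
	case "active":
		return "start deleting"
	case "clean failed":
		// The only recovery path is to return the node to manageable.
		return "return to manageable"
	case "manageable", "available":
		// Manageable nodes will be cleaned before the next provisioning,
		// so deprovisioning can be declared complete here.
		return "complete"
	default:
		// e.g. "deleting", "cleaning", "clean wait": keep waiting.
		return "wait"
	}
}

// reconcileDeprovisioning checks the adoption guard before deprovisioning.
func reconcileDeprovisioning(n Node) string {
	if shouldAdopt(n) {
		return "adopt"
	}
	return deprovision(n)
}

func main() {
	fmt.Println(reconcileDeprovisioning(Node{ProvisionState: "clean failed"})) // return to manageable
	fmt.Println(reconcileDeprovisioning(Node{ProvisionState: "manageable"}))   // complete
	fmt.Println(reconcileDeprovisioning(Node{
		ProvisionState: "manageable",
		ImageSource:    "http://example.com/image.qcow2",
	})) // adopt
}
```

Keeping the guard separate from `deprovision` mirrors the description: the decision not to adopt is made first, and Deprovision() only needs to recognise manageable as a finished state.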

A node in the manageable state cannot be re-provisioned without first
being cleaned: it must go through cleaning to reach the available state
before it can be provisioned. Provisioning already handles nodes in the
manageable state, since that is the state a host is in after its
initial inspection and before its first provisioning (which performs
the initial cleaning).
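The provisioning path from the manageable state can be sketched as a mapping from each Ironic provision state to the next step. This is a hypothetical helper, not the operator's API; it assumes Ironic's `provide` transition (manageable to cleaning to available, with automated cleaning enabled), and the step labels are illustrative.

```go
package main

import "fmt"

// nextProvisioningStep is a hypothetical helper mapping an Ironic
// provision state to the next step towards an active (provisioned) node.
func nextProvisioningStep(state string) string {
	switch state {
	case "manageable":
		// Ironic's "provide" transition runs automated cleaning and
		// leaves the node available.
		return "provide (clean)"
	case "available":
		return "deploy"
	case "cleaning", "clean wait", "deploying", "wait call-back":
		return "wait"
	case "active":
		return "done"
	default:
		return "unhandled"
	}
}

func main() {
	// A node left manageable after a failed clean follows the same path as
	// a freshly inspected host: clean first, then deploy.
	path := []string{"manageable", "cleaning", "available", "deploying", "active"}
	for _, state := range path {
		fmt.Printf("%-10s -> %s\n", state, nextProvisioningStep(state))
	}
}
```

Because the manageable case is shared with freshly inspected hosts, no extra branch is needed for nodes that ended deprovisioning in manageable.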

Backport of metal3-io#772

(cherry picked from commit ba38688)

openshift-ci-robot added the bugzilla/severity-high label on Jan 27, 2021
@openshift-ci-robot

@zaneb: This pull request references Bugzilla bug 1917484, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1917484: Don't adopt after clean failure during deprovisioning

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the bugzilla/valid-bug label on Jan 27, 2021

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot added the approved label on Jan 27, 2021

openshift-ci (Bot) commented Jan 27, 2021

@zaneb: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/unit | 3d51922 | link | /test unit |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 3d51922 | link | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/images | 3d51922 | link | /test images |

Full PR test history. Your PR dashboard.

@honza (Member) commented Feb 1, 2021

This seems to depend on metal3-io@0e1acfe, which is part of metal3-io#761.

@honza (Member) commented Feb 1, 2021

Closing in favour of #122
/close

@openshift-ci-robot

@honza: Closed this PR.

In response to this:

Closing in favour of #122
/close

@openshift-ci-robot

@zaneb: This pull request references Bugzilla bug 1917484. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

Bug 1917484: Don't adopt after clean failure during deprovisioning

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-high: Referenced Bugzilla bug's severity is high for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.

3 participants