Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark machine failed during rollout also #733

Merged
merged 2 commits into from
Jul 20, 2022

Conversation

himanshu-kun
Copy link
Contributor

What this PR does / why we need it:
A bug where MCM was not marking machine as failed during rollout has been resolved.
In situation where the all the machines in the new or the old machineset go into Unknown state , as per the current logic the rollout would stop and won't proceed until we manually delete the machine.
This happens because machineSet controller is not aware of a Unknown state and so takes no action, and only machine controller could resolve this situation by marking the machine Failed.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

  • this erroneous behavior was introduced recently as a part of the meltdown fix PR , to avoid making meltdown logic clash with the rollout logic as they are performed by two seperate controllers and also so that we don't surpass the maxReplacement value, but after looking back , machine were always marked Failed during rollout previously also , so this PR just kind of goes back to the previous logic
    Release note:
Rollout freeze won't happen due to `Unknown` machines now.

@himanshu-kun himanshu-kun requested a review from a team as a code owner July 14, 2022 12:53
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jul 14, 2022
@himanshu-kun
Copy link
Contributor Author

/invite @kon-angelo

@gardener-robot-ci-2 gardener-robot-ci-2 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jul 14, 2022
@gardener-robot gardener-robot added needs/review Needs review size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Jul 14, 2022
@gardener-robot
Copy link

@kon-angelo You have pull request review open invite, please check

@himanshu-kun himanshu-kun added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jul 18, 2022
@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jul 18, 2022
@kon-angelo
Copy link
Contributor

Do I get it correctly:

  • the machineSet controller can't handle machines in Unknown state.
  • the machine controller does not mark Unknown machines as failed during rollout.
  • This creates a deadlock when all machines are marked as Unknown during rollout.

@himanshu-kun
Copy link
Contributor Author

Yes that is correct

Copy link
Contributor

@kon-angelo kon-angelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels Jul 18, 2022
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jul 18, 2022
@himanshu-kun himanshu-kun added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jul 20, 2022
@gardener-robot-ci-2 gardener-robot-ci-2 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jul 20, 2022
@himanshu-kun himanshu-kun merged commit 5d4e76d into gardener:master Jul 20, 2022
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Jul 20, 2022
@himanshu-kun himanshu-kun deleted the fix-rollout-freeze branch July 20, 2022 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging size/s Size of pull request is small (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants