Fail fast if machines enter Failed phase#182
Fail fast if machines enter Failed phase#182Danil-Grigorev wants to merge 1 commit intoopenshift:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/retest |
pkg/framework/machines.go
Outdated
| } | ||
| // Fail fast if machine entered Failed phase | ||
| if machine.Status.Phase != nil { | ||
| Expect(*machine.Status.Phase).NotTo(Equal(MachinePhaseFailed), |
There was a problem hiding this comment.
would this break the eventually loop? we still want to WaitForMachinesDeleted
There was a problem hiding this comment.
The failed machine will never be deleted, as it is not going to reconcile. It should break the loop and report an error, unlike returning an error inside the block.
There was a problem hiding this comment.
A failed machine still gets reconciled by the deletion logic does it not? The machine failed check is after the deletion timestamp check. As far as I can tell a failed machine can still be deleted, it will likely just be quicker.
|
/retest |
pkg/framework/machines.go
Outdated
| } | ||
| // Fail fast if machine entered Failed phase | ||
| if machine.Status.Phase != nil { | ||
| Expect(*machine.Status.Phase).NotTo(Equal(MachinePhaseFailed), |
There was a problem hiding this comment.
A failed machine still gets reconciled by the deletion logic does it not? The machine failed check is after the deletion timestamp check. As far as I can tell a failed machine can still be deleted, it will likely just be quicker.
| if m.Status.Phase != nil { | ||
| switch *m.Status.Phase { | ||
| case MachinePhaseRunning: | ||
| result = append(result, machines[i]) | ||
| case MachinePhaseFailed: | ||
| return nil, fmt.Errorf("Machine entered the Failed phase: %q, reason: %v", m.GetName(), m.Status.ErrorMessage) | ||
| } |
There was a problem hiding this comment.
One question that comes to mind is whether any of the tests rely on a Machine going into a failed phase, we need to make sure we aren't breaking those by doing this
There was a problem hiding this comment.
We are not testing that at the moment, so it is not a breaking change. Why wait for machines running if you are expecting them not to?
There was a problem hiding this comment.
Good question 😅 I just wondered if we were using this to check that a machine didn't become running. I think it's ok
According to MAO specifics, machines entering Failed phase will no longer be provisioned or processed. This commit ensures that e2e tests requiring machines to enter Running phase, or waiting for a full MachineSet rollout, would fail faster and give more readable failure output, instead of a generic timeout on condition.
8a2a8fa to
fe88e5d
Compare
JoelSpeed
left a comment
There was a problem hiding this comment.
I'm not entirely sure whether calling Expect inside an Eventually is good practice, have you considered changing the Eventually blocks to the poll style that we have used in other wait for function? Then you have three states you can return, success, not yet and a terminal failure, which is what we want
| if m.Status.Phase != nil { | ||
| switch *m.Status.Phase { | ||
| case MachinePhaseRunning: | ||
| result = append(result, machines[i]) | ||
| case MachinePhaseFailed: | ||
| return nil, fmt.Errorf("Machine entered the Failed phase: %q, reason: %v", m.GetName(), m.Status.ErrorMessage) | ||
| } |
There was a problem hiding this comment.
Good question 😅 I just wondered if we were using this to check that a machine didn't become running. I think it's ok
|
/retest |
|
/retest |
1 similar comment
|
/retest |
|
@Danil-Grigorev: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
|
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
|
@Danil-Grigorev: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close |
|
@openshift-bot: Closed this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
According to MAO specifics, machines entering
Failedphase will no longer be provisioned or processed. This commit ensures thate2etests requiring machines to enterRunningphase, or waiting for a fullMachineSetrollout, would fail faster and give more readable failure output, instead of a generic timeout on condition.This will help improve CI readability.