Merge spot tests to a single test #193
openshift-merge-robot merged 2 commits into openshift:master
Conversation
This means that the test suite only needs to spin up a single MachineSet for spot instances. This is an easy alternative to having a `BeforeAll`, which will hopefully come in Ginkgo V2; once it does, we can revert this and change the `BeforeEach` to a `BeforeAll`. My hope is that by reducing the number of spot instances we need to spin up, this will improve the reliability of this test on Azure, which seems to have some issues bringing up spot instances.
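As a rough sketch of the pattern (not the actual diff in this PR), the idea is to fold the spot checks into a single spec so only one spot-backed MachineSet is ever created; once Ginkgo V2's `BeforeAll` is available, the shared setup could be lifted back out of the spec body. The helper names below (`createSpotMachineSet`, `machinesRunning`, `spotLabelsApplied`) are hypothetical stand-ins for the repository's real machineset utilities.

```go
package spot_test

import (
	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

// Hypothetical helpers standing in for the real machineset utilities in this
// repository; they are placeholders used only to illustrate the structure.
func createSpotMachineSet() string       { return "spot-machineset" }
func deleteSpotMachineSet(name string)   {}
func machinesRunning(name string) bool   { return true }
func spotLabelsApplied(name string) bool { return true }

var _ = Describe("Running on Spot", func() {
	// Previously each It had its own setup that created a MachineSet, so every
	// spec paid the cost of bringing up fresh spot instances. Folding the
	// checks into a single It means only one spot MachineSet is created.
	It("should provision a single spot MachineSet and verify its behaviour", func() {
		name := createSpotMachineSet()
		defer deleteSpotMachineSet(name)

		By("waiting for the spot machines to become Running")
		Eventually(func() bool { return machinesRunning(name) }).Should(BeTrue())

		By("checking that spot-related labels are applied")
		Expect(spotLabelsApplied(name)).To(BeTrue())
	})
})
```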
elmiko left a comment
This seems reasonable to me; do we have any data yet that would demonstrate the stability?
/test e2e-azure-operator
Retesting to check stability.
I was hoping that the metric added in openshift/machine-api-operator#640 would provide this, but it's not merged yet. This has come from repeated observation that the spot test is failing, and some analysis from @kwoodson, who suspects that bringing up spot instances is causing much of the Azure CI flakiness.
Different tests failed for Azure and GCP.
/test e2e-gcp-operator
@JoelSpeed: The following tests failed, say `/retest` to rerun all failed tests.
Last Azure test failed for CPU limit reasons; this has now been fixed!
/retest
/test e2e-gcp-operator
1 similar comment
/retest
@JoelSpeed Looks like the test failed when the autoscaler was scaling back down from 6.
/retest
@elmiko In the last 24 hours this test suite has run 4 times for this PR, and none of the failures were related to spot. On the MAO repo it ran 11 times yesterday and saw 2 failures from spot on Azure. That's possibly not statistically significant, but on the other hand, I can't really see how this change would make things worse: it reduces the number of VMs we are spinning up, so it should make the suite faster and less prone to slow VM creation when we create many VMs at once.
/retest
That said, it did just fail on spot, but for a different reason, which I believe is unrelated to the issue we are trying to fix here.
Makes sense to me @JoelSpeed, I'm good with this change.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: elmiko
/retest
2 similar comments
@JoelSpeed I'm unfamiliar with this test failure. Looks like a timeout but I'm not sure why.
/retest
@kwoodson This test failure is really odd, but I think it's because we are running multiple disruptive tests in parallel. There are a couple of tests that affect the VWC (ValidatingWebhookConfiguration): one deletes it, one modifies it, and both test that the controller puts it back. These shouldn't run in parallel; they will clash with one another. I'll take a look at a quick fix by merging the disruptive tests into a single test.
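As a rough illustration of that quick fix (not the actual change), the two disruptive checks could be folded into one spec so the parallel runner can never schedule them against the same webhook configuration at the same time. The helper names below (`deleteWebhookConfiguration`, `modifyWebhookConfiguration`, `webhookConfigurationRestored`) are hypothetical placeholders.

```go
package webhook_test

import (
	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

// Hypothetical placeholders for the real client calls against the
// ValidatingWebhookConfiguration; illustration only.
func deleteWebhookConfiguration()        {}
func modifyWebhookConfiguration()        {}
func webhookConfigurationRestored() bool { return true }

var _ = Describe("Webhook configuration sync", func() {
	// As two separate specs, "delete the VWC" and "modify the VWC" can be
	// scheduled concurrently by the parallel runner, and each spec's
	// assertions may then observe the other's disruption. Running both
	// checks inside one It serialises them.
	It("should restore the webhook configuration after it is deleted or modified", func() {
		By("deleting the webhook configuration and waiting for it to be recreated")
		deleteWebhookConfiguration()
		Eventually(webhookConfigurationRestored).Should(BeTrue())

		By("modifying the webhook configuration and waiting for it to be reverted")
		modifyWebhookConfiguration()
		Eventually(webhookConfigurationRestored).Should(BeTrue())
	})
})
```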
/retest
/lgtm
/retest Please review the full test history for this PR and help us cut down flakes.
12 similar comments
I'd suggest reviewing this with whitespace changes ignored; it should make a lot more sense that way 🤞
CC @kwoodson