tests: Stop bumping worker maxUnavailable to 3 #1238
Conversation
IIRC we did this just to speed up these tests because updating workers 1 by 1 blew out our hour budget. The router requires a minimum of two workers though, and we're just going to be fighting its PDB. Since customers can't sanely do this, let's stop doing it in our tests. If our tests take too long...we'll have to either cut down the tests or make them a periodic, etc.
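(For context: the "bump" in question is a patch to the worker MachineConfigPool's spec.maxUnavailable. Below is a minimal, illustrative sketch of that tweak using a dynamic client; it is not the repo's actual e2e helper, and the kubeconfig handling, hard-coded pool name, and the value 3 are assumptions for illustration.)

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: running outside the cluster with the default kubeconfig.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// MachineConfigPool is cluster-scoped, in machineconfiguration.openshift.io/v1.
	gvr := schema.GroupVersionResource{
		Group:    "machineconfiguration.openshift.io",
		Version:  "v1",
		Resource: "machineconfigpools",
	}

	// Merge-patch spec.maxUnavailable on the worker pool. This is the tweak the
	// PR removes, since draining several workers at once on a small test cluster
	// ends up fighting the router's PodDisruptionBudget.
	patch := []byte(`{"spec":{"maxUnavailable":3}}`)
	if _, err := client.Resource(gvr).Patch(context.TODO(), "worker",
		types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("worker pool maxUnavailable set to 3")
}
```

On a small cluster, letting three workers drain at once is exactly what collides with the router's PodDisruptionBudget, which is why the PR drops the tweak rather than keeping it in the tests.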
even if we do this our tests will start failing, as we're right on the edge of timing out all the time as is...

let's see how this pans out in CI runs before LGTM.

But if we do support this, should we really be changing the tests rather than letting the network edge make accommodations?

Right, I noted that in this part of the commit message:
Another option is to make them optional, e.g. we could fairly easily move FIPS to a separate
I'm running the current e2e tests (without this commit) on a sacrificial 4.3/GCP cluster to see if I can figure out what's going on with those

This might make a lot of sense...

/retest

Seriously GCPPPPP?! /test e2e-gcp-op

So on the

/skip

Job hasn't officially finished, but seeing in logs: So this isn't going to fix it for us... and since this failure is also being seen in e2e-aws, maybe the fix needs to be elsewhere...

Will wait on final logs to verify...

/test e2e-gcp-op

Let's see if this passes today... and if we can make some modifications to get our e2e going...

:( /retest

@cgwalters any ideas on how to proceed with this (of course CI is currently busted with a different problem entirely afaik...)

Looking at the failed run from Friday, the test ran for 5768.476s = 96.14 min < 120 min timeout?
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, cgwalters
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@cgwalters: The following tests failed, say /retest to rerun them:

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Nope, not sure what's going on yet. Only had about 10% bandwidth today for this. Will keep looking at it though.

@cgwalters @ashcrow I'm noticing a bunch of test time discrepancies, see my PR:
I'm going to do some other investigation and will report back; also lodged some questions in test-platform.