ci-operator/templates/openshift: Refactor router-rollout wait (again) #2321
Conversation
Force-pushed from 1771de4 to a189182.
/sigh Maybe we should have a nice Go binary that handles these sorts of things instead of cobbling together bash? :)
Saw this again here. @michaelgugino, @mtnbikenc, @sdodson, @vrutkovs, can you take a look? @abhinavdahiya, @crawford, I can also split off the installer change into a separate PR if we don't want to wait for the Ansible folks. Thoughts?
crawford left a comment:
/lgtm
@wking is there any advantage to getting ours in earlier? If not, let's just wait.
The advantage is that the repos blocked on CI aren't running the Ansible tests, and fixing this for AWS makes it more likely that the PRs we need to unblock CI can squeak through.
Ah, okay. Maybe split this. I haven't heard back from the Ansible team.
/cc @vrutkovs
Force-pushed from a189182 to 4bbde49.
Catch up with ac206e7 (ci-operator/templates/openshift: Refactor router-rollout wait (again), 2018-11-05, openshift#2342) and ff16a01 (ci-operator/templates/openshift: Refactor router-rollout 'oc oc', 2018-12-09, openshift#2343).
Force-pushed from 4bbde49 to 7cc59bc.
/lgtm Why would we not wait for the ingress cluster operator to stop progressing instead?
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, vrutkovs, wking

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
@wking: Updated the

Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Tradition ;). We're going to replace all these hacks with a cluster-version waiter once we have that working.
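For reference, a minimal sketch of what waiting on the ingress operator could look like, assuming it is exposed as `clusteroperator/ingress` with a standard `Progressing` condition (the resource name, condition type, and timeout are illustrative, not what the template actually does today):

```bash
#!/bin/bash
# Hypothetical sketch: poll the ingress ClusterOperator until it reports
# Progressing=False, giving up after ten minutes.  The resource and
# condition names are assumptions, not the template's current logic.
deadline=$(($(date +%s) + 600))
while [ "$(date +%s)" -lt "${deadline}" ]; do
    progressing="$(oc get clusteroperator/ingress \
        -o 'jsonpath={.status.conditions[?(@.type=="Progressing")].status}')"
    if [ "${progressing}" = "False" ]; then
        echo "ingress operator has finished progressing"
        exit 0
    fi
    sleep 10
done
echo "timed out waiting for the ingress operator to stop progressing" >&2
exit 1
```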
Today I saw:
I suspect that the `rollout status` request took long enough that the fresh `date` call generated a time larger than `wait_expiry_time`. This commit rerolls the logic last touched by 7991fd3 (#2004), with an implementation based on one of my suggestions there. And, full disclosure, the buggy implementation from #2004 is also based on one of my suggestions, so don't assume I know what I'm talking about ;).

Now we pick a total wait time (10 minutes), regardless of how many times we need to reconnect the watcher. With this commit, each watcher will try to wait for the full remaining period. So the first watcher tries to wait for 10 minutes, and if it times out after 2 minutes, the second watcher will try to wait for 8 minutes.
And the cool-off sleep is no longer parameterized, which removes the chance of flaking like I saw today.
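A minimal sketch of the remaining-time approach described above, assuming a `timeout`-wrapped `oc rollout status` watcher (the deployment name, namespace, and cool-off length are illustrative, not necessarily what the template uses):

```bash
#!/bin/bash
# Illustrative sketch: one overall deadline, with each watcher attempt
# waiting for whatever time is left rather than re-deriving the expiry
# after every reconnect.
wait_expiry_time=$(($(date +%s) + 600))  # total budget: 10 minutes
while true; do
    remaining=$((wait_expiry_time - $(date +%s)))
    if [ "${remaining}" -le 0 ]; then
        echo "timed out waiting for the router rollout" >&2
        exit 1
    fi
    # Each watcher gets the full remaining period: the first tries 10
    # minutes, and if it drops out after 2, the next attempt gets 8.
    if timeout "${remaining}" oc -n openshift-ingress rollout status \
        deployment/router-default --watch; then
        break
    fi
    sleep 2  # fixed cool-off between reconnects, no longer parameterized
done
```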