@sgreene570
Contributor

@sgreene570 sgreene570 commented Aug 5, 2020

Waiting for the router factory/controller to call the router plugin's Commit function guarantees that route/endpoint resources are available to the template code, and thus removes the possibility of a "routeless" router. A "routeless" router is harmful during upgrades: if routes carried over from the previous version break a newer version of HAProxy, every subsequent reload fails, yet the initial "routeless" configuration keeps running, so the upgrade succeeds when it should not.
The initial (premature) call to the router's commitAndReload() function outside of the rate-limited loop is older code (3.11 era) and should be removed.
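
To illustrate the ordering described above, here is a minimal sketch of a plugin that refuses to reload until the controller has called Commit. It is not the code from this PR: `templateRouter`, its fields, and `errNotSynced` are hypothetical names, and the real router goes through a rate-limited function that is omitted here.

```go
package main

import (
	"errors"
	"sync"
)

// templateRouter is a hypothetical, stripped-down stand-in for the router's
// template plugin. The field and method names are illustrative only.
type templateRouter struct {
	mu      sync.Mutex
	synced  bool         // set once the controller has delivered the initial state
	reload  func() error // writes the config and reloads HAProxy
	reloads int          // number of reload attempts actually made
}

// errNotSynced is an assumed sentinel error for a reload requested too early.
var errNotSynced = errors.New("router state not yet synced; skipping reload")

// Commit is called by the controller after the initial routes and endpoints
// have been handed to the plugin. Only from this point on may a reload occur.
func (r *templateRouter) Commit() error {
	r.mu.Lock()
	r.synced = true
	r.mu.Unlock()
	return r.commitAndReload()
}

// commitAndReload refuses to run before the first Commit, so HAProxy is never
// started from a configuration that contains no routes.
func (r *templateRouter) commitAndReload() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.synced {
		return errNotSynced
	}
	r.reloads++
	return r.reload()
}
```

With this shape, a reload requested at startup is rejected, and the first real reload happens only as a result of Commit, so a broken route set fails the very first reload instead of hiding behind a working "routeless" config.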

This PR also adds a failed reload count metric for tracking failed reloads that happen well after a new router pod is created. This will be used for cluster alerting.
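
As a rough sketch of what counting failed reloads could look like, the snippet below wraps a reload function and increments a counter on error. It uses a plain atomic counter from the standard library purely for illustration. The actual metric added by this PR is presumably exported through the router's metrics endpoint, and `reloadWithMetric`, `failedReloads`, and `errReload` are hypothetical names, not identifiers from the router code.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// reloadFailures is a hypothetical process-wide count of failed reloads.
// A real router would more likely export this as a metrics counter.
var reloadFailures int64

// errReload is an assumed example error, used only to demonstrate a failure.
var errReload = errors.New("haproxy reload failed")

// reloadWithMetric runs the given reload function and records a failure
// in the counter if it returns an error. The error is passed through
// unchanged so callers can still log or retry.
func reloadWithMetric(reload func() error) error {
	err := reload()
	if err != nil {
		atomic.AddInt64(&reloadFailures, 1)
	}
	return err
}

// failedReloads returns the current number of failed reloads.
func failedReloads() int64 {
	return atomic.LoadInt64(&reloadFailures)
}
```

An alert can then fire when the count increases on a pod that is already ready, which is the case the readiness probe alone does not catch.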

See the linked BZ for more context.

/assign @Miciah
/cc @frobware @danehans

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 5, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 5, 2020
@sgreene570 sgreene570 force-pushed the router-remove-initial-sync branch 2 times, most recently from 7aafc41 to 6ca2295 Compare August 6, 2020 14:39
@sgreene570 sgreene570 changed the title from "[WIP]: Remove initial haproxy template commitAndReload" to "Bug 1861455: Remove initial haproxy template commitAndReload" Aug 6, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 6, 2020
@openshift-ci-robot
Contributor

@sgreene570: This pull request references Bugzilla bug 1861455, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1861455: Remove initial haproxy template commitAndReload

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Aug 6, 2020
@sgreene570 sgreene570 force-pushed the router-remove-initial-sync branch from 6ca2295 to 5c8ec99 Compare August 6, 2020 17:46
If a router reload fails after a router pod becomes
ready, we need a way to alert cluster admins that
newly created route resources are not being applied
to the cluster.
The first call to `commitAndReload` bypasses the rate limited
reload logic that also takes into account route sync status.
Removing this initial router reload call will prevent the router
from starting in a "routeless" state: that is, a state where
the router is running before it has begun watching route
resources.
@sgreene570 sgreene570 force-pushed the router-remove-initial-sync branch from 5c8ec99 to becfcb0 Compare August 6, 2020 17:52
@Miciah
Contributor

Miciah commented Aug 6, 2020

Thanks! This looks terrific!
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 6, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah, sgreene570

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

4 similar comments

@openshift-merge-robot openshift-merge-robot merged commit 8d0d6a4 into openshift:master Aug 7, 2020
@openshift-ci-robot
Contributor

@sgreene570: All pull requests linked via external trackers have merged: openshift/router#165. Bugzilla bug 1861455 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1861455: Remove initial haproxy template commitAndReload

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sgreene570
Contributor Author

/cherry-pick release-4.5

@openshift-cherrypick-robot

@sgreene570: #165 failed to apply on top of branch "release-4.5":

Details

In response to this:

/cherry-pick release-4.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
