Skip to content

Bug 1931505: [on-prem] Cleanup keepalived vips before starting service#2511

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
cybertron:keepalived-vip-cleanup
Apr 15, 2021
Merged

Bug 1931505: [on-prem] Cleanup keepalived vips before starting service#2511
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
cybertron:keepalived-vip-cleanup

Conversation

@cybertron
Copy link
Member

Due to a bug in keepalived 2.0.10, if the liveness probe kills
keepalived, any vips that were assigned to the system remain and
are not cleaned up when keepalived restarts. Until we get a fixed
version of keepalived, let's clean up the VIPs ourselves before
starting keepalived. If the node is supposed to have the VIP it will
configure it again when keepalived starts.

This is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1931505

- Description for the changelog
Under some circumstances, a VIP may be configured on a node it shouldn't be due to keepalived exiting uncleanly. To avoid this, remove any existing VIPs before starting keepalived.

@cybertron
Copy link
Member Author

/test e2e-openstack
/test e2e-ovirt
/test e2e-vsphere
/cc @mandre @Gal-Zaidman @jcpowermac @patrickdillon

@kikisdeliveryservice kikisdeliveryservice changed the title Cleanup keepalived vips before starting service [on-prem] Cleanup keepalived vips before starting service Apr 2, 2021
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you specify also the netmask here?

I checked it in my environment and I got [1] warning

[1]

  • remove_vip 192.168.111.5
  • address=192.168.111.5
    ++ ip -o a
    ++ grep 192.168.111.5
    ++ awk '{print $2}'
  • interface=enp2s0
  • '[' -n enp2s0 ']'
  • ip a del 192.168.111.5 dev enp2s0
    Warning: Executing wildcard deletion to stay compatible with old scripts.
    Explicitly specify the prefix length (192.168.111.5/32) to avoid this warning.
    This special behaviour is likely to disappear in further releases,
    fix your scripts!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I had mistakenly hard-coded the mask in my first version of this patch, and didn't notice that it triggered the warning after I removed it. The latest version should handle it correctly.

@cybertron
Copy link
Member Author

/test e2e-openstack
/test e2e-ovirt
/test e2e-vsphere

I really wish Prow didn't forget about the explicitly run jobs every time you update a PR.

Copy link
Member

@mandre mandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/retest
/lgtm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we intend to someday remove the workaround then we should reference the upstream keepalived issue. I personally don't see a problem keeping the removal of the VIP in the startup script forever, so up to you if you want to modify the patch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of not throwing away the ci passes I have I think I'll leave it alone for now. I can push a small followup to add a reference to the bz that can be merged whenever. If that doesn't make 4.8 it won't be a big deal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cybertron Maybe we should add that now, since it had to be rebased in the last 15 minutes anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good point. Thanks for the reminder.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 6, 2021
@yboaron
Copy link
Contributor

yboaron commented Apr 6, 2021

/lgtm

@yboaron
Copy link
Contributor

yboaron commented Apr 6, 2021

Since this PR suppose to fix [1] BZ, maybe it worth link it to BZ?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1931505 ?

@Gal-Zaidman
Copy link

/lgtm

@cybertron
Copy link
Member Author

Since this PR suppose to fix [1] BZ, maybe it worth link it to BZ?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1931505 ?

I intentionally didn't do this because it doesn't fix the underlying bug, but maybe we could open a separate one against keepalived itself?

@kikisdeliveryservice
Copy link
Contributor

kikisdeliveryservice commented Apr 6, 2021

Since this PR suppose to fix [1] BZ, maybe it worth link it to BZ?
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1931505 ?

I intentionally didn't do this because it doesn't fix the underlying bug, but maybe we could open a separate one against keepalived itself?

Adding approval and a hold, given the above ^^ @cybertron if you want to add a BZ, please do so and remove the hold when you are ready.

/approve
/hold

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 6, 2021
@cybertron
Copy link
Member Author

Adding approval and a hold, given the above ^^ @cybertron if you want to add a BZ, please do so and remove the hold when you are ready.

Thanks, I think I will. It's possible we will need to backport this too since the bug has existed for a while. I'm not sure why it only became a problem in 4.8, but it could easily pop up in earlier releases too.

@cybertron cybertron changed the title [on-prem] Cleanup keepalived vips before starting service Bug 1931505: [on-prem] Cleanup keepalived vips before starting service Apr 6, 2021
@openshift-ci-robot
Copy link
Contributor

@cybertron: This pull request references Bugzilla bug 1931505, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (vvoronko@redhat.com), skipping review request.

Details

In response to this:

Bug 1931505: [on-prem] Cleanup keepalived vips before starting service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Apr 6, 2021
@cybertron
Copy link
Member Author

Okay, I opened a new bug against keepalived for this and added the existing one to this PR. Once we get ci passes this should be good to go.

/test e2e-vsphere
/test e2e-vsphere-upgrade

@mandre
Copy link
Member

mandre commented Apr 7, 2021

/retest

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 13, 2021
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, Gal-Zaidman, jcpowermac, kikisdeliveryservice, mandre, yboaron

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cybertron
Copy link
Member Author

/test e2e-openstack
/test e2e-ovirt

@cybertron
Copy link
Member Author

/test e2e-openstack

1 similar comment
@cybertron
Copy link
Member Author

/test e2e-openstack

@stbenjam
Copy link
Member

Can we remove the hold? Looks like the relevant things are passing \o/

@cybertron
Copy link
Member Author

/hold cancel

Yep, I think we should be good to go now.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 14, 2021
@jstuever
Copy link
Contributor

/cc

@kikisdeliveryservice
Copy link
Contributor

/retest

@cybertron
Copy link
Member Author

/retest

One down...

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@kikisdeliveryservice
Copy link
Contributor

vsphere-upgrade and okd tests are not expected to pass
/skip

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@stbenjam
Copy link
Member

/test e2e-agnostic-upgrade

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 15, 2021

@cybertron: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-metal-ipi-ovn-dualstack a156dc24861cd862134cdcd2949dae6e3171e239 link /test e2e-metal-ipi-ovn-dualstack
ci/prow/okd-e2e-aws d31e50b link /test okd-e2e-aws
ci/prow/e2e-vsphere-upgrade d31e50b link /test e2e-vsphere-upgrade

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cybertron
Copy link
Member Author

/retest

@stbenjam
Copy link
Member

/refresh

1 similar comment
@stbenjam
Copy link
Member

/refresh

@openshift-ci-robot
Copy link
Contributor

@cybertron: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-vsphere-upgrade d31e50b link /test e2e-vsphere-upgrade
ci/prow/okd-e2e-aws d31e50b link /test okd-e2e-aws

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@stbenjam
Copy link
Member

/skip

@openshift-merge-robot openshift-merge-robot merged commit aa5d37e into openshift:master Apr 15, 2021
@openshift-ci-robot
Copy link
Contributor

@cybertron: All pull requests linked via external trackers have merged:

Bugzilla bug 1931505 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1931505: [on-prem] Cleanup keepalived vips before starting service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.