Skip to content

Conversation

@ricky-rav
Copy link
Contributor

@ricky-rav ricky-rav commented Feb 8, 2022

Add label node.kubernetes.io/exclude-from-external-load-balancers on nodes that are going for a reboot and remove it at startup. This label is used by the cloud service controller in order to know which nodes to exclude from backend pools of cloud load balancers.

By adding this label at very start of the reboot process, instead of just waiting for the node to become NotReady, we give the cloud service controller more time to update backend pools on cloud load balancers.

This is part of the solution for BZ #2040715

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 8, 2022
@ricky-rav ricky-rav force-pushed the dev_bz_2040715a branch 3 times, most recently from c19ecd5 to c07f323 Compare February 9, 2022 16:20
@ricky-rav ricky-rav force-pushed the dev_bz_2040715a branch 2 times, most recently from dee07c1 to e6446d1 Compare February 9, 2022 16:39
Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments on a first pass

@ricky-rav ricky-rav force-pushed the dev_bz_2040715a branch 3 times, most recently from 59adb63 to 03bb0cb Compare February 16, 2022 10:19
@ricky-rav ricky-rav changed the title [WIP][DNM] testing if azure LB correctly updates NotReady nodes before these nodes power off [WIP][DNM] set "node.kubernetes.io/exclude-from-external-load-balancers" label at reboot and unset it at startup Feb 16, 2022
@ricky-rav ricky-rav force-pushed the dev_bz_2040715a branch 3 times, most recently from 0346ed7 to a659e97 Compare February 16, 2022 10:49
@ricky-rav ricky-rav changed the title [WIP][DNM] set "node.kubernetes.io/exclude-from-external-load-balancers" label at reboot and unset it at startup set "node.kubernetes.io/exclude-from-external-load-balancers" label at reboot and unset it at startup Feb 16, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 16, 2022
Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on deads2k comments, but also there are 3 funcs seemingly called the same/similar things, can we condense down to 1-2 for clarity and also add in text description of funcs?

@kikisdeliveryservice kikisdeliveryservice changed the title set "node.kubernetes.io/exclude-from-external-load-balancers" label at reboot and unset it at startup Bug 2040715: set "node.kubernetes.io/exclude-from-external-load-balancers" label at reboot and unset it at startup Feb 16, 2022
@openshift-ci openshift-ci bot added the bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. label Feb 16, 2022
@kikisdeliveryservice
Copy link
Contributor

Latest change fixed yesterday's problem. Also noting that vsphere-upgrade did in fact pass (tho not the reqd agnostic upgrade), so this is probably ok with some retesting.

/retest-required

@kikisdeliveryservice
Copy link
Contributor

Just to followup, is this PR still needed?

@squeed
Copy link
Contributor

squeed commented May 17, 2022

@kikisdeliveryservice @ricky-rav This PR would certainly be welcome; I think it would be a significant improvement in reliability.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 17, 2022
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 17, 2022
@kikisdeliveryservice
Copy link
Contributor

/test unit

@ricky-rav ricky-rav changed the title Bug 2040715: set "node.kubernetes.io/exclude-from-external-load-balancers" label at reboot and unset it at startup Set "node.kubernetes.io/exclude-from-external-load-balancers" label at reboot and unset it at startup Jul 26, 2022
@openshift-ci openshift-ci bot removed bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jul 26, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 26, 2022

@ricky-rav: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

Set "node.kubernetes.io/exclude-from-external-load-balancers" label at reboot and unset it at startup

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 5, 2023
@ricky-rav ricky-rav closed this Jan 18, 2023
@ricky-rav ricky-rav reopened this Feb 13, 2023
@ricky-rav
Copy link
Contributor Author

/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 13, 2023

@ricky-rav: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test 4.12-upgrade-from-stable-4.11-images
  • /test cluster-bootimages
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-upgrade
  • /test e2e-gcp-op
  • /test images
  • /test okd-scos-images
  • /test unit
  • /test verify

The following commands are available to trigger optional jobs:

  • /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade
  • /test bootstrap-unit
  • /test e2e-alibabacloud-ovn
  • /test e2e-aws-disruptive
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-fips-op
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-proxy
  • /test e2e-aws-serial
  • /test e2e-aws-single-node
  • /test e2e-aws-upgrade-single-node
  • /test e2e-aws-workers-rhel8
  • /test e2e-azure
  • /test e2e-azure-ovn-upgrade
  • /test e2e-azure-upgrade
  • /test e2e-gcp-op-single-node
  • /test e2e-gcp-rt
  • /test e2e-gcp-rt-op
  • /test e2e-gcp-single-node
  • /test e2e-gcp-upgrade
  • /test e2e-hypershift
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack
  • /test e2e-openstack-parallel
  • /test e2e-ovirt
  • /test e2e-ovirt-upgrade
  • /test e2e-ovn-step-registry
  • /test e2e-vsphere
  • /test e2e-vsphere-upgrade
  • /test e2e-vsphere-upi
  • /test okd-e2e-aws
  • /test okd-e2e-gcp-op
  • /test okd-e2e-upgrade
  • /test okd-e2e-vsphere
  • /test okd-images
  • /test okd-scos-e2e-aws-ovn
  • /test okd-scos-e2e-gcp-op
  • /test okd-scos-e2e-gcp-ovn-upgrade
  • /test okd-scos-e2e-vsphere

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-machine-config-operator-master-e2e-alibabacloud-ovn
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
  • pull-ci-openshift-machine-config-operator-master-e2e-hypershift
  • pull-ci-openshift-machine-config-operator-master-images
  • pull-ci-openshift-machine-config-operator-master-okd-images
  • pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-gcp-ovn-upgrade
  • pull-ci-openshift-machine-config-operator-master-okd-scos-images
  • pull-ci-openshift-machine-config-operator-master-unit
  • pull-ci-openshift-machine-config-operator-master-verify

In response to this:

/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 13, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kikisdeliveryservice, ricky-rav
Once this PR has been reviewed and has the lgtm label, please ask for approval from cheesesashimi. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 13, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 13, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kikisdeliveryservice, ricky-rav
Once this PR has been reviewed and has the lgtm label, please ask for approval from cheesesashimi. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…it at startup

Add label "node.kubernetes.io/exclude-from-external-load-balancers" on nodes that are going for a reboot and remove it at startup. This label is used by the cloud service controller in order to know which nodes to exclude from backend pools of cloud load balancers. By setting this label at very start of the reboot process, instead of waiting for the node to become NotReady, we give the cloud service controller more time to update backend pools on cloud load balancers.

Signed-off-by: Riccardo Ravaioli <[email protected]>
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 13, 2023

@ricky-rav: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi 0392a52c13170c4a90bd691d9de44055b84868ab link false /test e2e-metal-ipi
ci/prow/e2e-aws-disruptive 0392a52c13170c4a90bd691d9de44055b84868ab link false /test e2e-aws-disruptive
ci/prow/e2e-aws-single-node 0392a52c13170c4a90bd691d9de44055b84868ab link false /test e2e-aws-single-node
ci/prow/e2e-aws-workers-rhel7 0392a52c13170c4a90bd691d9de44055b84868ab link false /test e2e-aws-workers-rhel7
ci/prow/e2e-aws-upgrade-single-node 0392a52c13170c4a90bd691d9de44055b84868ab link false /test e2e-aws-upgrade-single-node
ci/prow/e2e-aws-workers-rhel8 0392a52c13170c4a90bd691d9de44055b84868ab link false /test e2e-aws-workers-rhel8
ci/prow/4.12-upgrade-from-stable-4.11-images 0392a52c13170c4a90bd691d9de44055b84868ab link true /test 4.12-upgrade-from-stable-4.11-images
ci/prow/e2e-agnostic-upgrade b931d0e17e98698775f8936bd207d124e84f6036 link true /test e2e-agnostic-upgrade
ci/prow/okd-scos-e2e-gcp-ovn-upgrade d8787e0 link false /test okd-scos-e2e-gcp-ovn-upgrade
ci/prow/e2e-alibabacloud-ovn d8787e0 link false /test e2e-alibabacloud-ovn
ci/prow/e2e-hypershift d8787e0 link false /test e2e-hypershift
ci/prow/unit d8787e0 link true /test unit

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 16, 2023
@ricky-rav ricky-rav closed this Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants