@openshift-cherrypick-robot

This is an automated cherry-pick of #851

/assign petr-muller

…ddress

Trevor and David identified the problem as a broken substitution of the
internal load balancer address into `KUBERNETES_SERVICE_HOST` (see my [JIRA comment](https://issues.redhat.com/browse/OCPBUGS-1458?focusedCommentId=21090756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21090756)
and the related [Slack thread](https://coreos.slack.com/archives/C011CSSPBLK/p1664925995946479?thread_ts=1661182025.992649&cid=C011CSSPBLK)).

CVO injects the LB hostname correctly in
[`ModifyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcebuilder/apps.go#L19),
but the deployment is then applied in
[`ApplyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourceapply/apps.go#L17),
where the
`EnsureDeployment`->`ensurePodTemplateSpec`->`ensurePodSpec`->`ensureContainers`->`ensureContainer`->`ensureEnvVar`
chain overwrites the updated value in `required` with the old value from
`existing`, which reverts the injection.
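
To illustrate the revert described above, here is a minimal sketch, not the actual CVO code. `EnvVar` is a simplified stand-in for the Kubernetes env var type, and `ensureEnvVars` with its `keepExistingHost` flag is a hypothetical reduction of the merge step, introduced only for this example.

```go
package main

// EnvVar is a simplified stand-in for the Kubernetes env var type
// (an assumption made for this sketch).
type EnvVar struct {
	Name  string
	Value string
}

// ensureEnvVars is a hypothetical, reduced merge step: it makes existing match
// required and reports whether anything changed. With keepExistingHost set, it
// mimics the special case described above, where the in-cluster value of
// KUBERNETES_SERVICE_HOST wins over the freshly injected one.
func ensureEnvVars(existing *[]EnvVar, required []EnvVar, keepExistingHost bool) bool {
	current := make(map[string]string, len(*existing))
	for _, e := range *existing {
		current[e.Name] = e.Value
	}

	// Work on a copy so the caller's required slice is left untouched.
	merged := make([]EnvVar, len(required))
	copy(merged, required)

	modified := len(*existing) != len(required)
	for i := range merged {
		old, found := current[merged[i].Name]
		if keepExistingHost && found && merged[i].Name == "KUBERNETES_SERVICE_HOST" {
			merged[i].Value = old // the injected LB address is dropped here
		}
		if !found || old != merged[i].Value {
			modified = true
		}
	}
	*existing = merged
	return modified
}
```

In this sketch, calling `ensureEnvVars` with `keepExistingHost` set to `true` reports no modification and leaves the old host in place, so the injected address never reaches the cluster. With the flag set to `false`, the injected value is applied and the call reports a modification.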

This behavior was added intentionally in openshift#559
as part of a fix for various hot-looping issues. The substitution
apparently caused some hot-looping issues in the past ([slack thread](https://coreos.slack.com/archives/CEGKQ43CP/p1620934857402200?thread_ts=1620895567.367100&cid=CEGKQ43CP)).
I have thoroughly tested removing the special handling of
`KUBERNETES_SERVICE_HOST` and saw no problematic behavior. After fixing other
hot-looping problems in openshift#855
to eliminate noise, no new hot-loops occur with the
`KUBERNETES_SERVICE_HOST` handling removed.

The client-go code retries a subset of network errors on GET for 30s,
but we saw occurrences of other short disruptions, such as DNS failures, that
make us abort and restart unnecessarily soon.

Make CVO retry all errors for 25s and only abort when we do not succeed
in this time frame. This helps CVO survive short disruptions on startup,
leading to less noise, mostly during installation.
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: Detected clone of Jira Issue OCPBUGS-1458 with correct target version. Retitling PR to link to clone:
/retitle [release-4.12] OCPBUGS-3770: Allow CVO to update KUBERNETES_SERVICE_HOST with LB address

Details

In response to this:

This is an automated cherry-pick of #851

/assign petr-muller

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot changed the title from "[release-4.12] OCPBUGS-1458: Allow CVO to update KUBERNETES_SERVICE_HOST with LB address" to "[release-4.12] OCPBUGS-3770: Allow CVO to update KUBERNETES_SERVICE_HOST with LB address" Nov 16, 2022
@openshift-ci-robot openshift-ci-robot added the jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. label Nov 16, 2022
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-3770, which is invalid:

  • expected the bug to be open, but it isn't
  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is Closed (Obsolete) instead
  • expected Jira Issue OCPBUGS-3770 to depend on a bug targeting a version in 4.13.0 and in one of the following states: MODIFIED, ON_QA, VERIFIED, but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This is an automated cherry-pick of #851

/assign petr-muller


@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Nov 16, 2022
@openshift-ci
Contributor

openshift-ci bot commented Nov 16, 2022

@openshift-cherrypick-robot: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

Details

In response to this:

[release-4.12] OCPBUGS-3770: Allow CVO to update KUBERNETES_SERVICE_HOST with LB address


@petr-muller
Member

/jira refresh

@openshift-ci-robot
Contributor

@petr-muller: This pull request references Jira Issue OCPBUGS-3770, which is invalid:

  • expected Jira Issue OCPBUGS-3770 to depend on a bug targeting a version in 4.13.0 and in one of the following states: MODIFIED, ON_QA, VERIFIED, but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/jira refresh


@petr-muller
Member

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 16, 2022
@openshift-ci-robot
Contributor

@petr-muller: This pull request references Jira Issue OCPBUGS-3770, which is valid. The bug has been moved to the POST state.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.0) matches configured target version for branch (4.12.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Jira Issue OCPBUGS-1458 is in the state Verified, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-1458 targets the "4.13.0" version, which is one of the valid target versions: 4.13.0
  • bug has dependents

Requesting review from QA contact:
/cc @shellyyang1989

Details

In response to this:

/jira refresh


@petr-muller
Member

petr-muller commented Nov 16, 2022

/test unit
TestCVO_UpgradeFailedPayloadLoadWithCapsChanges is a known flake that was fixed later

@petr-muller
Member

/retest

1 similar comment
@petr-muller
Member

/retest

@openshift-ci
Contributor

openshift-ci bot commented Nov 17, 2022

@openshift-cherrypick-robot: all tests passed!

Full PR test history. Your PR dashboard.

Details


@shellyyang1989
Contributor

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Nov 17, 2022
@LalatenduMohanty
Member

/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Nov 17, 2022
Member

@LalatenduMohanty LalatenduMohanty left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 17, 2022
@openshift-ci
Contributor

openshift-ci bot commented Nov 17, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, openshift-cherrypick-robot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 17, 2022
@openshift-merge-robot openshift-merge-robot merged commit f1dc3b6 into openshift:release-4.12 Nov 17, 2022
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-3770 has been moved to the MODIFIED state.

Details

In response to this:

This is an automated cherry-pick of #851

/assign petr-muller


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.
  • qe-approved: Signifies that QE has signed off on this PR

6 participants