Skip to content

Conversation

@jottofar
Copy link
Contributor

@jottofar jottofar commented May 12, 2021

Mostly followed kube defaulting logic however with some exceptions:

  • Currently only CVO deployment uses "KUBERNETES_SERVICE_HOST" env variable to inject internal LB host.This may result in an IP address being returned by API so always assuming returned value is okay.
  • Toleration keys can be duplicated and are so in CVO (see Bug 1941901) so needed to change logic to not simply exit when first matched key found possibly incorrectly overwrite the wrong key's value.
    Also added VolumeSource defaulting in its entirety even though some are not currently used to avoid future hot looping issues.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 12, 2021
@openshift-ci openshift-ci bot requested review from vrutkovs and wking May 12, 2021 22:06
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2021
@jottofar jottofar closed this May 13, 2021
@jottofar
Copy link
Contributor Author

/reopen

@openshift-ci openshift-ci bot reopened this May 13, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 13, 2021

@jottofar: Reopened this PR.

Details

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jottofar jottofar force-pushed the bug-1881484 branch 4 times, most recently from 72e5bd4 to 87e6eec Compare May 13, 2021 21:23
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 18, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2021
@jottofar jottofar force-pushed the bug-1881484 branch 13 times, most recently from ec4c91c to b5b4711 Compare May 27, 2021 16:53
@jottofar
Copy link
Contributor Author

/test e2e-agnostic-operator

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 2, 2021
@jottofar
Copy link
Contributor Author

jottofar commented Jun 2, 2021

/retest

@vrutkovs
Copy link

vrutkovs commented Jun 2, 2021

Verified that its no longer happening via cluster-bot test e2e openshift/cluster-version-operator#561,openshift/cluster-version-operator#559 task:

$ curl -sL https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1400083116516708352/artifacts/launch/pods/openshift-cluster-version_cluster-version-operator-54b7b7b8fb-frm7n_cluster-version-operator.log | grep -oP "Updating \K.+ due to diff" | cut -d' ' -f1 | sort | uniq -c
     10 ClusterServiceVersion
     10 PriorityClass

@jottofar
Copy link
Contributor Author

jottofar commented Jun 2, 2021

/test unit

@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 2, 2021
@jottofar
Copy link
Contributor Author

jottofar commented Jun 2, 2021

/retest

Copy link

@vrutkovs vrutkovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 2, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2021
@jottofar
Copy link
Contributor Author

jottofar commented Jun 2, 2021

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@wking
Copy link
Member

wking commented Jun 3, 2021

API-server TargetDown is unrelated to this PR.

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 3, 2021

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade

Details

In response to this:

API-server TargetDown is unrelated to this PR.

/override ci/prow/e2e-agnostic-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit 8ee1ff4 into openshift:master Jun 3, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 3, 2021

@jottofar: All pull requests linked via external trackers have merged:

Bugzilla bug 1881484 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1881484: Set defaults in deployment

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

petr-muller added a commit to petr-muller/cluster-version-operator that referenced this pull request Oct 13, 2022
This just adds the reproducer for the behavior that (I think) is the
culprit of OCPBUGS-1458. The problem was identified to be a broken
substitution of internal load balancer into `KUBERNETES_SERVICE_HOST` by
Trevor and David (see my [JIRA
comment](https://issues.redhat.com/browse/OCPBUGS-1458?focusedCommentId=21090756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21090756)
and related [Slack
thread](https://coreos.slack.com/archives/C011CSSPBLK/p1664925995946479?thread_ts=1661182025.992649&cid=C011CSSPBLK)).

CVO injects the LB hostname in the [`ModifyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcebuilder/apps.go#L19)
fine, but then the deployment gets applied in [`ApplyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourceapply/apps.go#L17)
and the  `EnsureDeployment`->`ensurePodTemplateSpec`->`ensurePodSpec`->`ensureContainers`->`ensureContainer`->`ensureEnvVar`
chain stomps the updated value in `required` by the old value from
`existing` and reverts the injection in this way:

https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcemerge/core.go#L115-L120

https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcemerge/core.go#L132-L138

This behavior was added intentionally in openshift#559, but I do not understand the reasoning given in the comment:

> Currently only CVO deployment uses "KUBERNETES_SERVICE_HOST" env variable to inject internal LB host.This may result in an IP address being returned by API so always assuming returned value is okay.
petr-muller added a commit to petr-muller/cluster-version-operator that referenced this pull request Oct 21, 2022
This just adds the reproducer for the behavior that (I think) is the
culprit of OCPBUGS-1458. The problem was identified to be a broken
substitution of internal load balancer into `KUBERNETES_SERVICE_HOST` by
Trevor and David (see my [JIRA
comment](https://issues.redhat.com/browse/OCPBUGS-1458?focusedCommentId=21090756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21090756)
and related [Slack
thread](https://coreos.slack.com/archives/C011CSSPBLK/p1664925995946479?thread_ts=1661182025.992649&cid=C011CSSPBLK)).

CVO injects the LB hostname in the [`ModifyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcebuilder/apps.go#L19)
fine, but then the deployment gets applied in [`ApplyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourceapply/apps.go#L17)
and the  `EnsureDeployment`->`ensurePodTemplateSpec`->`ensurePodSpec`->`ensureContainers`->`ensureContainer`->`ensureEnvVar`
chain stomps the updated value in `required` by the old value from
`existing` and reverts the injection in this way:

https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcemerge/core.go#L115-L120

https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcemerge/core.go#L132-L138

This behavior was added intentionally in openshift#559, but I do not understand the reasoning given in the comment:

> Currently only CVO deployment uses "KUBERNETES_SERVICE_HOST" env variable to inject internal LB host.This may result in an IP address being returned by API so always assuming returned value is okay.
petr-muller added a commit to petr-muller/cluster-version-operator that referenced this pull request Oct 24, 2022
The problem was identified to be a broken substitution of internal load
balancer into `KUBERNETES_SERVICE_HOST` by Trevor and David (see my [JIRA comment](https://issues.redhat.com/browse/OCPBUGS-1458?focusedCommentId=21090756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21090756)
and related [Slack thread](https://coreos.slack.com/archives/C011CSSPBLK/p1664925995946479?thread_ts=1661182025.992649&cid=C011CSSPBLK)).

CVO injects the LB hostname in the
[`ModifyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcebuilder/apps.go#L19)
fine, but then the deployment gets applied in
[`ApplyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourceapply/apps.go#L17)
and the
`EnsureDeployment`->`ensurePodTemplateSpec`->`ensurePodSpec`->`ensureContainers`->`ensureContainer`->`ensureEnvVar`
chain stomps the updated value in `required` by the old value from
`existing` and reverts the injection in this way

This behavior was added intentionally in openshift#559
as a part of a fix for various hot-looping issues. The substitution
apparently caused some hot-looping issues in the past ([slack thread](https://coreos.slack.com/archives/CEGKQ43CP/p1620934857402200?thread_ts=1620895567.367100&cid=CEGKQ43CP)).
I have tested removing the special handling `KUBERNETES_SERVICE_HOST`
thoroughly, and saw no problematic behavior. After fixing other
hot-looping problems in openshift#855
to eliminate noise, no new hot-loops occurs with
`KUBERNETES_SERVICE_HOST` handling removed.
petr-muller added a commit to petr-muller/cluster-version-operator that referenced this pull request Oct 24, 2022
The problem was identified to be a broken substitution of internal load
balancer into `KUBERNETES_SERVICE_HOST` by Trevor and David (see my [JIRA comment](https://issues.redhat.com/browse/OCPBUGS-1458?focusedCommentId=21090756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21090756)
and related [Slack thread](https://coreos.slack.com/archives/C011CSSPBLK/p1664925995946479?thread_ts=1661182025.992649&cid=C011CSSPBLK)).

CVO injects the LB hostname in the
[`ModifyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcebuilder/apps.go#L19)
fine, but then the deployment gets applied in
[`ApplyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourceapply/apps.go#L17)
and the
`EnsureDeployment`->`ensurePodTemplateSpec`->`ensurePodSpec`->`ensureContainers`->`ensureContainer`->`ensureEnvVar`
chain stomps the updated value in `required` by the old value from
`existing` and reverts the injection in this way

This behavior was added intentionally in openshift#559
as a part of a fix for various hot-looping issues. The substitution
apparently caused some hot-looping issues in the past ([slack thread](https://coreos.slack.com/archives/CEGKQ43CP/p1620934857402200?thread_ts=1620895567.367100&cid=CEGKQ43CP)).
I have tested removing the special handling `KUBERNETES_SERVICE_HOST`
thoroughly, and saw no problematic behavior. After fixing other
hot-looping problems in openshift#855
to eliminate noise, no new hot-loops occurs with
`KUBERNETES_SERVICE_HOST` handling removed.
petr-muller added a commit to petr-muller/cluster-version-operator that referenced this pull request Oct 25, 2022
…ddress

The problem was identified to be a broken substitution of internal load
balancer into `KUBERNETES_SERVICE_HOST` by Trevor and David (see my [JIRA comment](https://issues.redhat.com/browse/OCPBUGS-1458?focusedCommentId=21090756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21090756)
and related [Slack thread](https://coreos.slack.com/archives/C011CSSPBLK/p1664925995946479?thread_ts=1661182025.992649&cid=C011CSSPBLK)).

CVO injects the LB hostname in the
[`ModifyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcebuilder/apps.go#L19)
fine, but then the deployment gets applied in
[`ApplyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourceapply/apps.go#L17)
and the
`EnsureDeployment`->`ensurePodTemplateSpec`->`ensurePodSpec`->`ensureContainers`->`ensureContainer`->`ensureEnvVar`
chain stomps the updated value in `required` by the old value from
`existing` and reverts the injection in this way

This behavior was added intentionally in openshift#559
as a part of a fix for various hot-looping issues. The substitution
apparently caused some hot-looping issues in the past ([slack thread](https://coreos.slack.com/archives/CEGKQ43CP/p1620934857402200?thread_ts=1620895567.367100&cid=CEGKQ43CP)).
I have tested removing the special handling `KUBERNETES_SERVICE_HOST`
thoroughly, and saw no problematic behavior. After fixing other
hot-looping problems in openshift#855
to eliminate noise, no new hot-loops occurs with
`KUBERNETES_SERVICE_HOST` handling removed.
petr-muller added a commit to petr-muller/cluster-version-operator that referenced this pull request Oct 26, 2022
…ddress

The problem was identified to be a broken substitution of internal load
balancer into `KUBERNETES_SERVICE_HOST` by Trevor and David (see my [JIRA comment](https://issues.redhat.com/browse/OCPBUGS-1458?focusedCommentId=21090756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21090756)
and related [Slack thread](https://coreos.slack.com/archives/C011CSSPBLK/p1664925995946479?thread_ts=1661182025.992649&cid=C011CSSPBLK)).

CVO injects the LB hostname in the
[`ModifyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcebuilder/apps.go#L19)
fine, but then the deployment gets applied in
[`ApplyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourceapply/apps.go#L17)
and the
`EnsureDeployment`->`ensurePodTemplateSpec`->`ensurePodSpec`->`ensureContainers`->`ensureContainer`->`ensureEnvVar`
chain stomps the updated value in `required` by the old value from
`existing` and reverts the injection in this way

This behavior was added intentionally in openshift#559
as a part of a fix for various hot-looping issues. The substitution
apparently caused some hot-looping issues in the past ([slack thread](https://coreos.slack.com/archives/CEGKQ43CP/p1620934857402200?thread_ts=1620895567.367100&cid=CEGKQ43CP)).
I have tested removing the special handling `KUBERNETES_SERVICE_HOST`
thoroughly, and saw no problematic behavior. After fixing other
hot-looping problems in openshift#855
to eliminate noise, no new hot-loops occurs with
`KUBERNETES_SERVICE_HOST` handling removed.
petr-muller added a commit to petr-muller/cluster-version-operator that referenced this pull request Oct 31, 2022
…ddress

The problem was identified to be a broken substitution of internal load
balancer into `KUBERNETES_SERVICE_HOST` by Trevor and David (see my [JIRA comment](https://issues.redhat.com/browse/OCPBUGS-1458?focusedCommentId=21090756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21090756)
and related [Slack thread](https://coreos.slack.com/archives/C011CSSPBLK/p1664925995946479?thread_ts=1661182025.992649&cid=C011CSSPBLK)).

CVO injects the LB hostname in the
[`ModifyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcebuilder/apps.go#L19)
fine, but then the deployment gets applied in
[`ApplyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourceapply/apps.go#L17)
and the
`EnsureDeployment`->`ensurePodTemplateSpec`->`ensurePodSpec`->`ensureContainers`->`ensureContainer`->`ensureEnvVar`
chain stomps the updated value in `required` by the old value from
`existing` and reverts the injection in this way

This behavior was added intentionally in openshift#559
as a part of a fix for various hot-looping issues. The substitution
apparently caused some hot-looping issues in the past ([slack thread](https://coreos.slack.com/archives/CEGKQ43CP/p1620934857402200?thread_ts=1620895567.367100&cid=CEGKQ43CP)).
I have tested removing the special handling `KUBERNETES_SERVICE_HOST`
thoroughly, and saw no problematic behavior. After fixing other
hot-looping problems in openshift#855
to eliminate noise, no new hot-loops occurs with
`KUBERNETES_SERVICE_HOST` handling removed.
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Nov 16, 2022
…ddress

The problem was identified to be a broken substitution of internal load
balancer into `KUBERNETES_SERVICE_HOST` by Trevor and David (see my [JIRA comment](https://issues.redhat.com/browse/OCPBUGS-1458?focusedCommentId=21090756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21090756)
and related [Slack thread](https://coreos.slack.com/archives/C011CSSPBLK/p1664925995946479?thread_ts=1661182025.992649&cid=C011CSSPBLK)).

CVO injects the LB hostname in the
[`ModifyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourcebuilder/apps.go#L19)
fine, but then the deployment gets applied in
[`ApplyDeployment`](https://github.com/openshift/cluster-version-operator/blob/dc1ad0aef5f3e1b88074448d21445a5bddb6b05b/lib/resourceapply/apps.go#L17)
and the
`EnsureDeployment`->`ensurePodTemplateSpec`->`ensurePodSpec`->`ensureContainers`->`ensureContainer`->`ensureEnvVar`
chain stomps the updated value in `required` by the old value from
`existing` and reverts the injection in this way

This behavior was added intentionally in openshift#559
as a part of a fix for various hot-looping issues. The substitution
apparently caused some hot-looping issues in the past ([slack thread](https://coreos.slack.com/archives/CEGKQ43CP/p1620934857402200?thread_ts=1620895567.367100&cid=CEGKQ43CP)).
I have tested removing the special handling `KUBERNETES_SERVICE_HOST`
thoroughly, and saw no problematic behavior. After fixing other
hot-looping problems in openshift#855
to eliminate noise, no new hot-loops occurs with
`KUBERNETES_SERVICE_HOST` handling removed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants