
Conversation

@andfasano (Contributor)

This PR adds the manifests required by the CVO to deploy CBO, as specified in the related CVO documentation.
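
For orientation, the manifests in question (the commit title is "Add sa, deployment and co manifests") are a ServiceAccount, the operator Deployment, and a ClusterOperator resource (the last is sketched further down in the review thread). A compressed sketch of the first two, with names taken from elsewhere in this thread and almost all fields trimmed:

  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: cluster-baremetal-operator
    namespace: openshift-machine-api
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: cluster-baremetal-operator
    namespace: openshift-machine-api
  # pod template omitted; it includes the RELEASE_VERSION env var discussed below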

@openshift-ci-robot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Sep 29, 2020
@openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Sep 29, 2020
@andfasano changed the title from "[WIP] Allow CVO to deploy CBO" to "Allow CVO to deploy CBO" Sep 30, 2020
@openshift-ci-robot removed the do-not-merge/work-in-progress label Sep 30, 2020
@sadasu (Contributor) left a comment


I have a generic question inline. Other than that, this looks good.

@openshift-ci-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Oct 3, 2020
@openshift-ci-robot removed the needs-rebase label Oct 6, 2020
@@ -0,0 +1,8 @@
apiVersion: config.openshift.io/v1
Contributor

This should probably come before the deployment, since the deployment uses it.

Contributor

I'm not sure whether it matters. Shouldn't CBO create the CO deployment if it's not found anyway? FYI, MAO uses the same runlevel.

Contributor

Based on the note in this section of the docs, I think the ClusterOperator resource has to come after the Deployment. The CVO doesn't use the manifest to actually create the resource; it only uses it to know which CO resources it should wait for before continuing with the next step of the install/upgrade. So, I think this is correct.
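
For reference, such a ClusterOperator manifest is tiny, and the apply order is driven by the manifest filenames (lexical order within a runlevel). A minimal sketch, with the resource name (baremetal) taken from the audit logs quoted later in this thread:

  apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    name: baremetal
  spec: {}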

- name: cluster-baremetal-operator
env:
- name: RELEASE_VERSION
value: "0.0.1-snapshot"
Contributor

Does something in the build process change this value?

Contributor

Not 100% sure. But this value is used in the CO to signal upgrades.

Contributor

This section of the docs includes an example with a comment that the value "is substituted in the manifests when the payload is built", so I guess something about building the release image modifies the value.
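
Putting those two comments together, the usual flow would be: the release tooling rewrites the placeholder at payload-build time, the operator reads RELEASE_VERSION at runtime, and it reports that version back through the ClusterOperator status, which is what the CVO checks during an upgrade. Purely illustrative (the real string depends on the release being built):

  status:
    versions:
    - name: operator
      version: "4.6.0"   # from RELEASE_VERSION; stays "0.0.1-snapshot" until a payload is built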

@dhellmann (Contributor)

/approve

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andfasano, dhellmann

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [andfasano,dhellmann]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kirankt (Contributor) commented Oct 6, 2020

/lgtm

@openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) Oct 6, 2020
@openshift-merge-robot merged commit c2d3bc8 into openshift:master Oct 6, 2020
@andfasano deleted the cvo-deploy branch October 7, 2020 05:43
wking added a commit to wking/cluster-baremetal-operator that referenced this pull request Dec 21, 2023
…ader election

The option has been available for years:

  $ git blame main.go | grep enable-leader-election
  dcbe86f (Sandhya Dasu               2020-08-18 21:09:29 -0400  72)    flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,

and without it overlapping operator pods can crash-loop [1]:

  : [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers	0s
  {  event [namespace/openshift-machine-api node/ip-10-0-62-147.us-west-2.compute.internal pod/cluster-baremetal-operator-574577fbcb-z8nd4 hmsg/bf39bb17ae - Back-off restarting failed container cluster-baremetal-operator in pod cluster-baremetal-operator-574577fbcb-z8nd4_openshift-machine-api(441969c1-b430-412c-b67f-4ae2f7797f4f)] happened 26 times
  event [namespace/openshift-machine-api node/ip-10-0-62-147.us-west-2.compute.internal pod/cluster-baremetal-operator-574577fbcb-z8nd4 hmsg/bf39bb17ae - Back-off restarting failed container cluster-baremetal-operator in pod cluster-baremetal-operator-574577fbcb-z8nd4_openshift-machine-api(441969c1-b430-412c-b67f-4ae2f7797f4f)] happened 51 times}

while fighting each other over the same ClusterOperator status:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade/1737335551998038016/artifacts/e2e-aws-ovn-upgrade/gather-audit-logs/artifacts/audit-logs.tar | tar -xz --strip-components=2
  $ zgrep -h '"resource":"clusteroperators","name":"baremetal"' kube-apiserver/*audit*log.gz | jq -r 'select(.verb == "create" or .verb == "update") | .stageTimestamp + " " + .verb + " " + (.responseStatus.code | tostring) + " " + (.objectRef.subresource) + " " + .user.username + " " + .user.extra["authentication.kubernetes.io/pod-name"][0]' | grep 'T06:08:.*cluster-baremetal-operator' | sort
  2023-12-20T06:08:21.757799Z update 200 status system:serviceaccount:openshift-machine-api:cluster-baremetal-operator cluster-baremetal-operator-574577fbcb-z8nd4
  2023-12-20T06:08:21.778638Z update 200 status system:serviceaccount:openshift-machine-api:cluster-baremetal-operator cluster-baremetal-operator-7fbb57959b-s9v9g
  2023-12-20T06:08:21.780378Z update 409 status system:serviceaccount:openshift-machine-api:cluster-baremetal-operator cluster-baremetal-operator-574577fbcb-z8nd4
  2023-12-20T06:08:21.790000Z update 200 status system:serviceaccount:openshift-machine-api:cluster-baremetal-operator cluster-baremetal-operator-7fbb57959b-s9v9g
  2023-12-20T06:08:21.802780Z update 200 status system:serviceaccount:openshift-machine-api:cluster-baremetal-operator cluster-baremetal-operator-7fbb57959b-s9v9g

Using a leader lock will avoid this contention, and the system should
be able to coast through brief moments after an outgoing leader leaves
until a replacement leader picks things back up.
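
Under the hood, with current controller-runtime defaults the leader lock is a
coordination.k8s.io/v1 Lease that the manager acquires and periodically renews;
roughly the object below, with the name/namespace matching the IDs discussed
further down and every value illustrative:

  apiVersion: coordination.k8s.io/v1
  kind: Lease
  metadata:
    name: cluster-baremetal-operator      # LeaderElectionID
    namespace: openshift-machine-api      # LeaderElectionNamespace
  spec:
    holderIdentity: cluster-baremetal-operator-574577fbcb-z8nd4_<random-suffix>
    leaseDurationSeconds: 15
    renewTime: "2023-12-20T06:08:21.000000Z"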

I'm also setting a Recreate strategy [2], because:

1. Incoming pod surged by the default Deployment strategy.
2. Incoming pod attempts to acquire the Lease, but the outgoing pod is holding it.
3. Outgoing pod releases the lease and exits.
4. Incoming pod tries again, and this time acquires the lease.

can be slow in the 3-to-4 phase, while:

1. Outgoing pod releases the lease and exits.
2. Incoming pod created, scheduled, and acquires the lease.

tends to be faster.  And again, the component should be able to coast
through small durations without a functioning leader.

See openshift/machine-config-operator@7530ded86 (install: Recreate and
delayed default ServiceAccount deletion, 2023-08-29,
openshift/machine-config-operator#3895) for another example of how
Recreate can help that way.
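
Concretely, the Deployment side of that change is just the strategy field
(sketch; other fields unchanged):

  spec:
    strategy:
      type: Recreate   # the default RollingUpdate would surge a second pod alongside the old one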

To get the leader election working, I'm also setting LeaderElectionID
and LeaderElectionNamespace, pattern-matching from [3].  Using
cluster-baremetal-operator as the ID avoids collisions with
baremetal-operator [4].  COMPONENT_NAMESPACE is from:

  - name: COMPONENT_NAMESPACE
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.namespace

using the pattern documented in [5].  We've been asking for that
environment variable since f368a4f (Add sa, deployment and co
manifests, 2020-09-29, openshift#25), although grepping through history, I
don't see anything consuming it.  But with this commit, we now have a
consumer :).

[1]: https://issues.redhat.com/browse/OCPBUGS-25766
[2]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#recreate-deployment
[3]: https://github.com/openshift/baremetal-operator/blob/749609dd0fb858be72a783c5d87853cad2d303db/main.go#L207-L208
[4]: https://github.com/openshift/baremetal-operator/blob/749609dd0fb858be72a783c5d87853cad2d303db/main.go#L71
[5]: https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
wking added a commit to wking/cluster-baremetal-operator that referenced this pull request Dec 21, 2023
…ader election

openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-baremetal-operator that referenced this pull request Feb 7, 2024
…ader election
