Conversation

@jottofar
Contributor

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 14, 2022
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 14, 2022
@jottofar jottofar force-pushed the get-cv-sooner branch 3 times, most recently from fb9e93a to 4d421e9 Compare February 15, 2022 13:49
@jottofar
Contributor Author

/test e2e-agnostic

@jottofar jottofar force-pushed the get-cv-sooner branch 7 times, most recently from e03ec75 to 4611337 Compare February 16, 2022 19:21
@wking
Member

wking commented Feb 21, 2022

Looking back at 4d421e9, where you were adjusting the existing InitializeFromPayload in start.go, that was happening in Options.Run. But Options.Run calls Options.run as a final step, and the informers get started in Options.run, so at that point you'll be waiting on ClusterVersion forever watching an unstarted informer.

In 4611337, you've moved the load into OnStartedLeading, and that works because it's after we've started the informers. But we shouldn't need an actual lease to load the data, and there are some latency benefits to working on those in parallel, or at least performing the more fixed-time payload load first, before blocking on a possibly contended lease acquisition. Can we move the InitializeFromPayload call to sit right after the informer starts, and before the lease acquisition? Possibly with a leading wait for the controllerCtx.CVInformerFactory cache to sync? Or are we ok with the low latency cost of keeping the InitializeFromPayload behind OnStartedLeading?
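
For orientation, a minimal Go sketch of the ordering being discussed, assuming client-go informers and leader election. controllerCtx.CVInformerFactory, InitializeFromPayload, and OnStartedLeading are the names from this thread; the function shape and parameters are stand-ins so the snippet is self-contained, not the PR's actual start.go:

```go
// Sketch only: start informers, optionally wait for the ClusterVersion cache,
// do the fixed-cost payload load, and only then block on the lease.
package startsketch

import (
	"context"
	"fmt"

	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/leaderelection"
)

func runWithEarlyPayloadLoad(
	ctx context.Context,
	startInformers func(stopCh <-chan struct{}), // e.g. controllerCtx.CVInformerFactory.Start
	cvInformerSynced cache.InformerSynced, // ClusterVersion informer HasSynced
	initializeFromPayload func(ctx context.Context) error, // stand-in for the real call
	leConfig leaderelection.LeaderElectionConfig, // OnStartedLeading starts the controllers
) error {
	// 1. Start the informers first, so any wait on ClusterVersion is against a
	//    running informer rather than an unstarted one.
	startInformers(ctx.Done())

	// 2. Optionally wait for the ClusterVersion cache to sync before touching it.
	if !cache.WaitForCacheSync(ctx.Done(), cvInformerSynced) {
		return fmt.Errorf("ClusterVersion informer cache failed to sync")
	}

	// 3. Do the fixed-cost payload load before blocking on the lease.
	if err := initializeFromPayload(ctx); err != nil {
		return err
	}

	// 4. Only now block on the (possibly contended) lease acquisition.
	leaderelection.RunOrDie(ctx, leConfig)
	return nil
}
```

Whether step 2's cache-sync wait is worth it, versus letting the ClusterVersion wait happen later, is exactly the trade-off raised in the question above.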

@jottofar
Contributor Author

Looking back at 4d421e9, where you were adjusting the existing InitializeFromPayload in start.go, that was happening in Options.Run. But Options.Run calls Options.run as a final step, and the informers get started in Options.run, so at that point you'll be waiting on ClusterVersion forever watching an unstarted informer.

In 4611337, you've moved the load into OnStartedLeading, and that works because it's after we've started the informers. But we shouldn't need an actual lease to load the data, and there are some latency benefits to working on those in parallel, or at least performing the more fixed-time payload load first, before blocking on a possibly contended lease acquisition. Can we move the InitializeFromPayload call to sit right after the informer starts, and before the lease acquisition? Possibly with a leading wait for the controllerCtx.CVInformerFactory cache to sync? Or are we ok with the low latency cost of keeping the InitializeFromPayload behind OnStartedLeading?

No, I see no reason to wait on the lease acquisition so I'll move things around. I wasn't convinced yet that it was the actual lease acquisition but had not circled back to look further.

@jottofar jottofar force-pushed the get-cv-sooner branch 4 times, most recently from cbe8320 to 2419571 Compare February 22, 2022 15:31
@jottofar
Contributor Author

/retitle Get cluster version object earlier in startup

@jottofar jottofar force-pushed the get-cv-sooner branch 2 times, most recently from 72f2a04 to c86743c Compare March 9, 2022 14:37
@jottofar
Contributor Author

jottofar commented Mar 9, 2022

/retest

@jottofar
Contributor Author

jottofar commented Mar 9, 2022

/test e2e-agnostic-operator

@jottofar
Contributor Author

jottofar commented Mar 9, 2022

/retest

Since at least 90e9881 (cvo: Change the core CVO loops to
report status to ClusterVersion, 2018-11-02, openshift#45), the CVO
has created a default ClusterVersion when there was none in the
cluster. In d7760ce (pkg/cvo: Drop ClusterVersion
defaulting during bootstrap, 2019-08-16, openshift#238), we removed
that defaulting during cluster-bootstrap, to avoid racing
with the installer-supplied ClusterVersion and its
user-specified configuration. In this commit, we're removing
ClusterVersion defaulting entirely, and the CVO will just
patiently wait until it gets a ClusterVersion before
continuing. Admins rarely delete ClusterVersion in practice,
creating a sane default is becoming more difficult as the
spec configuration becomes richer, and waiting for the admin
to come back and ask the CVO to get back to work allows us
to simplify the code without leaving customers at risk.
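
As a rough illustration of what "patiently wait until it gets a ClusterVersion" can look like, a hedged sketch follows; the lister type is from openshift/client-go, while the helper name, poll interval, and wiring are assumptions, not the PR's code:

```go
// Illustrative only: block until the installer/admin-created ClusterVersion
// exists instead of creating a default one.
package cvwait

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/util/wait"

	configv1 "github.com/openshift/api/config/v1"
	configv1listers "github.com/openshift/client-go/config/listers/config/v1"
)

// waitForClusterVersion returns the named ClusterVersion once it appears in the
// informer cache, retrying on NotFound until the context is cancelled.
func waitForClusterVersion(ctx context.Context, lister configv1listers.ClusterVersionLister, name string) (*configv1.ClusterVersion, error) {
	var cv *configv1.ClusterVersion
	err := wait.PollImmediateUntilWithContext(ctx, 10*time.Second, func(ctx context.Context) (bool, error) {
		obj, err := lister.Get(name)
		if apierrors.IsNotFound(err) {
			return false, nil // keep waiting; do not create a default
		}
		if err != nil {
			return false, err
		}
		cv = obj
		return true, nil
	})
	return cv, err
}
```
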
@jottofar
Contributor Author

/test e2e-agnostic-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Mar 11, 2022

@LalatenduMohanty: Overrode contexts on behalf of LalatenduMohanty: ci/prow/e2e-agnostic


In response to this:

/override ci/prow/e2e-agnostic

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@LalatenduMohanty
Member

/override cancel

@openshift-ci
Contributor

openshift-ci bot commented Mar 11, 2022

@LalatenduMohanty: /override requires a failed status context or a job name to operate on.
The following unknown contexts were given:

  • cancel

Only the following contexts were expected:

  • ci/prow/e2e-agnostic
  • ci/prow/e2e-agnostic-operator
  • ci/prow/e2e-agnostic-upgrade
  • ci/prow/gofmt
  • ci/prow/golangci-lint
  • ci/prow/images
  • ci/prow/unit
  • pull-ci-openshift-cluster-version-operator-master-e2e-agnostic
  • pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-operator
  • pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade
  • pull-ci-openshift-cluster-version-operator-master-gofmt
  • pull-ci-openshift-cluster-version-operator-master-golangci-lint
  • pull-ci-openshift-cluster-version-operator-master-images
  • pull-ci-openshift-cluster-version-operator-master-unit
  • tide

In response to this:

/override cancel


@LalatenduMohanty
Member

/override cancel ci/prow/e2e-agnostic

@openshift-ci
Contributor

openshift-ci bot commented Mar 11, 2022

@LalatenduMohanty: /override requires a failed status context or a job name to operate on.
The following unknown contexts were given:

  • cancel ci/prow/e2e-agnostic

Only the following contexts were expected: (the same list as in the previous bot comment)

In response to this:

/override cancel ci/prow/e2e-agnostic


@wking wking left a comment
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 11, 2022
@openshift-ci
Contributor

openshift-ci bot commented Mar 11, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking
Member

wking commented Mar 11, 2022

Nothing in the update run sounds like it's this PR, and it's Friday, and build02 and Azure are both struggling, so:

/override ci/prow/e2e-agnostic-upgrade

and we'll have lots of cook time in 4.11 release informers by the time we're back next week ;)

@openshift-ci
Contributor

openshift-ci bot commented Mar 11, 2022

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade


In response to this:

Nothing in the update run sounds like it's this PR, and it's Friday, and build02 and Azure are both struggling, so:

/override ci/prow/e2e-agnostic-upgrade

and we'll have lots of cook time in 4.11 release informers by the time we're back next week ;)


@openshift-ci
Contributor

openshift-ci bot commented Mar 11, 2022

@jottofar: all tests passed!

Full PR test history. Your PR dashboard.


@openshift-merge-robot openshift-merge-robot merged commit c85aa55 into openshift:master Mar 11, 2022
wking added a commit to wking/cluster-version-operator that referenced this pull request Mar 18, 2024
…usterOperatorDegraded

By adding cluster_operator_up handling for ClusterVersion, with
'version' as the component name, the same way we handle
cluster_operator_conditions.  This plugs us into ClusterOperatorDown
(based on cluster_operator_up) and ClusterOperatorDegraded (based on
both cluster_operator_conditions and cluster_operator_up).

I've adjusted the ClusterOperatorDegraded rule so that it fires on
ClusterVersion Failing=True and does not fire on Failing=False.
Thinking through an update from before:

1. Outgoing CVO does not serve cluster_operator_up{name="version"}.
2. User requests an update to a release with this change.
3. New CVO comes in, starts serving
   cluster_operator_up{name="version"}.
4. Old ClusterOperatorDegraded sees no matching
   cluster_operator_conditions{name="version",condition="Degraded"},
   falls through to cluster_operator_up{name="version"}, and starts
   cooking the 'for: 30m'.
5. If we go more than 30m before updating the ClusterOperatorDegraded
   rule to understand Failing, ClusterOperatorDegraded would fire.

We'll need to backport the ClusterOperatorDegraded expr change to one
4.y release before the CVO-metrics change lands to get:

1. Outgoing CVO does not serve cluster_operator_up{name="version"}.
2. User requests an update to a release with the expr change.
3. Incoming ClusterOperatorDegraded sees no
   cluster_operator_conditions{name="version",condition="Degraded"},
   cluster_operator_conditions{name="version",condition="Failing"} (we
   hope), or cluster_operator_up{name="version"}, so it doesn't fire.
   Unless we are Failing=True, in which case, hooray, we'll start
   alerting about it.
4. User requests an update to a release with the CVO-metrics change.
5. New CVO starts serving cluster_operator_up, just like the
   fresh-modern-install situation, and everything is great.

The missing-ClusterVersion metrics don't matter all that much today,
because the CVO has been creating replacement ClusterVersion since at
least 90e9881 (cvo: Change the core CVO loops to report status to
ClusterVersion, 2018-11-02, openshift#45).  But it will become more important
with [1], which is planning on removing that default creation.  When
there is no ClusterVersion, we expect ClusterOperatorDown to fire.

[1]: openshift#741
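
To make the metrics side concrete, a hedged client_golang sketch of serving cluster_operator_up{name="version"}: the metric and label names follow the commit message above, while the collector shape, help text, and any extra labels on the real series are assumptions:

```go
// Illustrative collector fragment: emit cluster_operator_up{name="version"}
// derived from ClusterVersion, alongside the existing per-ClusterOperator
// series. The real CVO metric may carry additional labels.
package metricsketch

import "github.com/prometheus/client_golang/prometheus"

var clusterOperatorUp = prometheus.NewDesc(
	"cluster_operator_up",
	"1 if the cluster operator is available, 0 otherwise.",
	[]string{"name"}, nil,
)

type clusterVersionCollector struct {
	available bool // derived elsewhere from ClusterVersion's Available condition
}

func (c *clusterVersionCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- clusterOperatorUp
}

func (c *clusterVersionCollector) Collect(ch chan<- prometheus.Metric) {
	up := 0.0
	if c.available {
		up = 1.0
	}
	// "version" is the ClusterVersion component name, mirroring how the CVO
	// already labels cluster_operator_conditions for it.
	ch <- prometheus.MustNewConstMetric(clusterOperatorUp, prometheus.GaugeValue, up, "version")
}
```

Once that series exists, the existing ClusterOperatorDown expression (based on cluster_operator_up) picks up the 'version' component without further rule changes, which is the point of the commit.
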
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 9, 2024
…usterOperatorDegraded

By adding cluster_operator_up handling for ClusterVersion, with
'version' as the component name, the same way we handle
cluster_operator_conditions.  This plugs us into ClusterOperatorDown
(based on cluster_operator_up) and ClusterOperatorDegraded (based on
both cluster_operator_conditions and cluster_operator_up).

I've adjusted the ClusterOperatorDegraded rule so that it fires on
ClusterVersion Failing=True and does not fire on Failing=False.
Thinking through an update from before:

1. Outgoing CVO does not serve cluster_operator_up{name="version"}.
2. User requests an update to a release with this change.
3. New CVO comes in, starts serving
   cluster_operator_up{name="version"}.
4. Old ClusterOperatorDegraded sees no matching
   cluster_operator_conditions{name="version",condition="Degraded"},
   falls through to cluster_operator_up{name="version"}, and starts
   cooking the 'for: 30m'.
5. If we go more than 30m before updating the ClusterOperatorDegraded
   rule to understand Failing, ClusterOperatorDegraded would fire.

We'll need to backport the ClusterOperatorDegraded expr change to one
4.y release before the CVO-metrics change lands to get:

1. Outgoing CVO does not serve cluster_operator_up{name="version"}.
2. User requests an update to a release with the expr change.
3. Incoming ClusterOperatorDegraded sees no
   cluster_operator_conditions{name="version",condition="Degraded"},
   cluster_operator_conditions{name="version",condition="Failing"} (we
   hope), or cluster_operator_up{name="version"}, so it doesn't fire.
   Unless we are Failing=True, in which case, hooray, we'll start
   alerting about it.
4. User requests an update to a release with the CVO-metrics change.
5. New CVO starts serving cluster_operator_up, just like the
   fresh-modern-install situation, and everything is great.

The missing-ClusterVersion metrics don't matter all that much today,
because the CVO has been creating replacement ClusterVersion since at
least 90e9881 (cvo: Change the core CVO loops to report status to
ClusterVersion, 2018-11-02, openshift#45).  But it will become more important
with [1], which is planning on removing that default creation.  When
there is no ClusterVersion, we expect ClusterOperatorDown to fire.

The awkward:

  {{ "{{ ... \"version\" }} ... {{ end }}" }}

business is because this content is unpacked in two rounds of
templating:

1. The cluster-version operator's getPayloadTasks' renderManifest
   preprocessing for the CVO directory, which is based on Go
   templates.
2. Prometheus alerting-rule templates, which use console templates
   [2], which are also based on Go templates [3].

The '{{ "..." }}' wrapping is consumed by the CVO's templating, and
the remaining:

  {{ ... "version" }} ... {{ end }}

is left for Prometheus' templating.

[1]: openshift#741
[2]: https://prometheus.io/docs/prometheus/2.51/configuration/alerting_rules/#templating
[3]: https://prometheus.io/docs/visualization/consoles/
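
A small, self-contained demonstration of the two-round templating described above, using plain text/template for both rounds; the real pipeline is the CVO's renderManifest pass followed by Prometheus alert templating, and the field names and data values here are made up:

```go
// Shows why the {{ "{{ ... }}" }} wrapping is needed: round one evaluates the
// outer {{ "..." }} action and emits the quoted string verbatim, leaving the
// inner {{ ... }} action for round two.
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

func render(name, text string, data interface{}) string {
	var buf bytes.Buffer
	if err := template.Must(template.New(name).Parse(text)).Execute(&buf, data); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	manifest := `message: {{ "{{ .Name }} is degraded" }} in {{ .Release }}`

	// Round 1: CVO-style manifest preprocessing consumes the outer action.
	afterCVO := render("cvo", manifest, map[string]string{"Release": "4.11"})
	fmt.Println(afterCVO) // message: {{ .Name }} is degraded in 4.11

	// Round 2: Prometheus-style templating consumes what is left.
	afterProm := render("prometheus", afterCVO, map[string]string{"Name": "version"})
	fmt.Println(afterProm) // message: version is degraded in 4.11
}
```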