Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Sep 13, 2022

I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (openshift/release#32153). But that lead to many failures for:

Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

This pull request extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token.

I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.

@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Sep 13, 2022
@openshift-ci-robot
Copy link

@wking: This pull request references Jira Issue OCPBUGS-1265, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.0) matches configured target version for branch (4.12.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (openshift/release#32153). But that lead to many failures for:

Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

This pull request extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token.

I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Sep 13, 2022
@wking wking force-pushed the check-telemeter-config-for-enablement branch 2 times, most recently from 411bc2b to fa8c329 Compare September 13, 2022 23:19
Copy link
Contributor

@dgoodwin dgoodwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm otherwise.

@wking wking force-pushed the check-telemeter-config-for-enablement branch 4 times, most recently from a78a075 to bcf7811 Compare September 14, 2022 23:43
@wking
Copy link
Member Author

wking commented Sep 14, 2022

Trying to pick up the job from openshift/release#32252; unclear why I'm having trouble finding it without the explicit launch...

/test e2e-aws

@wking wking force-pushed the check-telemeter-config-for-enablement branch 2 times, most recently from 2d3088d to 517da21 Compare September 15, 2022 05:53
@wking
Copy link
Member Author

wking commented Sep 15, 2022

/test e2e-aws

I'd disabled Telemetry for the bulk of the CI fleet in
openshift/release@3c1da8eb20 (OTA-740:
ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry
(openshift/release#32153), 2022-09-13).  But that lead to many
failures for:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

This commit extends the checks for Telemetry enablement to include the
monitoring-specific ConfigMap, as well as the previously-checked
pull-secret token.

I'm copy/pasting a subset of the monitoring configuration structure
instead of vendoring the config, because we aren't vendoring the
cluster-monitoring-operator in origin today, bumping to keep such
vendoring up to date would be tedious, and the monitoring config API
is unlikely to shift this knob around within the structure.  It would
be nice if the monitoring config schema moved into opensihft/api or
somewhere else where it would be more clear that it was a
cluster-admin-facing API, with the usual stability commitments, but
it's not there today.
@wking wking force-pushed the check-telemeter-config-for-enablement branch from 517da21 to 76652fa Compare September 16, 2022 06:23
@wking
Copy link
Member Author

wking commented Sep 16, 2022

/test e2e-aws

@wking
Copy link
Member Author

wking commented Sep 16, 2022

Success:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/27422/pull-ci-openshift-origin-master-e2e-aws/1570659685793533952/artifacts/e2e-aws/openshift-e2e-test/build-log.txt | grep -B3 'skipped.*Prometheus when installed on the cluster should report telemetry'
skip [github.com/openshift/origin/test/extended/prometheus/prometheus.go:461]: Telemetry is disabled: openshift-monitoring/cluster-monitoring-config telemeterClient enabled is: false
Ginkgo exit error 3: exit with code 3

skipped: (7.3s) 2022-09-16T07:57:12 "[sig-instrumentation] Prometheus when installed on the cluster should report telemetry [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]"

and the run as a whole passed.

@stbenjam
Copy link
Member

stbenjam commented Sep 18, 2022

This is renaming a test but the annotations have been correctly updated, and I don't see any references to it in the release repo. And the rest of the logic looks correct to me.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 18, 2022
@stbenjam
Copy link
Member

/retest-required

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 18, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stbenjam, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 18, 2022
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 9509a2b and 2 for PR HEAD 76652fa in total

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 19, 2022

@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-csi 76652fa link false /test e2e-gcp-csi
ci/prow/e2e-metal-ipi-ovn-ipv6 76652fa link false /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-aws-ovn-single-node-serial 76652fa link false /test e2e-aws-ovn-single-node-serial
ci/prow/e2e-gcp-ovn-rt-upgrade 76652fa link false /test e2e-gcp-ovn-rt-upgrade
ci/prow/e2e-aws-ovn-single-node-upgrade 76652fa link false /test e2e-aws-ovn-single-node-upgrade

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@stbenjam
Copy link
Member

/override ci/prow/e2e-aws-ovn-serial

@stbenjam
Copy link
Member

/skip

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 19, 2022

@stbenjam: Overrode contexts on behalf of stbenjam: ci/prow/e2e-aws-ovn-serial

Details

In response to this:

/override ci/prow/e2e-aws-ovn-serial

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit b356ea7 into openshift:master Sep 19, 2022
@openshift-ci-robot
Copy link

@wking: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-1265 has been moved to the MODIFIED state.

Details

In response to this:

I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (openshift/release#32153). But that lead to many failures for:

Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

This pull request extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token.

I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking wking deleted the check-telemeter-config-for-enablement branch September 19, 2022 15:59
wking added a commit to wking/openshift-release that referenced this pull request Oct 19, 2023
We'd tried this previously in 3c1da8e (OTA-740:
ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry
(openshift#32153), 2022-09-13), but had to roll it back with d61129c
(ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry
(openshift#32249), 2022-09-13), to avoid failing:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

Subsequently, openshift/origin@76652fa4fa (test/extended/prometheus:
Consider telemeterClient.enabled, 2022-09-15, openshift/origin#27422)
taught that test-case about the config knob this step uses to disable
Telemetry.  Those test-case changes are present in origin test suites
starting in 4.12:

  $ for Y in $(seq 11 15); do git --no-pager grep 'should report telemetry' "origin/release-4.${Y}" -- test/extended/prometheus/prometheus.go; done
  origin/release-4.11:test/extended/prometheus/prometheus.go:             g.It("should report telemetry if a cloud.openshift.com token is present [Late]", func() {
  origin/release-4.12:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Late]", func() {
  origin/release-4.13:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Late]", func() {
  origin/release-4.14:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Serial] [Late]", func() {
  origin/release-4.15:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Serial] [Late]", func() {

and 4.10 is end-of-life since 2023-09-10 [1].  That leaves tests using
4.11 versions of the origin suite, and I'm addressing those via the
JOB_NAME checks [2,3].  The checks are brittle, leaving out 4.9 and
earlier, and possibly not matching some 4.11 jobs, but they will
hopefully be sufficient to get us through until 4.11 goes end-of-life
on 2024-02-10 [3].  And when the defaulting logic breaks down, jobs
that have an opinion can set TELEMETRY_ENABLED explicitly to match
their needs.

[1]: https://access.redhat.com/support/policy/updates/openshift/#dates
[2]: https://docs.ci.openshift.org/docs/architecture/step-registry/#available-environment-variables
[3]: https://docs.prow.k8s.io/docs/jobs/#job-environment-variables
wking added a commit to wking/openshift-release that referenced this pull request Oct 19, 2023
We'd tried this previously in 3c1da8e (OTA-740:
ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry
(openshift#32153), 2022-09-13), but had to roll it back with d61129c
(ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry
(openshift#32249), 2022-09-13), to avoid failing:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

Subsequently, openshift/origin@76652fa4fa (test/extended/prometheus:
Consider telemeterClient.enabled, 2022-09-15, openshift/origin#27422)
taught that test-case about the config knob this step uses to disable
Telemetry.  Those test-case changes are present in origin test suites
starting in 4.12:

  $ for Y in $(seq 11 15); do git --no-pager grep 'should report telemetry' "origin/release-4.${Y}" -- test/extended/prometheus/prometheus.go; done
  origin/release-4.11:test/extended/prometheus/prometheus.go:             g.It("should report telemetry if a cloud.openshift.com token is present [Late]", func() {
  origin/release-4.12:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Late]", func() {
  origin/release-4.13:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Late]", func() {
  origin/release-4.14:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Serial] [Late]", func() {
  origin/release-4.15:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Serial] [Late]", func() {

and 4.10 is end-of-life since 2023-09-10 [1].  That leaves tests using
4.11 versions of the origin suite, and I'm addressing those via the
JOB_NAME checks [2,3].  The checks are brittle, leaving out 4.9 and
earlier, and possibly not matching some 4.11 jobs, but they will
hopefully be sufficient to get us through until 4.11 goes end-of-life
on 2024-02-10 [3].  And when the defaulting logic breaks down, jobs
that have an opinion can set TELEMETRY_ENABLED explicitly to match
their needs.

[1]: https://access.redhat.com/support/policy/updates/openshift/#dates
[2]: https://docs.ci.openshift.org/docs/architecture/step-registry/#available-environment-variables
[3]: https://docs.prow.k8s.io/docs/jobs/#job-environment-variables
openshift-ci bot pushed a commit to openshift/release that referenced this pull request Oct 19, 2023
* ci-operator/step-registry/ipi/conf/telemetry: Disable by default (again)

We'd tried this previously in 3c1da8e (OTA-740:
ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry
(#32153), 2022-09-13), but had to roll it back with d61129c
(ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry
(#32249), 2022-09-13), to avoid failing:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

Subsequently, openshift/origin@76652fa4fa (test/extended/prometheus:
Consider telemeterClient.enabled, 2022-09-15, openshift/origin#27422)
taught that test-case about the config knob this step uses to disable
Telemetry.  Those test-case changes are present in origin test suites
starting in 4.12:

  $ for Y in $(seq 11 15); do git --no-pager grep 'should report telemetry' "origin/release-4.${Y}" -- test/extended/prometheus/prometheus.go; done
  origin/release-4.11:test/extended/prometheus/prometheus.go:             g.It("should report telemetry if a cloud.openshift.com token is present [Late]", func() {
  origin/release-4.12:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Late]", func() {
  origin/release-4.13:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Late]", func() {
  origin/release-4.14:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Serial] [Late]", func() {
  origin/release-4.15:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Serial] [Late]", func() {

and 4.10 is end-of-life since 2023-09-10 [1].  That leaves tests using
4.11 versions of the origin suite, and I'm addressing those via the
JOB_NAME checks [2,3].  The checks are brittle, leaving out 4.9 and
earlier, and possibly not matching some 4.11 jobs, but they will
hopefully be sufficient to get us through until 4.11 goes end-of-life
on 2024-02-10 [3].  And when the defaulting logic breaks down, jobs
that have an opinion can set TELEMETRY_ENABLED explicitly to match
their needs.

[1]: https://access.redhat.com/support/policy/updates/openshift/#dates
[2]: https://docs.ci.openshift.org/docs/architecture/step-registry/#available-environment-variables
[3]: https://docs.prow.k8s.io/docs/jobs/#job-environment-variables

* Make 4.11 regex pass shellcheck

---------

Co-authored-by: W. Trevor King <wking@tremily.us>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants