OCPBUGS-1265: test/extended/prometheus: Consider telemeterClient.enabled #27422
|
@wking: This pull request references Jira Issue OCPBUGS-1265, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
411bc2b to fa8c329
dgoodwin
left a comment
lgtm otherwise.
a78a075 to bcf7811
|
Trying to pick up the job from openshift/release#32252; unclear why I'm having trouble finding it without the explicit launch... /test e2e-aws |
2d3088d to 517da21
|
/test e2e-aws |
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that led to many failures for:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token.

I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into openshift/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
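To make the "copy/pasted subset" idea concrete, here is a minimal sketch of what such a subset and its check could look like. The type and function names (`monitoringConfig`, `telemeterClientConfig`, `telemeterDisabled`) are hypothetical and are not taken from the actual change. The sketch also decodes JSON with the standard library to stay dependency-free; the real monitoring configuration is YAML, so a YAML decoder would have to be swapped in.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// monitoringConfig mirrors only the part of the monitoring configuration
// that this check needs. Hypothetical type; all other fields are omitted
// and are ignored when decoding.
type monitoringConfig struct {
	TelemeterClient *telemeterClientConfig `json:"telemeterClient,omitempty"`
}

type telemeterClientConfig struct {
	// Pointer so that "unset" can be told apart from an explicit false.
	Enabled *bool `json:"enabled,omitempty"`
}

// telemeterDisabled reports whether the config explicitly sets
// telemeterClient.enabled to false. An empty config, or one that does not
// mention the knob, is treated as "not disabled".
func telemeterDisabled(raw []byte) (bool, error) {
	if len(raw) == 0 {
		return false, nil
	}
	var cfg monitoringConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, fmt.Errorf("parse monitoring config: %w", err)
	}
	if cfg.TelemeterClient == nil || cfg.TelemeterClient.Enabled == nil {
		return false, nil
	}
	return !*cfg.TelemeterClient.Enabled, nil
}

func main() {
	disabled, err := telemeterDisabled([]byte(`{"telemeterClient": {"enabled": false}}`))
	fmt.Println(disabled, err)
}
```

Keeping the struct this small is what makes the copy cheap to maintain: unrelated fields can change upstream without touching the test, and only a move of the `telemeterClient.enabled` knob itself would require an update.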
517da21 to 76652fa
|
/test e2e-aws |
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/27422/pull-ci-openshift-origin-master-e2e-aws/1570659685793533952/artifacts/e2e-aws/openshift-e2e-test/build-log.txt | grep -B3 'skipped.*Prometheus when installed on the cluster should report telemetry'
skip [github.com/openshift/origin/test/extended/prometheus/prometheus.go:461]: Telemetry is disabled: openshift-monitoring/cluster-monitoring-config telemeterClient enabled is: false
Ginkgo exit error 3: exit with code 3
skipped: (7.3s) 2022-09-16T07:57:12 "[sig-instrumentation] Prometheus when installed on the cluster should report telemetry [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]"
and the run as a whole passed. |
|
This is renaming a test but the annotations have been correctly updated, and I don't see any references to it in the release repo. And the rest of the logic looks correct to me. /lgtm |
|
/retest-required |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: stbenjam, wking
|
@wking: The following tests failed, say
|
/override ci/prow/e2e-aws-ovn-serial |
|
/skip |
|
@stbenjam: Overrode contexts on behalf of stbenjam: ci/prow/e2e-aws-ovn-serial
|
@wking: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-1265 has been moved to the MODIFIED state.
We'd tried this previously in 3c1da8e (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift#32153), 2022-09-13), but had to roll it back with d61129c (ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry (openshift#32249), 2022-09-13), to avoid failing:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

Subsequently, openshift/origin@76652fa4fa (test/extended/prometheus: Consider telemeterClient.enabled, 2022-09-15, openshift/origin#27422) taught that test-case about the config knob this step uses to disable Telemetry. Those test-case changes are present in origin test suites starting in 4.12:

  $ for Y in $(seq 11 15); do git --no-pager grep 'should report telemetry' "origin/release-4.${Y}" -- test/extended/prometheus/prometheus.go; done
  origin/release-4.11:test/extended/prometheus/prometheus.go: g.It("should report telemetry if a cloud.openshift.com token is present [Late]", func() {
  origin/release-4.12:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Late]", func() {
  origin/release-4.13:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Late]", func() {
  origin/release-4.14:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Serial] [Late]", func() {
  origin/release-4.15:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Serial] [Late]", func() {

and 4.10 has been end-of-life since 2023-09-10 [1]. That leaves tests using 4.11 versions of the origin suite, and I'm addressing those via the JOB_NAME checks [2,3]. The checks are brittle, leaving out 4.9 and earlier, and possibly not matching some 4.11 jobs, but they will hopefully be sufficient to get us through until 4.11 goes end-of-life on 2024-02-10 [1]. And when the defaulting logic breaks down, jobs that have an opinion can set TELEMETRY_ENABLED explicitly to match their needs.

[1]: https://access.redhat.com/support/policy/updates/openshift/#dates
[2]: https://docs.ci.openshift.org/docs/architecture/step-registry/#available-environment-variables
[3]: https://docs.prow.k8s.io/docs/jobs/#job-environment-variables
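The defaulting described in that commit message can be sketched as follows. This is only an illustration of the decision order (an explicit TELEMETRY_ENABLED wins, otherwise fall back to a JOB_NAME match for 4.11), written in Go rather than in the step's own shell. The regular expression, the function name `telemetryEnabled`, and the job names in `main` are all hypothetical; the actual pattern used by the step is not shown in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// release411 is a hypothetical pattern for "this job name mentions 4.11".
// It requires a non-digit (or the string boundary) on both sides so that
// names containing 14.11 or 4.110 do not match.
var release411 = regexp.MustCompile(`(^|[^0-9])4\.11([^0-9]|$)`)

// telemetryEnabled decides whether Telemetry should stay enabled for a job.
// explicit is the value of TELEMETRY_ENABLED ("" when the job has no opinion).
func telemetryEnabled(explicit, jobName string) bool {
	switch explicit {
	case "true":
		return true
	case "false":
		return false
	}
	// No explicit opinion: keep Telemetry on only for jobs that appear to
	// run the 4.11 origin suite, whose test still expects it.
	return release411.MatchString(jobName)
}

func main() {
	// Hypothetical job names, for illustration only.
	fmt.Println(telemetryEnabled("", "example-release-4.11-e2e-aws"))
	fmt.Println(telemetryEnabled("", "example-release-4.12-e2e-aws"))
}
```

A name-based check like this is brittle for the reason the commit message gives: a 4.11 job whose name does not contain the version string falls through to the default, which is why the explicit override exists.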
* ci-operator/step-registry/ipi/conf/telemetry: Disable by default (again)

We'd tried this previously in 3c1da8e (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (#32153), 2022-09-13), but had to roll it back with d61129c (ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry (#32249), 2022-09-13), to avoid failing:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

Subsequently, openshift/origin@76652fa4fa (test/extended/prometheus: Consider telemeterClient.enabled, 2022-09-15, openshift/origin#27422) taught that test-case about the config knob this step uses to disable Telemetry. Those test-case changes are present in origin test suites starting in 4.12:

  $ for Y in $(seq 11 15); do git --no-pager grep 'should report telemetry' "origin/release-4.${Y}" -- test/extended/prometheus/prometheus.go; done
  origin/release-4.11:test/extended/prometheus/prometheus.go: g.It("should report telemetry if a cloud.openshift.com token is present [Late]", func() {
  origin/release-4.12:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Late]", func() {
  origin/release-4.13:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Late]", func() {
  origin/release-4.14:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Serial] [Late]", func() {
  origin/release-4.15:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Serial] [Late]", func() {

and 4.10 has been end-of-life since 2023-09-10 [1]. That leaves tests using 4.11 versions of the origin suite, and I'm addressing those via the JOB_NAME checks [2,3]. The checks are brittle, leaving out 4.9 and earlier, and possibly not matching some 4.11 jobs, but they will hopefully be sufficient to get us through until 4.11 goes end-of-life on 2024-02-10 [1]. And when the defaulting logic breaks down, jobs that have an opinion can set TELEMETRY_ENABLED explicitly to match their needs.

[1]: https://access.redhat.com/support/policy/updates/openshift/#dates
[2]: https://docs.ci.openshift.org/docs/architecture/step-registry/#available-environment-variables
[3]: https://docs.prow.k8s.io/docs/jobs/#job-environment-variables

* Make 4.11 regex pass shellcheck

---------

Co-authored-by: W. Trevor King <wking@tremily.us>
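I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (openshift/release#32153). But that led to many failures for:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present
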
This pull request extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token.
I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into openshift/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
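As a rough illustration of how the two enablement checks could be combined, the sketch below looks for a cloud.openshift.com entry in the pull secret first and consults the monitoring knob second. The names `dockerConfig` and `skipReason`, the order of the checks, and the wording of the reasons are assumptions for illustration, not the code in this pull request; the pull-secret layout assumed here is the usual `auths` map of a `.dockerconfigjson` payload.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dockerConfig is the subset of a .dockerconfigjson payload needed here:
// registry host names mapped to their (opaque) credentials.
type dockerConfig struct {
	Auths map[string]json.RawMessage `json:"auths"`
}

// skipReason returns a non-empty reason when a Telemetry test should be
// skipped. telemeterEnabled is nil when the monitoring configuration does
// not set telemeterClient.enabled at all.
func skipReason(pullSecret []byte, telemeterEnabled *bool) (string, error) {
	var cfg dockerConfig
	if err := json.Unmarshal(pullSecret, &cfg); err != nil {
		return "", fmt.Errorf("parse pull secret: %w", err)
	}
	// Previously-checked condition: no token means nothing can be reported.
	if _, ok := cfg.Auths["cloud.openshift.com"]; !ok {
		return "no cloud.openshift.com token in the pull secret", nil
	}
	// Additional condition: the monitoring knob explicitly turns it off.
	if telemeterEnabled != nil && !*telemeterEnabled {
		return "telemeterClient enabled is: false", nil
	}
	return "", nil
}

func main() {
	off := false
	secret := []byte(`{"auths": {"cloud.openshift.com": {"auth": "placeholder"}}}`)
	reason, err := skipReason(secret, &off)
	fmt.Println(reason, err)
}
```

Passing the knob as a `*bool` keeps the "not configured" case distinct from an explicit `false`, so a cluster with a token and no monitoring override is still expected to report telemetry.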