-
Notifications
You must be signed in to change notification settings - Fork 2.1k
OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry #32153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry #32153
Conversation
08506e5 to
4ce3005
Compare
The OCM team is concerned about the volume of data submitted by ephemeral CI clusters. This commit disables Telemetry by default (for the bulk of CI, which flows through this step), while still allowing Telemetry to be enabled for jobs that have important data to report, or jobs that want to excersise the Telemetry or Insights reporting logic. From [1]: > You can modify your existing global cluster pull secret to disable > remote health reporting. This disables both Telemetry and the > Insights Operator. In 4.12, the Insights operator is growing a new configuration structure that allows disabling Insights [1,2] without affecting Telemetry. And it turns out that the monitoring operator has a similar option to disable Telemetry [3], although the only documented use for telemeterClient is for setting nodeSelector [4]. There are a number of existing steps poking at the cluster-monitoring-config ConfigMap, including ipi-conf-inframachineset setting telemeterClient's nodeSelector, so I am following ipi-conf-user-workload-monitoring's use of yq patching and manifest naming to fit in with the other steps. I'm following 8a4696c (using yq installed from upi-installer image, 2022-08-25, openshift#31256) to get yq from the upi-installer image instead of curling it down dynamically. With the 4.12 pin to get `yq`, we're safe using `upi-installer`, even for CI jobs that are otherwise looking at different 4.y. [1]: https://github.com/openshift/enhancements/blob/ef85659d01738b9f89958d5f0da31cff05bb1182/enhancements/insights/insights-config-api.md [2]: https://docs.openshift.com/container-platform/4.11/support/remote_health_monitoring/opting-out-of-remote-health-reporting.html [3]: https://github.com/openshift/cluster-monitoring-operator/blob/8d331d78b22948d36c20da0552763ddd8a4e2093/pkg/manifests/config.go#L337 [4]: https://docs.openshift.com/container-platform/4.11/monitoring/configuring-the-monitoring-stack.html#moving-monitoring-components-to-different-nodes_configuring-the-monitoring-stack
Generated with: $ sed -i 's/^\( *- ref: ipi-conf\)$/\1\n\1-telemetry/' $(git grep -l '^ *- ref: ipi-conf$' ci-operator/step-registry) to slot the new step in after the generic ipi-conf, so we can configure Telemetry in all of the existing chains and workflows that were flowing through ipi-conf.
When making changes to the Telemeter client, we want to ensure that we're still excercising uploads.
4ce3005 to
00b1310
Compare
|
/lgtm |
|
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
Auditing some of the rehearsals, openshift-telemeter-master-e2e-aws-upgrade: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/32153/rehearse-32153-pull-ci-openshift-telemeter-master-e2e-aws-upgrade/1568161661800419328/artifacts/e2e-aws-upgrade/ipi-conf-telemetry/build-log.txt
Nothing to do with TELEMETRY_ENABLED='true'
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/32153/rehearse-32153-pull-ci-openshift-telemeter-master-e2e-aws-upgrade/1568161661800419328/artifacts/e2e-aws-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].spec.clusterID'
6f985223-6552-4027-aa0b-ac2b84dd5495And we see Telemetry from that cluster: Moving to openshift-cloud-provider-azure-master-openshift-e2e-azure-ccm-install: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/32153/rehearse-32153-pull-ci-openshift-cloud-provider-azure-master-openshift-e2e-azure-ccm-install/1568161660386938880/artifacts/openshift-e2e-azure-ccm-install/ipi-conf-telemetry/build-log.txtCreating /tmp/secret/manifest_cluster-monitoring-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-monitoring-config
namespace: openshift-monitoring
data:
config.yaml: >-
telemeterClient:
enabled: false
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/32153/rehearse-32153-pull-ci-openshift-cloud-provider-azure-master-openshift-e2e-azure-ccm-install/1568161660386938880/artifacts/openshift-e2e-azure-ccm-install/gather-extra/artifacts/clusterversion.json | jq -r '.items[].spec.clusterID'
01a01f0d-73d2-4bf0-a465-73f823feaa56And there's nothing from that cluster: |
jan--f
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
@wking: Updated the
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Effectively neutralizing 3c1da8e (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift#32153), 2022-09-13), until we teach origin's test case to understand this disabling: $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=6h&type=junit&name=^periodic-&search=Prometheus+when+installed+on+the+cluster+should+report+telemetry+if+a+cloud.openshift.com+token+is+present' | grep 'failures match' | sort periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-ovn-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-ovn-serial-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-ppc64le-powervs (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-serial-aws-arm64 (all) - 4 runs, 100% failed, 50% of failures match = 50% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-image-ecosystem-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-nightly-4.10-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-stable-4.10-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-azure-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-azure-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-azure-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-gcp-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.11-e2e-aws-cgroupsv2 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade (all) - 10 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-cgroupsv2 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-crun (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-gcp-sdn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade (all) - 10 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade (all) - 10 runs, 100% failed, 20% of failures match = 20% impact periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-ovn-rt-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.7-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.7-e2e-azure-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.7-e2e-gcp-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-alibaba (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws (all) - 3 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-fips (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-single-node-workers (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-cgroupsv2 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-crun (all) - 2 runs, 100% failed, 50% of failures match = 50% impact periodic-ci-openshift-release-master-nightly-4.12-e2e-azure-sdn-fips-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.12-e2e-vsphere-ovn-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.12-e2e-vsphere-sdn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-aws (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-fips (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-proxy (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-vsphere-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.9-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-okd-4.12-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact With failures like [1]: : [sig-instrumentation] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel] Run #0: 1m1s { fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:465]: Unexpected error: <errors.aggregate | len:2, cap:2>: [ { s: "promQL query returned unexpected results:\nmetricsclient_request_send{client=\"federate_to\",job=\"telemeter-client\",status_code=\"200\"} >= 1\n[]", }, { s: "promQL query returned unexpected results:\nfederate_samples{job=\"telemeter-client\"} >= 10\n[]", }, ] [promQL query returned unexpected results: metricsclient_request_send{client="federate_to",job="telemeter-client",status_code="200"} >= 1 [], promQL query returned unexpected results: federate_samples{job="telemeter-client"} >= 10 []] occurred [1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-crun/1569617920785387520
Effectively neutralizing 3c1da8e (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (#32153), 2022-09-13), until we teach origin's test case to understand this disabling: $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=6h&type=junit&name=^periodic-&search=Prometheus+when+installed+on+the+cluster+should+report+telemetry+if+a+cloud.openshift.com+token+is+present' | grep 'failures match' | sort periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-ovn-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-ovn-serial-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-ppc64le-powervs (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-serial-aws-arm64 (all) - 4 runs, 100% failed, 50% of failures match = 50% impact periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-image-ecosystem-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-nightly-4.10-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-stable-4.10-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-azure-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-azure-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-azure-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-gcp-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.11-e2e-aws-cgroupsv2 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade (all) - 10 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-cgroupsv2 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-crun (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-e2e-gcp-sdn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade (all) - 10 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade (all) - 10 runs, 100% failed, 20% of failures match = 20% impact periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-ovn-rt-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.7-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.7-e2e-azure-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.7-e2e-gcp-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-alibaba (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws (all) - 3 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-fips (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-single-node-workers (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-cgroupsv2 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-crun (all) - 2 runs, 100% failed, 50% of failures match = 50% impact periodic-ci-openshift-release-master-nightly-4.12-e2e-azure-sdn-fips-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.12-e2e-vsphere-ovn-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.12-e2e-vsphere-sdn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-aws (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-fips (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-proxy (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.7-e2e-vsphere-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.9-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-okd-4.12-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact With failures like [1]: : [sig-instrumentation] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel] Run #0: 1m1s { fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:465]: Unexpected error: <errors.aggregate | len:2, cap:2>: [ { s: "promQL query returned unexpected results:\nmetricsclient_request_send{client=\"federate_to\",job=\"telemeter-client\",status_code=\"200\"} >= 1\n[]", }, { s: "promQL query returned unexpected results:\nfederate_samples{job=\"telemeter-client\"} >= 10\n[]", }, ] [promQL query returned unexpected results: metricsclient_request_send{client="federate_to",job="telemeter-client",status_code="200"} >= 1 [], promQL query returned unexpected results: federate_samples{job="telemeter-client"} >= 10 []] occurred [1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-crun/1569617920785387520
…Telemetry for e2e-aws I'd disabled Telemetry for the bulk of the CI fleet in 3c1da8e (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present so I'd flipped the default to keeping Telemetry enabled in d61129c (ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry (openshift#32249), 2022-09-13). Now I'm looking to teach the origin test-case skip about the mechanism I used to disable Telemetry, and I want an origin master presubmit with Telemetry disabled. The only run_if_changed origin master presubmits are e2e-gcp-builds, e2e-aws-jenkins, e2e-gcp-image-ecosystem, and e2e-aws-image-registry, and none of those sound like job that will run the test-case I'm interested in (although maybe they do; I haven't dug in to confirm). But e2e-aws is optional, so having the presubmit temporarily failing for other origin master pull requests won't block changes from landing. We'll revert this change and return the job to the CI-wide Telemetry default once we've confirmed that the test-case skips are smart enough.
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token. I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
…Telemetry for e2e-aws (#32252) I'd disabled Telemetry for the bulk of the CI fleet in 3c1da8e (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present so I'd flipped the default to keeping Telemetry enabled in d61129c (ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry (#32249), 2022-09-13). Now I'm looking to teach the origin test-case skip about the mechanism I used to disable Telemetry, and I want an origin master presubmit with Telemetry disabled. The only run_if_changed origin master presubmits are e2e-gcp-builds, e2e-aws-jenkins, e2e-gcp-image-ecosystem, and e2e-aws-image-registry, and none of those sound like job that will run the test-case I'm interested in (although maybe they do; I haven't dug in to confirm). But e2e-aws is optional, so having the presubmit temporarily failing for other origin master pull requests won't block changes from landing. We'll revert this change and return the job to the CI-wide Telemetry default once we've confirmed that the test-case skips are smart enough.
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token. I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token. I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token. I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token. I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token. I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token. I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token. I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token. I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
I'd disabled Telemetry for the bulk of the CI fleet in openshift/release@3c1da8eb20 (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift/release#32153), 2022-09-13). But that lead to many failures for: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present This commit extends the checks for Telemetry enablement to include the monitoring-specific ConfigMap, as well as the previously-checked pull-secret token. I'm copy/pasting a subset of the monitoring configuration structure instead of vendoring the config, because we aren't vendoring the cluster-monitoring-operator in origin today, bumping to keep such vendoring up to date would be tedious, and the monitoring config API is unlikely to shift this knob around within the structure. It would be nice if the monitoring config schema moved into opensihft/api or somewhere else where it would be more clear that it was a cluster-admin-facing API, with the usual stability commitments, but it's not there today.
We'd tried this previously in 3c1da8e (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift#32153), 2022-09-13), but had to roll it back with d61129c (ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry (openshift#32249), 2022-09-13), to avoid failing: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present Subsequently, openshift/origin@76652fa4fa (test/extended/prometheus: Consider telemeterClient.enabled, 2022-09-15, openshift/origin#27422) taught that test-case about the config knob this step uses to disable Telemetry. Those test-case changes are present in origin test suites starting in 4.12: $ for Y in $(seq 11 15); do git --no-pager grep 'should report telemetry' "origin/release-4.${Y}" -- test/extended/prometheus/prometheus.go; done origin/release-4.11:test/extended/prometheus/prometheus.go: g.It("should report telemetry if a cloud.openshift.com token is present [Late]", func() { origin/release-4.12:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Late]", func() { origin/release-4.13:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Late]", func() { origin/release-4.14:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Serial] [Late]", func() { origin/release-4.15:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Serial] [Late]", func() { and 4.10 is end-of-life since 2023-09-10 [1]. That leaves tests using 4.11 versions of the origin suite, and I'm addressing those via the JOB_NAME checks [2,3]. The checks are brittle, leaving out 4.9 and earlier, and possibly not matching some 4.11 jobs, but they will hopefully be sufficient to get us through until 4.11 goes end-of-life on 2024-02-10 [3]. And when the defaulting logic breaks down, jobs that have an opinion can set TELEMETRY_ENABLED explicitly to match their needs. [1]: https://access.redhat.com/support/policy/updates/openshift/#dates [2]: https://docs.ci.openshift.org/docs/architecture/step-registry/#available-environment-variables [3]: https://docs.prow.k8s.io/docs/jobs/#job-environment-variables
We'd tried this previously in 3c1da8e (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (openshift#32153), 2022-09-13), but had to roll it back with d61129c (ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry (openshift#32249), 2022-09-13), to avoid failing: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present Subsequently, openshift/origin@76652fa4fa (test/extended/prometheus: Consider telemeterClient.enabled, 2022-09-15, openshift/origin#27422) taught that test-case about the config knob this step uses to disable Telemetry. Those test-case changes are present in origin test suites starting in 4.12: $ for Y in $(seq 11 15); do git --no-pager grep 'should report telemetry' "origin/release-4.${Y}" -- test/extended/prometheus/prometheus.go; done origin/release-4.11:test/extended/prometheus/prometheus.go: g.It("should report telemetry if a cloud.openshift.com token is present [Late]", func() { origin/release-4.12:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Late]", func() { origin/release-4.13:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Late]", func() { origin/release-4.14:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Serial] [Late]", func() { origin/release-4.15:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Serial] [Late]", func() { and 4.10 is end-of-life since 2023-09-10 [1]. That leaves tests using 4.11 versions of the origin suite, and I'm addressing those via the JOB_NAME checks [2,3]. The checks are brittle, leaving out 4.9 and earlier, and possibly not matching some 4.11 jobs, but they will hopefully be sufficient to get us through until 4.11 goes end-of-life on 2024-02-10 [3]. And when the defaulting logic breaks down, jobs that have an opinion can set TELEMETRY_ENABLED explicitly to match their needs. [1]: https://access.redhat.com/support/policy/updates/openshift/#dates [2]: https://docs.ci.openshift.org/docs/architecture/step-registry/#available-environment-variables [3]: https://docs.prow.k8s.io/docs/jobs/#job-environment-variables
* ci-operator/step-registry/ipi/conf/telemetry: Disable by default (again) We'd tried this previously in 3c1da8e (OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry (#32153), 2022-09-13), but had to roll it back with d61129c (ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry (#32249), 2022-09-13), to avoid failing: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present Subsequently, openshift/origin@76652fa4fa (test/extended/prometheus: Consider telemeterClient.enabled, 2022-09-15, openshift/origin#27422) taught that test-case about the config knob this step uses to disable Telemetry. Those test-case changes are present in origin test suites starting in 4.12: $ for Y in $(seq 11 15); do git --no-pager grep 'should report telemetry' "origin/release-4.${Y}" -- test/extended/prometheus/prometheus.go; done origin/release-4.11:test/extended/prometheus/prometheus.go: g.It("should report telemetry if a cloud.openshift.com token is present [Late]", func() { origin/release-4.12:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Late]", func() { origin/release-4.13:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Late]", func() { origin/release-4.14:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Serial] [Late]", func() { origin/release-4.15:test/extended/prometheus/prometheus.go: g.It("should report telemetry [Serial] [Late]", func() { and 4.10 is end-of-life since 2023-09-10 [1]. That leaves tests using 4.11 versions of the origin suite, and I'm addressing those via the JOB_NAME checks [2,3]. The checks are brittle, leaving out 4.9 and earlier, and possibly not matching some 4.11 jobs, but they will hopefully be sufficient to get us through until 4.11 goes end-of-life on 2024-02-10 [3]. And when the defaulting logic breaks down, jobs that have an opinion can set TELEMETRY_ENABLED explicitly to match their needs. [1]: https://access.redhat.com/support/policy/updates/openshift/#dates [2]: https://docs.ci.openshift.org/docs/architecture/step-registry/#available-environment-variables [3]: https://docs.prow.k8s.io/docs/jobs/#job-environment-variables * Make 4.11 regex pass shellcheck --------- Co-authored-by: W. Trevor King <wking@tremily.us>


The OCM team is concerned about the volume of data submitted by ephemeral CI clusters. This commit disables Telemetry by default (for the bulk of CI, which flows through the new step), while still allowing Telemetry to be enabled for jobs that have important data to report, or jobs that want to excersise the Telemetry or Insights reporting logic.
From these docs:
In 4.12, the Insights operator is growing a new configuration structure that allows disabling Insights (enhancement, API) without affecting Telemetry. And it turns out that the monitoring operator has a similar option to disable Telemetry, although the only documented use for 1telemeterClient1 is for setting nodeSelector.
There are a number of existing steps poking at the
cluster-monitoring-configConfigMap, includingipi-conf-inframachinesetsettingtelemeterClient'snodeSelector, so I am followingipi-conf-user-workload-monitoring's use ofyqpatching and manifest naming to fit in with the other steps.I'm following 8a4696c (#31256) to get
yqfrom theupi-installerimage instead ofcurling it down dynamically. With the 4.12 pin to getyq, we're safe usingupi-installer, even for CI jobs that are otherwise looking at different 4.y.The logic is coming in as a new step, because it feels like touching
cluster-monitoring-configshould be owned by some monitoring folks. I've added @jan--f as a co-maintainer to that end, but I'm happy to hand this off to whoever to maintain however they like.To attach the new step to existing workflows, I used:
$ sed -i 's/^\( *- ref: ipi-conf\)$/\1\n\1-telemetry/' $(git grep -l '^ *- ref: ipi-conf$' ci-operator/step-registry)to inject the step after
ipi-confin any consumers that were already consumingipi-conf.