@wking wking commented Sep 9, 2022

The OCM team is concerned about the volume of data submitted by ephemeral CI clusters. This commit disables Telemetry by default (for the bulk of CI, which flows through the new step), while still allowing Telemetry to be enabled for jobs that have important data to report, or jobs that want to exercise the Telemetry or Insights reporting logic.

From these docs:

You can modify your existing global cluster pull secret to disable remote health reporting. This disables both Telemetry and the Insights Operator.

In 4.12, the Insights operator is growing a new configuration structure that allows disabling Insights (enhancement, API) without affecting Telemetry. And it turns out that the monitoring operator has a similar option to disable Telemetry, although the only documented use for `telemeterClient` is for setting nodeSelector.

There are a number of existing steps poking at the cluster-monitoring-config ConfigMap, including ipi-conf-inframachineset setting telemeterClient's nodeSelector, so I am following ipi-conf-user-workload-monitoring's use of yq patching and manifest naming to fit in with the other steps.

I'm following 8a4696c (#31256) to get yq from the upi-installer image instead of curling it down dynamically. With the 4.12 pin to get yq, we're safe using upi-installer, even for CI jobs that are otherwise looking at different 4.y.

The logic is coming in as a new step, because it feels like touching cluster-monitoring-config should be owned by some monitoring folks. I've added @jan--f as a co-maintainer to that end, but I'm happy to hand this off to whoever to maintain however they like.
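
As a rough illustration of the gate described above, here is a minimal sketch. It is not the step's actual script: the real step patches the manifest with yq, while this version only writes a fresh file and refuses to touch an existing one. `MANIFEST_DIR` is a hypothetical stand-in for wherever the step writes its manifests, and the `false` default for `TELEMETRY_ENABLED` is assumed from "disables Telemetry by default".

```shell
# Hypothetical sketch of the enable/disable gate; not the step's real script.
set -eu

# Assumed default: Telemetry stays off unless a job opts in.
TELEMETRY_ENABLED="${TELEMETRY_ENABLED:-false}"
# Hypothetical output directory; the real step uses its own location.
MANIFEST_DIR="${MANIFEST_DIR:-$(mktemp -d)}"
manifest="${MANIFEST_DIR}/manifest_cluster-monitoring-config.yaml"

if [ "${TELEMETRY_ENABLED}" = "true" ]; then
  echo "Nothing to do with TELEMETRY_ENABLED='${TELEMETRY_ENABLED}'"
elif [ -e "${manifest}" ]; then
  # The real step merges into an existing manifest with yq; this sketch does not.
  echo "Refusing to overwrite existing ${manifest}" >&2
  exit 1
else
  echo "Creating ${manifest}"
  cat > "${manifest}" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    telemeterClient:
      enabled: false
EOF
fi
```

Skipping yq keeps the sketch self-contained, at the cost of not composing with other steps that also write cluster-monitoring-config.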

To attach the new step to existing workflows, I used:

$ sed -i 's/^\( *- ref: ipi-conf\)$/\1\n\1-telemetry/' $(git grep -l '^ *- ref: ipi-conf$' ci-operator/step-registry)

to inject the step after ipi-conf in any consumers that were already consuming ipi-conf.
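
To see what that sed expression does without touching the step registry, the following sketch applies it to a throwaway file. The file name and its contents are made up for illustration, and GNU sed is assumed (for `-i` and for `\n` in the replacement).

```shell
# Hypothetical demo on a temporary file; nothing here touches a real checkout.
set -eu

demo_dir="$(mktemp -d)"
demo_file="${demo_dir}/example-workflow.yaml"   # made-up file name

cat > "${demo_file}" <<'EOF'
workflow:
  as: example
  steps:
    pre:
    - ref: ipi-conf
    - ref: ipi-conf-aws
    - chain: ipi-install
EOF

# Same expression as in the PR description. The trailing $ limits the match
# to the bare "ipi-conf" ref, and \1 repeats the captured indentation and ref,
# so the new line keeps the same list indentation.
sed -i 's/^\( *- ref: ipi-conf\)$/\1\n\1-telemetry/' "${demo_file}"

cat "${demo_file}"
```

Only the exact `- ref: ipi-conf` line gains a follower; `- ref: ipi-conf-aws` is left alone because the pattern is anchored at the end of the line.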

@wking wking force-pushed the disable-telemetry branch 5 times, most recently from 08506e5 to 4ce3005, September 9, 2022 08:50
The OCM team is concerned about the volume of data submitted by
ephemeral CI clusters.  This commit disables Telemetry by default (for
the bulk of CI, which flows through this step), while still allowing
Telemetry to be enabled for jobs that have important data to report,
or jobs that want to exercise the Telemetry or Insights reporting
logic.

From [2]:

> You can modify your existing global cluster pull secret to disable
> remote health reporting. This disables both Telemetry and the
> Insights Operator.

In 4.12, the Insights operator is growing a new configuration
structure that allows disabling Insights [1,2] without affecting
Telemetry.  And it turns out that the monitoring operator has a
similar option to disable Telemetry [3], although the only documented
use for telemeterClient is for setting nodeSelector [4].

There are a number of existing steps poking at the
cluster-monitoring-config ConfigMap, including
ipi-conf-inframachineset setting telemeterClient's nodeSelector, so I
am following ipi-conf-user-workload-monitoring's use of yq patching
and manifest naming to fit in with the other steps.

I'm following 8a4696c (using yq installed from upi-installer image,
2022-08-25, openshift#31256) to get yq from the upi-installer image instead of
curling it down dynamically.  With the 4.12 pin to get `yq`, we're
safe using `upi-installer`, even for CI jobs that are otherwise
looking at different 4.y.

[1]: https://github.com/openshift/enhancements/blob/ef85659d01738b9f89958d5f0da31cff05bb1182/enhancements/insights/insights-config-api.md
[2]: https://docs.openshift.com/container-platform/4.11/support/remote_health_monitoring/opting-out-of-remote-health-reporting.html
[3]: https://github.com/openshift/cluster-monitoring-operator/blob/8d331d78b22948d36c20da0552763ddd8a4e2093/pkg/manifests/config.go#L337
[4]: https://docs.openshift.com/container-platform/4.11/monitoring/configuring-the-monitoring-stack.html#moving-monitoring-components-to-different-nodes_configuring-the-monitoring-stack
Generated with:

  $ sed -i 's/^\( *- ref: ipi-conf\)$/\1\n\1-telemetry/' $(git grep -l '^ *- ref: ipi-conf$' ci-operator/step-registry)

to slot the new step in after the generic ipi-conf, so we can
configure Telemetry in all of the existing chains and workflows that
were flowing through ipi-conf.
When making changes to the Telemeter client, we want to ensure that
we're still exercising uploads.
@wking wking force-pushed the disable-telemetry branch from 4ce3005 to 00b1310, September 9, 2022 08:51
@jianlinliu

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2022
@openshift-ci

openshift-ci bot commented Sep 9, 2022

@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/openshift/metallb-operator/main/operator-e2e-upgrade | be589f4b9f767fdf50fea7dd5d82db58f4b425a9 | link | unknown | /test pj-rehearse |
| ci/rehearse/periodic-ci-openshift-hypershift-main-periodics-conformance-proxy-aws-ovn-4-12 | 00b1310 | link | unknown | /test pj-rehearse |
| ci/rehearse/openshift/cloud-provider-azure/master/openshift-e2e-azure-techpreview-upgrade | 00b1310 | link | unknown | /test pj-rehearse |
| ci/rehearse/openshift/cloud-provider-azure/master/openshift-e2e-azure-ccm-install | 00b1310 | link | unknown | /test pj-rehearse |
| ci/rehearse/openshift/kubernetes/master/e2e-aws-crun | 00b1310 | link | unknown | /test pj-rehearse |
| ci/rehearse/openshift/origin/master/e2e-aws-jenkins | 08506e557ca3502c7b276ce9c5ab3ab1f6fa1d89 | link | unknown | /test pj-rehearse |
| ci/rehearse/openshift/cloud-provider-azure/master/e2e-azure-ccm | 00b1310 | link | unknown | /test pj-rehearse |
| ci/rehearse/openshift/gcp-filestore-csi-driver-operator/main/operator-e2e | 00b1310 | link | unknown | /test pj-rehearse |
| ci/prow/pj-rehearse | 00b1310 | link | false | /test pj-rehearse |
| ci/rehearse/periodic-ci-openshift-hypershift-main-periodics-e2e-conformance-kubevirt | 00b1310 | link | unknown | /test pj-rehearse |
| ci/rehearse/periodic-ci-openshift-hypershift-main-periodics-conformance-azure-ovn-4-12 | 00b1310 | link | unknown | /test pj-rehearse |

@wking

wking commented Sep 9, 2022

Auditing some of the rehearsals, openshift-telemeter-master-e2e-aws-upgrade:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/32153/rehearse-32153-pull-ci-openshift-telemeter-master-e2e-aws-upgrade/1568161661800419328/artifacts/e2e-aws-upgrade/ipi-conf-telemetry/build-log.txt
Nothing to do with TELEMETRY_ENABLED='true'
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/32153/rehearse-32153-pull-ci-openshift-telemeter-master-e2e-aws-upgrade/1568161661800419328/artifacts/e2e-aws-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].spec.clusterID'
6f985223-6552-4027-aa0b-ac2b84dd5495

And we see Telemetry from that cluster:

image

Moving to openshift-cloud-provider-azure-master-openshift-e2e-azure-ccm-install:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/32153/rehearse-32153-pull-ci-openshift-cloud-provider-azure-master-openshift-e2e-azure-ccm-install/1568161660386938880/artifacts/openshift-e2e-azure-ccm-install/ipi-conf-telemetry/build-log.txt
Creating /tmp/secret/manifest_cluster-monitoring-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: >-
    telemeterClient:

      enabled: false
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/32153/rehearse-32153-pull-ci-openshift-cloud-provider-azure-master-openshift-e2e-azure-ccm-install/1568161660386938880/artifacts/openshift-e2e-azure-ccm-install/gather-extra/artifacts/clusterversion.json | jq -r '.items[].spec.clusterID'
01a01f0d-73d2-4bf0-a465-73f823feaa56

And there's nothing from that cluster:

image
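
The two ipi-conf-telemetry build logs above differ in a recognizable way, so the same audit could be scripted against saved logs. The helper below is a hypothetical sketch: the function name and the sample files are made up, and the match strings are taken from the two log excerpts shown in this comment.

```shell
# Hypothetical helper: classify what ipi-conf-telemetry did from its build log.
set -eu

telemetry_step_outcome() {
  # $1: path to a saved copy of an ipi-conf-telemetry build-log.txt
  if grep -q "^Nothing to do with TELEMETRY_ENABLED='true'" "$1"; then
    echo "telemetry left enabled"
  elif grep -q '^ *enabled: false' "$1"; then
    echo "telemetry disabled"
  else
    echo "unknown"
  fi
}

# Sample logs built from the excerpts quoted above.
log_dir="$(mktemp -d)"
printf '%s\n' "Nothing to do with TELEMETRY_ENABLED='true'" > "${log_dir}/enabled.txt"
printf '%s\n' \
  'Creating /tmp/secret/manifest_cluster-monitoring-config.yaml' \
  '    telemeterClient:' \
  '' \
  '      enabled: false' > "${log_dir}/disabled.txt"

telemetry_step_outcome "${log_dir}/enabled.txt"
telemetry_step_outcome "${log_dir}/disabled.txt"
```

This only confirms what the step wrote; checking that no Telemetry actually arrived still needs the clusterID lookup shown above.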

@jan--f jan--f left a comment

/lgtm

@openshift-ci

openshift-ci bot commented Sep 13, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f, jianlinliu, wking

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 13, 2022
@openshift-merge-robot openshift-merge-robot merged commit 3c1da8e into openshift:master Sep 13, 2022
@openshift-ci

openshift-ci bot commented Sep 13, 2022

@wking: Updated the ci-operator-master-configs configmap in namespace ci at cluster app.ci using the following files:

  • key openshift-telemeter-master.yaml using file ci-operator/config/openshift/telemeter/openshift-telemeter-master.yaml

@wking wking deleted the disable-telemetry branch September 13, 2022 15:05
wking added a commit to wking/openshift-release that referenced this pull request Sep 13, 2022
Effectively neutralizing 3c1da8e (OTA-740:
ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry
(openshift#32153), 2022-09-13), until we teach origin's test case to understand
this disabling:

  $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=6h&type=junit&name=^periodic-&search=Prometheus+when+installed+on+the+cluster+should+report+telemetry+if+a+cloud.openshift.com+token+is+present' | grep 'failures match' | sort
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-ovn-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-ovn-serial-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-ppc64le-powervs (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-serial-aws-arm64 (all) - 4 runs, 100% failed, 50% of failures match = 50% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-image-ecosystem-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-nightly-4.10-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-stable-4.10-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-azure-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-azure-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-azure-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-gcp-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.11-e2e-aws-cgroupsv2 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade (all) - 10 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-cgroupsv2 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-crun (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-gcp-sdn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade (all) - 10 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade (all) - 10 runs, 100% failed, 20% of failures match = 20% impact
  periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-ovn-rt-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.7-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.7-e2e-azure-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.7-e2e-gcp-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-alibaba (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws (all) - 3 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-fips (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-single-node-workers (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-cgroupsv2 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-crun (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-azure-sdn-fips-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-vsphere-ovn-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-vsphere-sdn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-aws (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-fips (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-proxy (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-vsphere-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.9-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-okd-4.12-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact

With failures like [1]:

  : [sig-instrumentation] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
  Run #0:	1m1s
  {  fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:465]: Unexpected error:
      <errors.aggregate | len:2, cap:2>: [
          {
              s: "promQL query returned unexpected results:\nmetricsclient_request_send{client=\"federate_to\",job=\"telemeter-client\",status_code=\"200\"} >= 1\n[]",
          },
          {
              s: "promQL query returned unexpected results:\nfederate_samples{job=\"telemeter-client\"} >= 10\n[]",
          },
      ]
      [promQL query returned unexpected results:
      metricsclient_request_send{client="federate_to",job="telemeter-client",status_code="200"} >= 1
      [], promQL query returned unexpected results:
      federate_samples{job="telemeter-client"} >= 10
      []]
  occurred

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-crun/1569617920785387520
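
For a quick tally of a job list like the one above, the "failures match" lines can be summarized with awk. This is a sketch under assumptions: `summarize_impact` is a made-up helper name, and it relies on each line ending in "= N% impact" as in the search output quoted in this commit message.

```shell
# Hypothetical helper: count jobs at 100% impact versus partial impact.
set -eu

summarize_impact() {
  # Reads "failures match" lines on stdin. The second-to-last field is the
  # impact percentage (for example "100%"); strip the % and compare.
  awk '/failures match/ {
         impact = $(NF-1); sub(/%$/, "", impact)
         if (impact + 0 == 100) full++; else partial++
       }
       END { printf "full=%d partial=%d\n", full, partial }'
}

# Three lines copied from the list above.
summarize_impact <<'EOF'
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade (all) - 10 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade (all) - 10 runs, 100% failed, 20% of failures match = 20% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-crun (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
EOF
```

In practice the input would come from the same `w3m ... | grep 'failures match'` pipeline shown above, piped into the helper instead of `sort`.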
openshift-merge-robot pushed a commit that referenced this pull request Sep 13, 2022
Effectively neutralizing 3c1da8e (OTA-740:
ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry
(#32153), 2022-09-13), until we teach origin's test case to understand
this disabling:

  $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=6h&type=junit&name=^periodic-&search=Prometheus+when+installed+on+the+cluster+should+report+telemetry+if+a+cloud.openshift.com+token+is+present' | grep 'failures match' | sort
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-ovn-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-ovn-serial-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-ppc64le-powervs (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-serial-aws-arm64 (all) - 4 runs, 100% failed, 50% of failures match = 50% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-image-ecosystem-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-nightly-4.10-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-stable-4.10-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-e2e-aws-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-azure-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-azure-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-azure-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-gcp-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.11-e2e-aws-cgroupsv2 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade (all) - 10 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-cgroupsv2 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-crun (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-e2e-gcp-sdn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade (all) - 10 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade (all) - 10 runs, 100% failed, 20% of failures match = 20% impact
  periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-ovn-rt-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.7-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.7-e2e-azure-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.7-e2e-gcp-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-alibaba (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws (all) - 3 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-fips (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-techpreview (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-single-node-workers (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-aws-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-cgroupsv2 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-crun (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-azure-sdn-fips-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-vsphere-ovn-techpreview-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.12-e2e-vsphere-sdn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-aws (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-fips (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-proxy (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-azure (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-gcp (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-vsphere (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.7-e2e-vsphere-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-nightly-4.9-e2e-gcp-rt (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-okd-4.12-e2e-aws-ovn (all) - 1 runs, 100% failed, 100% of failures match = 100% impact

With failures like [1]:

  : [sig-instrumentation] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
  Run #0:	1m1s
  {  fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:465]: Unexpected error:
      <errors.aggregate | len:2, cap:2>: [
          {
              s: "promQL query returned unexpected results:\nmetricsclient_request_send{client=\"federate_to\",job=\"telemeter-client\",status_code=\"200\"} >= 1\n[]",
          },
          {
              s: "promQL query returned unexpected results:\nfederate_samples{job=\"telemeter-client\"} >= 10\n[]",
          },
      ]
      [promQL query returned unexpected results:
      metricsclient_request_send{client="federate_to",job="telemeter-client",status_code="200"} >= 1
      [], promQL query returned unexpected results:
      federate_samples{job="telemeter-client"} >= 10
      []]
  occurred

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-crun/1569617920785387520
wking added a commit to wking/openshift-release that referenced this pull request Sep 13, 2022
…Telemetry for e2e-aws

I'd disabled Telemetry for the bulk of the CI fleet in 3c1da8e
(OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable
Telemetry (openshift#32153), 2022-09-13).  But that led to many failures for:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

so I'd flipped the default to keeping Telemetry enabled in d61129c
(ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry
(openshift#32249), 2022-09-13).  Now I'm looking to teach the origin test-case
skip about the mechanism I used to disable Telemetry, and I want an
origin master presubmit with Telemetry disabled.  The only
run_if_changed origin master presubmits are e2e-gcp-builds,
e2e-aws-jenkins, e2e-gcp-image-ecosystem, and e2e-aws-image-registry,
and none of those sound like a job that will run the test-case I'm
interested in (although maybe they do; I haven't dug in to confirm).
But e2e-aws is optional, so having the presubmit temporarily failing
for other origin master pull requests won't block changes from
landing.

We'll revert this change and return the job to the CI-wide Telemetry
default once we've confirmed that the test-case skips are smart
enough.
wking added a commit to wking/origin that referenced this pull request Sep 13, 2022
I'd disabled Telemetry for the bulk of the CI fleet in
openshift/release@3c1da8eb20 (OTA-740:
ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry
(openshift/release#32153), 2022-09-13).  But that led to many
failures for:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

This commit extends the checks for Telemetry enablement to include the
monitoring-specific ConfigMap, as well as the previously-checked
pull-secret token.

I'm copy/pasting a subset of the monitoring configuration structure
instead of vendoring the config, because we aren't vendoring the
cluster-monitoring-operator in origin today, bumping to keep such
vendoring up to date would be tedious, and the monitoring config API
is unlikely to shift this knob around within the structure.  It would
be nice if the monitoring config schema moved into openshift/api or
somewhere else where it would be more clear that it was a
cluster-admin-facing API, with the usual stability commitments, but
it's not there today.
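
For illustration, the ConfigMap half of that check can be approximated in shell. This is a rough sketch, not the Go code added to origin: `telemeter_disabled` is a hypothetical helper name, and the awk matching is an assumption that only handles simple block-style YAML with a top-level `telemeterClient` key. A real check should use a proper YAML parser.

```shell
# Hypothetical helper: reads the monitoring config.yaml text on stdin and
# exits 0 when telemeterClient.enabled is set to false, 1 otherwise.
# Assumption: block-style YAML; flow-style or nested "enabled" keys are
# not handled correctly by this simplistic matcher.
telemeter_disabled() {
  awk '
    # Entering the top-level telemeterClient block.
    /^telemeterClient:[ \t]*$/ { in_block = 1; next }
    # Any other top-level key ends the block.
    in_block && /^[^ \t#]/ { in_block = 0 }
    # An indented "enabled: false" inside the block disables Telemetry.
    in_block && /^[ \t]+enabled:[ \t]*false[ \t]*$/ { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}
```

On a live cluster the input might come from something like `oc -n openshift-monitoring get configmap cluster-monitoring-config -o jsonpath='{.data.config\.yaml}'`, assuming the ConfigMap exists. Telemetry would then be treated as enabled only when the pull secret carries a cloud.openshift.com token and this helper does not report the client as disabled.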
openshift-merge-robot pushed a commit that referenced this pull request Sep 13, 2022
…Telemetry for e2e-aws (#32252)

I'd disabled Telemetry for the bulk of the CI fleet in 3c1da8e
(OTA-740: ci-operator/step-registry/ipi/conf/telemetry: Disable
Telemetry (#32153), 2022-09-13).  But that led to many failures for:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

so I'd flipped the default to keeping Telemetry enabled in d61129c
(ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry
(#32249), 2022-09-13).  Now I'm looking to teach the origin test-case
skip about the mechanism I used to disable Telemetry, and I want an
origin master presubmit with Telemetry disabled.  The only
run_if_changed origin master presubmits are e2e-gcp-builds,
e2e-aws-jenkins, e2e-gcp-image-ecosystem, and e2e-aws-image-registry,
and none of those sound like a job that will run the test-case I'm
interested in (although maybe they do; I haven't dug in to confirm).
But e2e-aws is optional, so having the presubmit temporarily failing
for other origin master pull requests won't block changes from
landing.

We'll revert this change and return the job to the CI-wide Telemetry
default once we've confirmed that the test-case skips are smart
enough.
wking added a commit to wking/origin that referenced this pull request Sep 14, 2022
wking added a commit to wking/origin that referenced this pull request Sep 15, 2022
wking added a commit to wking/origin that referenced this pull request Sep 16, 2022
wking added a commit to wking/openshift-release that referenced this pull request Oct 19, 2023
We'd tried this previously in 3c1da8e (OTA-740:
ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry
(openshift#32153), 2022-09-13), but had to roll it back with d61129c
(ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry
(openshift#32249), 2022-09-13), to avoid failing:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

Subsequently, openshift/origin@76652fa4fa (test/extended/prometheus:
Consider telemeterClient.enabled, 2022-09-15, openshift/origin#27422)
taught that test-case about the config knob this step uses to disable
Telemetry.  Those test-case changes are present in origin test suites
starting in 4.12:

  $ for Y in $(seq 11 15); do git --no-pager grep 'should report telemetry' "origin/release-4.${Y}" -- test/extended/prometheus/prometheus.go; done
  origin/release-4.11:test/extended/prometheus/prometheus.go:             g.It("should report telemetry if a cloud.openshift.com token is present [Late]", func() {
  origin/release-4.12:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Late]", func() {
  origin/release-4.13:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Late]", func() {
  origin/release-4.14:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Serial] [Late]", func() {
  origin/release-4.15:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Serial] [Late]", func() {

and 4.10 has been end-of-life since 2023-09-10 [1].  That leaves tests using
4.11 versions of the origin suite, and I'm addressing those via the
JOB_NAME checks [2,3].  The checks are brittle, leaving out 4.9 and
earlier, and possibly not matching some 4.11 jobs, but they will
hopefully be sufficient to get us through until 4.11 goes end-of-life
on 2024-02-10 [1].  And when the defaulting logic breaks down, jobs
that have an opinion can set TELEMETRY_ENABLED explicitly to match
their needs.

[1]: https://access.redhat.com/support/policy/updates/openshift/#dates
[2]: https://docs.ci.openshift.org/docs/architecture/step-registry/#available-environment-variables
[3]: https://docs.prow.k8s.io/docs/jobs/#job-environment-variables
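
The defaulting described in that message can be sketched roughly as follows. This is a minimal illustration, not the step's actual script: `default_telemetry_enabled` is a hypothetical function name, and the `-4.11-` pattern is an assumption standing in for whatever regex the real step uses. Only `JOB_NAME` and `TELEMETRY_ENABLED` come from the commit message.

```shell
# Hypothetical sketch: pick a Telemetry default from the job name.
# Assumption: jobs with a "-4.11" component in their name may run the
# 4.11 origin suite, whose telemetry test-case predates the
# telemeterClient.enabled check, so they keep Telemetry enabled.
default_telemetry_enabled() {
  case "$1" in
    *-4.11-*|*-4.11) echo true ;;
    *) echo false ;;
  esac
}

# An explicit TELEMETRY_ENABLED always wins over the JOB_NAME-based default.
if [ -z "${TELEMETRY_ENABLED:-}" ]; then
  TELEMETRY_ENABLED="$(default_telemetry_enabled "${JOB_NAME:-}")"
fi
echo "TELEMETRY_ENABLED=${TELEMETRY_ENABLED}"
```

Name matching like this is brittle in the way the message warns about: an upgrade job that mentions 4.11 only as its starting version would also match, which is why jobs with an opinion are better off setting `TELEMETRY_ENABLED` explicitly.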
openshift-ci bot pushed a commit that referenced this pull request Oct 19, 2023
* ci-operator/step-registry/ipi/conf/telemetry: Disable by default (again)

We'd tried this previously in 3c1da8e (OTA-740:
ci-operator/step-registry/ipi/conf/telemetry: Disable Telemetry
(#32153), 2022-09-13), but had to roll it back with d61129c
(ci-operator/step-registry/ipi/conf/telemetry: Restore Telemetry
(#32249), 2022-09-13), to avoid failing:

  Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

Subsequently, openshift/origin@76652fa4fa (test/extended/prometheus:
Consider telemeterClient.enabled, 2022-09-15, openshift/origin#27422)
taught that test-case about the config knob this step uses to disable
Telemetry.  Those test-case changes are present in origin test suites
starting in 4.12:

  $ for Y in $(seq 11 15); do git --no-pager grep 'should report telemetry' "origin/release-4.${Y}" -- test/extended/prometheus/prometheus.go; done
  origin/release-4.11:test/extended/prometheus/prometheus.go:             g.It("should report telemetry if a cloud.openshift.com token is present [Late]", func() {
  origin/release-4.12:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Late]", func() {
  origin/release-4.13:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Late]", func() {
  origin/release-4.14:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Serial] [Late]", func() {
  origin/release-4.15:test/extended/prometheus/prometheus.go:             g.It("should report telemetry [Serial] [Late]", func() {

and 4.10 has been end-of-life since 2023-09-10 [1].  That leaves tests using
4.11 versions of the origin suite, and I'm addressing those via the
JOB_NAME checks [2,3].  The checks are brittle, leaving out 4.9 and
earlier, and possibly not matching some 4.11 jobs, but they will
hopefully be sufficient to get us through until 4.11 goes end-of-life
on 2024-02-10 [1].  And when the defaulting logic breaks down, jobs
that have an opinion can set TELEMETRY_ENABLED explicitly to match
their needs.

[1]: https://access.redhat.com/support/policy/updates/openshift/#dates
[2]: https://docs.ci.openshift.org/docs/architecture/step-registry/#available-environment-variables
[3]: https://docs.prow.k8s.io/docs/jobs/#job-environment-variables

* Make 4.11 regex pass shellcheck

---------

Co-authored-by: W. Trevor King <wking@tremily.us>

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.