Fix migration metric registration#620
Conversation
Don't register process_start_time_seconds metric in migration metrics manager to prevent double registration, resulting in this error: gathered metric family process_start_time_seconds has help "[ALPHA] Start time of the process since unix epoch in seconds." but should have "Start time of the process since unix epoch in seconds."
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jsafrane, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@jsafrane: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest |
| metricsManager = metrics.NewCSIMetricsManagerWithOptions(provisionerName, metrics.WithMigration()) | ||
| metricsManager = metrics.NewCSIMetricsManagerWithOptions(provisionerName, | ||
| // Will be provided via default gatherer. | ||
| metrics.WithProcessStartTime(false), |
There was a problem hiding this comment.
Should we remove the processstarttime metric from csi-lib-utils now that it's included by default?
There was a problem hiding this comment.
Some sidecars only use the csi-lib-utils code and nothing else, in which case processtarttime still needs to be added there.
There was a problem hiding this comment.
I thought the plan was to also add the base metrics to all sidecars?
There was a problem hiding this comment.
Yes, but that's just a plan. I don't know who's working on it. We can only remove it once all sidecars are updated.
|
Should this fix be cherry-picked? |
Right now it would be simpler to release 2.2.1 from master (all it has are the two bug fixes that need to be cherry-picked) and then fast-forward |
|
Is there a way to catch this from happening? We had some metric e2e tests - https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/volume_metrics.go . I don't think we ported these to CSI suite. |
That looks like it tests metrics data in kube-controller-manager. What we want to test are metrics exposed by the sidecars. Retrieving those from an e2e.test isn't easy because from outside of the cluster, port-forwarding has to be used to reach the metrics endpoint of each container. But I found a solution: https://github.com/intel/pmem-csi/blob/devel/test/e2e/pod/dial.go Simply retrieving the metrics data would already be useful. But ideally we should also validate that the returned data meets expectations, which means we need to define what those expectations are. But whether that would have caught this issue here is still uncertain. Do we have e2e tests involving a driver where migration is enabled? |
/kind bug
What this PR does / why we need it:
Don't register
process_start_time_seconds metricin migration metrics manager to prevent double registration, resulting in this error:This happens only when a migratable CSI driver is used. With a regular hostpath CSI driver, the metrics work without any issues.
Which issue(s) this PR fixes:
Fixes #619
Special notes for your reviewer:
cc @pohly @Jiawei0227
Does this PR introduce a user-facing change?: