Conversation
This commit adds metrics for entra ID sync. This is the OSS part, it contains the msgraph client metrics. As many different parts of Teleport are using the msgraph client and might not have access to a metric registerer yet, the client gracefully handles not being given a metric registry. In this case it won't register its metrics, we don't want to continue polluting the global metrics registry.
| return &clientMetrics{ | ||
| requestTotal: prometheus.NewCounterVec(prometheus.CounterOpts{ | ||
| Namespace: teleport.MetricNamespace, | ||
| Subsystem: metricSubsystem, | ||
| Name: "request_total", | ||
| Help: "Total number of requests made to MS Graph", | ||
| }, []string{metricsLabelsMethod, metricsLabelStatus}), | ||
| requestDuration: prometheus.NewHistogramVec(prometheus.HistogramOpts{ | ||
| Namespace: teleport.MetricNamespace, | ||
| Subsystem: metricSubsystem, | ||
| Name: "request_duration_seconds", | ||
| Help: "Request to MS Graph duration in seconds.", | ||
| }, []string{metricsLabelsMethod}), | ||
| } |
There was a problem hiding this comment.
This will not scale in long run - we will keep implementing third party api SDK metrics for each external client. So each SDK will have custom metrics implemented in different way - But it it better than not having metric at all.
If we have a moment I would like to consider building generic third party API metrics in generic way:
But I don't see this as a blocker for this PR, but if possible we can wrap up #40579 and have a generic metrics arch shared across very SDK client.
WDYT ?
There was a problem hiding this comment.
I considered doing this but this raises the problem of cardinality. We need some way to mask variable parts of the URI. We have a similar problem in the backend with our metrics and we have not solved it yet. I wanted to get metrics without refactoring the whole client to add a new endpoint struct containing both the endpoint parts and the metrics mask
There was a problem hiding this comment.
My vote would be to start simple and iterate. The generic approach sounds nice at first, but adds a number of complexities that we don't need to solve today.
rosstimothy
left a comment
There was a problem hiding this comment.
We should also consider a follow up that uses otelhttp so that these requests are traced.
|
@hugoShaka See the table below for backport results.
|
* Add entra ID metrics This commit adds metrics for entra ID sync. This is the OSS part, it contains the msgraph client metrics. As many different parts of Teleport are using the msgraph client and might not have access to a metric registerer yet, the client gracefully handles not being given a metric registry. In this case it won't register its metrics, we don't want to continue polluting the global metrics registry. * lint
* Add entra ID metrics This commit adds metrics for entra ID sync. This is the OSS part, it contains the msgraph client metrics. As many different parts of Teleport are using the msgraph client and might not have access to a metric registerer yet, the client gracefully handles not being given a metric registry. In this case it won't register its metrics, we don't want to continue polluting the global metrics registry. * lint
* Add entra ID metrics This commit adds metrics for entra ID sync. This is the OSS part, it contains the msgraph client metrics. As many different parts of Teleport are using the msgraph client and might not have access to a metric registerer yet, the client gracefully handles not being given a metric registry. In this case it won't register its metrics, we don't want to continue polluting the global metrics registry. * lint
* Add entra ID metrics This commit adds metrics for entra ID sync. This is the OSS part, it contains the msgraph client metrics. As many different parts of Teleport are using the msgraph client and might not have access to a metric registerer yet, the client gracefully handles not being given a metric registry. In this case it won't register its metrics, we don't want to continue polluting the global metrics registry. * lint
* Add entra ID metrics (#60537) * Add entra ID metrics This commit adds metrics for entra ID sync. This is the OSS part, it contains the msgraph client metrics. As many different parts of Teleport are using the msgraph client and might not have access to a metric registerer yet, the client gracefully handles not being given a metric registry. In this case it won't register its metrics, we don't want to continue polluting the global metrics registry. * lint * add optional reconciler metrics (#60581) * expose TeleportProcess metrics registry (#60654) * test setting a non-nil registry in config * expose teleport process metric registry * remove metric config * fixup! remove metric config * Add support in process for additional metrics gatherers (#60852) * Add support in process for additional metrics gatherers Before this change, we were gathering from 2 metrics gatherers: - the process registry - the global registry There are cases where we must add and remove metrics (e.g. plugins). We could throw them into the global registry but: - this would pollute the global registry and cause duplicates/conflicts in tests - this would conflate all metrics from the same plugin kind. We support several instances of the same hosted plugin and we might want to keep distinct metrics. This change makes the gatherers a list, and add a function so teleport.e can add its own gatherer. A teleport.e PR using this mechanism will follow. * Protect gatherer slice with a mutex * Fix the generic reconciler metric API (#60853) When implementing reconciler metrics in #60581 I did not realize some GenericReconciler usage, including the one I wanted to observe, were short-lived. The implementation had 2 blatant issues: - metrics were lost for each invocations - creating a new reonciler would attempt to register the metric a second time and cause a conflict This PR changes the reconciler metrics API so the caller is responsible for creating and registering the metrics beforehand. This allows the caller to create the metric struct once and pass them to successive `NewGenericReconciler` calls. * Introduce metrics.Registry to pass down registries (#61239) * Introduce metrics.Registry and use it * Update lib/metrics/registry.go Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> * BlackHole -> BlackHoleRegistry * merge lib/metrics and lib/observability/metrics * lint * address noah's feedback --------- Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> * metrics.Registry.Wrap() handle empty subsystems properly (#61392) * handle empty subsystems properly * appeasing our italian engineering team * Fix build after rebase --------- Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
This PR adds metric support for entra ID sync. This is the OSS part, it contains the msgraph client metrics.
As many different parts of Teleport are using the msgraph client and might not have access to a metric registerer yet, the client gracefully handles not being given a metric registry. In this case it won't register its metrics, we don't want to continue polluting the global metrics registry.