Skip to content

Add entra ID metrics#60537

Merged
hugoShaka merged 2 commits intomasterfrom
hugo/add-entra-sync-metrics
Oct 24, 2025
Merged

Add entra ID metrics#60537
hugoShaka merged 2 commits intomasterfrom
hugo/add-entra-sync-metrics

Conversation

@hugoShaka
Copy link
Copy Markdown
Contributor

@hugoShaka hugoShaka commented Oct 23, 2025

This PR adds metric support for entra ID sync. This is the OSS part, it contains the msgraph client metrics.

As many different parts of Teleport are using the msgraph client and might not have access to a metric registerer yet, the client gracefully handles not being given a metric registry. In this case it won't register its metrics, we don't want to continue polluting the global metrics registry.

This commit adds metrics for entra ID sync. This is the OSS part, it
contains the msgraph client metrics.

As many different parts of Teleport are using the msgraph client and
might not have access to a metric registerer yet, the client gracefully
handles not being given a metric registry. In this case it won't
register its metrics, we don't want to continue polluting the global
metrics registry.
@hugoShaka hugoShaka added no-changelog Indicates that a PR does not require a changelog entry backport/branch/v18 labels Oct 23, 2025
Comment on lines +43 to +56
return &clientMetrics{
requestTotal: prometheus.NewCounterVec(prometheus.CounterOpts{
Namespace: teleport.MetricNamespace,
Subsystem: metricSubsystem,
Name: "request_total",
Help: "Total number of requests made to MS Graph",
}, []string{metricsLabelsMethod, metricsLabelStatus}),
requestDuration: prometheus.NewHistogramVec(prometheus.HistogramOpts{
Namespace: teleport.MetricNamespace,
Subsystem: metricSubsystem,
Name: "request_duration_seconds",
Help: "Request to MS Graph duration in seconds.",
}, []string{metricsLabelsMethod}),
}
Copy link
Copy Markdown
Contributor

@smallinsky smallinsky Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not scale in long run - we will keep implementing third party api SDK metrics for each external client. So each SDK will have custom metrics implemented in different way - But it it better than not having metric at all.

If we have a moment I would like to consider building generic third party API metrics in generic way:

#40579

But I don't see this as a blocker for this PR, but if possible we can wrap up #40579 and have a generic metrics arch shared across very SDK client.

WDYT ?

Copy link
Copy Markdown
Contributor Author

@hugoShaka hugoShaka Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered doing this but this raises the problem of cardinality. We need some way to mask variable parts of the URI. We have a similar problem in the backend with our metrics and we have not solved it yet. I wanted to get metrics without refactoring the whole client to add a new endpoint struct containing both the endpoint parts and the metrics mask

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote would be to start simple and iterate. The generic approach sounds nice at first, but adds a number of complexities that we don't need to solve today.

Copy link
Copy Markdown
Contributor

@rosstimothy rosstimothy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also consider a follow up that uses otelhttp so that these requests are traced.

@hugoShaka hugoShaka added this pull request to the merge queue Oct 24, 2025
Merged via the queue into master with commit adb674b Oct 24, 2025
41 checks passed
@hugoShaka hugoShaka deleted the hugo/add-entra-sync-metrics branch October 24, 2025 15:47
@backport-bot-workflows
Copy link
Copy Markdown
Contributor

@hugoShaka See the table below for backport results.

Branch Result
branch/v18 Failed

mmcallister pushed a commit that referenced this pull request Nov 6, 2025
* Add entra ID metrics

This commit adds metrics for entra ID sync. This is the OSS part, it
contains the msgraph client metrics.

As many different parts of Teleport are using the msgraph client and
might not have access to a metric registerer yet, the client gracefully
handles not being given a metric registry. In this case it won't
register its metrics, we don't want to continue polluting the global
metrics registry.

* lint
mmcallister pushed a commit that referenced this pull request Nov 19, 2025
* Add entra ID metrics

This commit adds metrics for entra ID sync. This is the OSS part, it
contains the msgraph client metrics.

As many different parts of Teleport are using the msgraph client and
might not have access to a metric registerer yet, the client gracefully
handles not being given a metric registry. In this case it won't
register its metrics, we don't want to continue polluting the global
metrics registry.

* lint
mmcallister pushed a commit that referenced this pull request Nov 20, 2025
* Add entra ID metrics

This commit adds metrics for entra ID sync. This is the OSS part, it
contains the msgraph client metrics.

As many different parts of Teleport are using the msgraph client and
might not have access to a metric registerer yet, the client gracefully
handles not being given a metric registry. In this case it won't
register its metrics, we don't want to continue polluting the global
metrics registry.

* lint
hugoShaka added a commit that referenced this pull request Nov 21, 2025
* Add entra ID metrics

This commit adds metrics for entra ID sync. This is the OSS part, it
contains the msgraph client metrics.

As many different parts of Teleport are using the msgraph client and
might not have access to a metric registerer yet, the client gracefully
handles not being given a metric registry. In this case it won't
register its metrics, we don't want to continue polluting the global
metrics registry.

* lint
github-merge-queue bot pushed a commit that referenced this pull request Nov 24, 2025
* Add entra ID metrics (#60537)

* Add entra ID metrics

This commit adds metrics for entra ID sync. This is the OSS part, it
contains the msgraph client metrics.

As many different parts of Teleport are using the msgraph client and
might not have access to a metric registerer yet, the client gracefully
handles not being given a metric registry. In this case it won't
register its metrics, we don't want to continue polluting the global
metrics registry.

* lint

* add optional reconciler metrics (#60581)

* expose TeleportProcess metrics registry (#60654)

* test setting a non-nil registry in config

* expose teleport process metric registry

* remove metric config

* fixup! remove metric config

* Add support in process for additional metrics gatherers (#60852)

* Add support in process for additional metrics gatherers

Before this change, we were gathering from 2 metrics gatherers:
- the process registry
- the global registry

There are cases where we must add and remove metrics (e.g. plugins).
We could throw them into the global registry but:
- this would pollute the global registry and cause duplicates/conflicts
  in tests
- this would conflate all metrics from the same plugin kind. We support
  several instances of the same hosted plugin and we might want to
  keep distinct metrics.

This change makes the gatherers a list, and add a function so teleport.e
can add its own gatherer. A teleport.e PR using this mechanism will
follow.

* Protect gatherer slice with a mutex

* Fix the generic reconciler metric API (#60853)

When implementing reconciler metrics in #60581
I did not realize some GenericReconciler usage, including the one I
wanted to observe, were short-lived. The implementation had 2 blatant
issues:
- metrics were lost for each invocations
- creating a new reonciler would attempt to register the metric a second
  time and cause a conflict

This PR changes the reconciler metrics API so the caller is responsible
for creating and registering the metrics beforehand. This allows the
caller to create the metric struct once and pass them to successive
`NewGenericReconciler` calls.

* Introduce metrics.Registry to pass down registries (#61239)

* Introduce metrics.Registry and use it

* Update lib/metrics/registry.go

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* BlackHole -> BlackHoleRegistry

* merge lib/metrics and lib/observability/metrics

* lint

* address noah's feedback

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* metrics.Registry.Wrap() handle empty subsystems properly (#61392)

* handle empty subsystems properly

* appeasing our italian engineering team

* Fix build after rebase

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport/branch/v18 no-changelog Indicates that a PR does not require a changelog entry size/sm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants