Skip to content

Introduce metrics.Registry to pass down registries#61239

Merged
hugoShaka merged 6 commits intomasterfrom
hugo/propagate-metric-namespace-and-subsystem
Nov 12, 2025
Merged

Introduce metrics.Registry to pass down registries#61239
hugoShaka merged 6 commits intomasterfrom
hugo/propagate-metric-namespace-and-subsystem

Conversation

@hugoShaka
Copy link
Copy Markdown
Contributor

@hugoShaka hugoShaka commented Nov 11, 2025

When working on plugin metrics I realized there are several metrics challenges in teleport:

  • not all teleport binaries use the teleport_ namespace (tbot, teleport plugins, ...)
  • the same generic component can be used in several places and need to expose separate metrics (e.g. reconciler)
  • not all metrics are always defined: plugins must have non-global metrics as they can be turned on and off, and exist in multiple instances

This PR introduces a new metrics.Registry type that can be passed like a logger and contains information about:

  • the namespace
  • the subsystem
  • where metrics should be registered

Putting everything into the same structs avoids cumbersome multiple argument while being able to properly nest metrics.

@hugoShaka hugoShaka force-pushed the hugo/propagate-metric-namespace-and-subsystem branch from ce1e0ad to d3b1564 Compare November 11, 2025 18:58
@github-actions github-actions bot requested review from gzdunek and r0mant November 11, 2025 18:58
@hugoShaka hugoShaka force-pushed the hugo/propagate-metric-namespace-and-subsystem branch from d3b1564 to 17a82fd Compare November 11, 2025 18:58
@hugoShaka hugoShaka force-pushed the hugo/propagate-metric-namespace-and-subsystem branch from 17a82fd to 10a757e Compare November 11, 2025 18:59
Comment on lines +33 to +37
type Registry struct {
prometheus.Registerer

namespace string
subsystem string
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can easily add a debug prometheus.Registerer later, so the scale team can expose tctl debug-only super high cardinality metrics.

@hugoShaka hugoShaka added the no-changelog Indicates that a PR does not require a changelog entry label Nov 11, 2025
@hugoShaka hugoShaka changed the title Introduce metrics.Registry and use it Introduce metrics.Registry to pass down registries Nov 11, 2025
hugoShaka and others added 2 commits November 11, 2025 15:42
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
@hugoShaka
Copy link
Copy Markdown
Contributor Author

Must be merged after https://github.com/gravitational/teleport.e/pull/7557 to avoid breaking e builds

@hugoShaka hugoShaka enabled auto-merge November 12, 2025 15:48
@hugoShaka hugoShaka added this pull request to the merge queue Nov 12, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 12, 2025
@hugoShaka hugoShaka added this pull request to the merge queue Nov 12, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 12, 2025
@hugoShaka hugoShaka added this pull request to the merge queue Nov 12, 2025
Merged via the queue into master with commit 8d94614 Nov 12, 2025
41 checks passed
@hugoShaka hugoShaka deleted the hugo/propagate-metric-namespace-and-subsystem branch November 12, 2025 22:00
Copy link
Copy Markdown
Contributor

@codingllama codingllama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by from the prometheus RFD

newReg := &Registry{
Registerer: r.Registerer,
namespace: r.namespace,
subsystem: r.subsystem + "_" + subsystem,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this handle an empty r.subsystem?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hugoShaka added a commit that referenced this pull request Nov 21, 2025
* Introduce metrics.Registry and use it

* Update lib/metrics/registry.go

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* BlackHole -> BlackHoleRegistry

* merge lib/metrics and lib/observability/metrics

* lint

* address noah's feedback

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
github-merge-queue bot pushed a commit that referenced this pull request Nov 24, 2025
* Add entra ID metrics (#60537)

* Add entra ID metrics

This commit adds metrics for entra ID sync. This is the OSS part, it
contains the msgraph client metrics.

As many different parts of Teleport are using the msgraph client and
might not have access to a metric registerer yet, the client gracefully
handles not being given a metric registry. In this case it won't
register its metrics, we don't want to continue polluting the global
metrics registry.

* lint

* add optional reconciler metrics (#60581)

* expose TeleportProcess metrics registry (#60654)

* test setting a non-nil registry in config

* expose teleport process metric registry

* remove metric config

* fixup! remove metric config

* Add support in process for additional metrics gatherers (#60852)

* Add support in process for additional metrics gatherers

Before this change, we were gathering from 2 metrics gatherers:
- the process registry
- the global registry

There are cases where we must add and remove metrics (e.g. plugins).
We could throw them into the global registry but:
- this would pollute the global registry and cause duplicates/conflicts
  in tests
- this would conflate all metrics from the same plugin kind. We support
  several instances of the same hosted plugin and we might want to
  keep distinct metrics.

This change makes the gatherers a list, and add a function so teleport.e
can add its own gatherer. A teleport.e PR using this mechanism will
follow.

* Protect gatherer slice with a mutex

* Fix the generic reconciler metric API (#60853)

When implementing reconciler metrics in #60581
I did not realize some GenericReconciler usage, including the one I
wanted to observe, were short-lived. The implementation had 2 blatant
issues:
- metrics were lost for each invocations
- creating a new reonciler would attempt to register the metric a second
  time and cause a conflict

This PR changes the reconciler metrics API so the caller is responsible
for creating and registering the metrics beforehand. This allows the
caller to create the metric struct once and pass them to successive
`NewGenericReconciler` calls.

* Introduce metrics.Registry to pass down registries (#61239)

* Introduce metrics.Registry and use it

* Update lib/metrics/registry.go

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* BlackHole -> BlackHoleRegistry

* merge lib/metrics and lib/observability/metrics

* lint

* address noah's feedback

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* metrics.Registry.Wrap() handle empty subsystems properly (#61392)

* handle empty subsystems properly

* appeasing our italian engineering team

* Fix build after rebase

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-changelog Indicates that a PR does not require a changelog entry size/md

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants