node: topologymgr: Add metric to measure latency
We need to measure the latency this feature adds through the
resource alignment logic executed at pod admission time.
No such metric exists yet, so a new metric,
`topology_manager_admission_duration_seconds`, will be added
during the dev phase.

Signed-off-by: Swati Sehgal <[email protected]>
swatisehgal committed Feb 7, 2023
1 parent f821bbc commit 82138a4
Showing 2 changed files with 8 additions and 5 deletions.
12 changes: 7 additions & 5 deletions keps/sig-node/693-topology-manager/README.md
@@ -797,30 +797,32 @@ Monitor the following metrics:

"topology_manager_admission_requests_total"
"topology_manager_admission_errors_total"
"topology_manager_admission_duration_seconds"

###### How can an operator determine if the feature is in use by workloads?

The operator can look at `topology_manager_admission_requests_total` and `topology_manager_admission_errors_total`
metrics to determine if topology manager is performing its admission check.
The operator can look at `topology_manager_admission_requests_total`, `topology_manager_admission_errors_total` and
`topology_manager_admission_duration_seconds` metrics to determine if topology manager is performing its admission check.
In addition to that, kubelet configuration of the nodes can be inspected to check feature gates and the policies
configured.
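
The check described above can be sketched as a small script over a scraped metrics payload. This is a minimal illustration, not kubelet code: it assumes the metrics arrive in the Prometheus text exposition format without trailing timestamps, the `SAMPLE` payload and the helper names `parse_samples` and `topology_manager_in_use` are hypothetical, and the names actually exposed by a kubelet may carry an additional prefix.

```python
def parse_samples(text):
    """Sum sample values per metric name from text exposition lines."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: each sample line is "<series> <value>" with no timestamp.
        series, value = line.rsplit(None, 1)
        name = series.split("{", 1)[0]  # drop any label set
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def topology_manager_in_use(totals):
    """True if at least one admission request has been counted."""
    return totals.get("topology_manager_admission_requests_total", 0.0) > 0


# Hypothetical scrape output, invented for illustration only.
SAMPLE = """\
# TYPE topology_manager_admission_requests_total counter
topology_manager_admission_requests_total 12
# TYPE topology_manager_admission_errors_total counter
topology_manager_admission_errors_total 3
"""

totals = parse_samples(SAMPLE)
in_use = topology_manager_in_use(totals)
error_ratio = (
    totals["topology_manager_admission_errors_total"]
    / totals["topology_manager_admission_requests_total"]
)
```

A non-zero request counter only shows that the admission check ran; the kubelet configuration still has to be inspected to learn which policy is in effect.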

###### How can someone using this feature know that it is working for their instance?

- [X] Other (treat as last resort)
- Details: check the kubelet metric `topology_manager_admission_requests_total`
- Details: check the kubelet metric `topology_manager_admission_requests_total` or `topology_manager_admission_duration_seconds`

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

"topology_manager_admission_requests_total" can be used to determine if topology manager is
performing its admission check.
"topology_manager_admission_duration_seconds" (which will be added as part of this release) can be used to determine
if the resource alignment logic performed at pod admission time is taking longer than expected.
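
One way to judge "longer than expected" is sketched below. It assumes the duration metric is exposed as a Prometheus histogram with `_bucket`, `_sum` and `_count` series, which the text above does not state. The bucket bounds, counts, the 50 ms threshold and the 99% target are all hypothetical values chosen for illustration, not SLOs defined by the KEP.

```python
import math

# Hypothetical cumulative bucket counts keyed by upper bound in seconds
# (the "le" label of the assumed ..._duration_seconds_bucket series).
BUCKETS = {0.005: 40, 0.01: 70, 0.05: 95, 0.1: 99, math.inf: 100}
DURATION_SUM = 1.5    # hypothetical ..._duration_seconds_sum
DURATION_COUNT = 100  # hypothetical ..._duration_seconds_count


def mean_latency(total_seconds, count):
    """Average admission duration in seconds; 0.0 when nothing was observed."""
    if count == 0:
        return 0.0
    return total_seconds / count


def fraction_within(buckets, threshold):
    """Share of admissions that took at most `threshold` seconds.

    `threshold` must be one of the bucket upper bounds, because a histogram
    only records counts at those boundaries.
    """
    total = buckets[math.inf]
    if total == 0:
        return 1.0
    return buckets[threshold] / total


mean = mean_latency(DURATION_SUM, DURATION_COUNT)
within_50ms = fraction_within(BUCKETS, 0.05)
# Hypothetical objective: 99% of admissions finish within 50 ms.
objective_met = within_50ms >= 0.99
```

With the invented numbers above, 95% of admissions finish within 50 ms, so the hypothetical objective would not be met and the alignment logic would merit a closer look.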

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

- [X] Metrics
- Metric name:
- topology_manager_admission_requests_total
- topology_manager_admission_errors_total
- topology_manager_admission_duration_seconds

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

1 change: 1 addition & 0 deletions keps/sig-node/693-topology-manager/kep.yaml
@@ -52,3 +52,4 @@ disable-supported: true
metrics:
- topology_manager_admission_requests_total
- topology_manager_admission_errors_total
- topology_manager_admission_duration_seconds
