[8.19] [APM][Infra] Fix OTel metrics mapping in infrastructure tab (#259552)#261020

Merged
rmyz merged 2 commits into elastic:8.19 from rmyz:backport/8.19/pr-259552
Apr 3, 2026

rmyz commented Apr 2, 2026

Backport

This will backport the following commits from main to 8.19:

Questions?

Please refer to the Backport tool documentation

…259552)

## Summary

Closes elastic#256731

Fix OTel metrics in the APM Infrastructure tab so hosts, pods, and
containers display actual values instead of `N/A`. The root causes were:
(1) hosts queried `metrics.system.memory.limit`, a field that doesn't
exist in `hostmetricsreceiver` data, (2) pod and container configs
queried `_limit_utilization` fields that only exist when Kubernetes
resource limits are explicitly set — which most deployments don't have,
and (3) all OTel dataset filters only matched `event.dataset`, missing
documents indexed under `data_stream.dataset`.

## Demo

### Before

https://github.com/user-attachments/assets/63193175-7893-47fa-8a82-ff76924908fb

### After

https://github.com/user-attachments/assets/440676f7-4168-4c13-9227-1b3b6bc74e57

## Problem

For OTel entities, the Infrastructure tab in APM showed `N/A` for most
metrics even when semconv data was present in Elasticsearch:

- **Hosts**: The `metrics.system.memory.limit` field doesn't exist in
`hostmetricsreceiver` data, causing Memory Total to show `N/A`.
- **Pods**: CPU used `metrics.k8s.pod.cpu_limit_utilization` (requires
CPU limits) and memory used `metrics.k8s.pod.memory_limit_utilization`
(requires memory limits). Both return empty results when limits aren't
set.
- **Containers (K8s)**: CPU used
`metrics.k8s.container.cpu_limit_utilization` and memory used
`metrics.k8s.container.memory_limit_utilization` — same limits-only
problem.
- **Dataset filters**: All OTel paths only matched `event.dataset`,
missing documents indexed under `data_stream.dataset`.

## Field mapping changes

### Hosts

| Metric | Before | After | Why |
|---|---|---|---|
| Memory total | `metrics.system.memory.limit` | Derived: `metrics.system.memory.usage / metrics.system.memory.utilization` | `memory.limit` doesn't exist in hostmetricsreceiver; total is derived from usage and utilization ratio |
| Dataset filter | `event.dataset: "hostmetricsreceiver.otel"` | `(data_stream.dataset: "hostmetricsreceiver.otel" OR event.dataset: "hostmetricsreceiver.otel")` | Match both field locations |

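The derived Memory total in the table above can be sketched as follows. This is an illustration only: `deriveMemoryTotal` and its parameter names are hypothetical and not taken from the Kibana source, and the guard against a missing or zero utilization is an assumption about how a division like this would be made safe.

```typescript
// Hypothetical helper (not from the Kibana source): derives total memory in bytes
// from metrics.system.memory.usage (bytes) and metrics.system.memory.utilization,
// which is assumed here to be a 0-1 ratio.
function deriveMemoryTotal(
  usageBytes: number | null,
  utilization: number | null
): number | null {
  if (usageBytes == null || utilization == null) {
    return null; // nothing to derive from, caller can render N/A
  }
  if (!Number.isFinite(usageBytes) || !Number.isFinite(utilization) || utilization <= 0) {
    return null; // avoid dividing by zero or propagating NaN/Infinity
  }
  return usageBytes / utilization;
}

// Example: 4 GiB used at 25% utilization implies 16 GiB total.
const totalBytes = deriveMemoryTotal(4 * 1024 ** 3, 0.25);
```

Returning `null` rather than `0` keeps "no data" distinguishable from a real zero value in whatever renders the table cell.
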
### Pods

| Metric | Before | After | Why |
|---|---|---|---|
| CPU | `metrics.k8s.pod.cpu_limit_utilization` | `metrics.k8s.pod.cpu.node.utilization` | `cpu_limit_utilization` requires resource limits; `cpu.node.utilization` is always emitted by kubeletstats |
| Memory | `metrics.k8s.pod.memory_limit_utilization` | `metrics.k8s.pod.memory_limit_utilization` with fallback to `metrics.k8s.pod.memory.working_set` | Queries both; prefers `memory_limit_utilization` (shown as %) when available, falls back to `memory.working_set` (shown as MB) to avoid N/A |
| Dataset filter | `event.dataset: "kubeletstatsreceiver.otel"` | `(data_stream.dataset: "kubeletstatsreceiver.otel" OR event.dataset: "kubeletstatsreceiver.otel")` | Match both field locations |

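The pod memory fallback described in the table above could look roughly like this. The names `resolvePodMemory` and `PodMemoryCell` are hypothetical. The sketch also assumes that `memory_limit_utilization` is a 0-1 ratio and that MB means 1024 × 1024 bytes; neither is stated in this description.

```typescript
// Hypothetical types and helper, for illustration only.
type PodMemoryCell = { value: number; unit: '%' | 'MB' } | null;

const BYTES_PER_MB = 1024 * 1024; // assumption: binary megabytes

function resolvePodMemory(
  limitUtilization: number | null, // metrics.k8s.pod.memory_limit_utilization (assumed 0-1 ratio)
  workingSetBytes: number | null // metrics.k8s.pod.memory.working_set (bytes)
): PodMemoryCell {
  // Prefer the percentage when Kubernetes memory limits are set.
  if (limitUtilization != null) {
    return { value: limitUtilization * 100, unit: '%' };
  }
  // Otherwise fall back to the working set, shown in MB.
  if (workingSetBytes != null) {
    return { value: workingSetBytes / BYTES_PER_MB, unit: 'MB' };
  }
  // Neither field present: the caller renders N/A.
  return null;
}
```

Carrying the unit alongside the value lets a single column render either form without guessing which field produced the number.
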
### Containers (K8s path)

| Metric | Before | After | Why |
|---|---|---|---|
| CPU | `metrics.k8s.container.cpu_limit_utilization` | `metrics.container.cpu.usage` | `cpu_limit_utilization` requires resource limits; `container.cpu.usage` is always emitted by kubeletstats (0–1 ratio of one CPU core) |
| Memory | `metrics.k8s.container.memory_limit_utilization` | `metrics.container.memory.working_set` | `memory_limit_utilization` requires resource limits; `memory.working_set` (bytes → MB) is always available |
| Memory unit | Always `%` for OTel | `MB` for K8s containers, `%` for Docker containers | K8s path now uses `working_set` (bytes) not a percentage |
| Dataset filter | `event.dataset: "kubeletstatsreceiver.otel"` | `(data_stream.dataset: "kubeletstatsreceiver.otel" OR event.dataset: "kubeletstatsreceiver.otel")` | Match both field locations |

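As a rough illustration of the unit split in the table above, the sketch below formats container values by source. All names are hypothetical. It assumes the Docker memory value arrives as a 0-1 ratio (this description only says Docker containers show `%`), and it uses 1024 × 1024 bytes per MB, which is also an assumption.

```typescript
// Hypothetical formatter, not taken from the Kibana source.
type ContainerSource = 'k8s' | 'docker';

const CONTAINER_BYTES_PER_MB = 1024 * 1024; // assumption: binary megabytes

function formatContainerMemory(source: ContainerSource, raw: number): string {
  if (source === 'k8s') {
    // K8s path: metrics.container.memory.working_set is in bytes, shown as MB.
    return `${(raw / CONTAINER_BYTES_PER_MB).toFixed(1)} MB`;
  }
  // Docker path: assumed to be a 0-1 ratio, shown as a percentage.
  return `${(raw * 100).toFixed(1)} %`;
}

function formatContainerCpu(ratio: number): string {
  // metrics.container.cpu.usage is described as a 0-1 ratio of one CPU core.
  return `${(ratio * 100).toFixed(1)} %`;
}
```
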
### Containers (Docker path)

| Metric | Before | After | Why |
|---|---|---|---|
| Dataset filter | `event.dataset: "dockerstatsreceiver.otel"` | `(data_stream.dataset: "dockerstatsreceiver.otel" OR event.dataset: "dockerstatsreceiver.otel")` | Match both field locations |

## Other changes

- **Pod memory tooltip**: Added an `EuiIconTip` explaining the fallback
logic (prefers `memory_limit_utilization` as %, falls back to
`memory.working_set` as MB).
- **Pod CPU tooltip removed**: The old tooltip warned that
`cpu_limit_utilization` was optional. The new field
(`cpu.node.utilization`) is always present, making the tooltip
misleading.
- **OTel dataset filter helper**: Extracted `otelDatasetFilter()`
utility to avoid duplicating the `(data_stream.dataset OR
event.dataset)` pattern.
- **Host OTel unpack path**: Added `metricByFieldOtel` /
`unpackMetricOtel` so the host table correctly reads OTel metric
positions instead of falling through to ECS metric keys.
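
The signature of the extracted `otelDatasetFilter()` helper is not shown in this description. A minimal stand-in that produces the filter strings listed in the tables might look like the sketch below. The name `buildOtelDatasetFilter` is hypothetical, and returning a KQL string (rather than, say, an Elasticsearch query object) is an assumption.

```typescript
// Hypothetical stand-in for the otelDatasetFilter() helper mentioned above.
// Builds a KQL fragment that matches a dataset under either field location.
function buildOtelDatasetFilter(dataset: string): string {
  return `(data_stream.dataset: "${dataset}" OR event.dataset: "${dataset}")`;
}

// The three receiver datasets named in this PR description.
const hostFilter = buildOtelDatasetFilter('hostmetricsreceiver.otel');
const kubeletFilter = buildOtelDatasetFilter('kubeletstatsreceiver.otel');
const dockerFilter = buildOtelDatasetFilter('dockerstatsreceiver.otel');
```

Centralizing the pattern means a future change to where the dataset is indexed touches one function instead of every metrics config.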

## Test plan

- [x] `yarn test:jest x-pack/solutions/observability/plugins/metrics_data_access/public/components/infrastructure_node_metrics_tables/` (all 46 tests pass)
- [x] Manual smoke in APM Infrastructure tab with OTel service data (e.g. `kbn-otel-demo` with EDOT Collector):
  - [ ] Hosts tab: CPU count, CPU %, Memory total, Memory % all populate
  - [x] Pods tab: CPU % populates; Memory shows % (with limits) or MB (without limits)
  - [x] Containers tab: CPU % and Memory MB populate for K8s containers

(cherry picked from commit c6485d7)

# Conflicts:
#	x-pack/solutions/observability/plugins/metrics_data_access/public/components/infrastructure_node_metrics_tables/container/container_metrics_configs.test.ts
#	x-pack/solutions/observability/plugins/metrics_data_access/public/components/infrastructure_node_metrics_tables/container/container_metrics_configs.ts
#	x-pack/solutions/observability/plugins/metrics_data_access/public/components/infrastructure_node_metrics_tables/container/use_container_metrics_table.test.ts
#	x-pack/solutions/observability/plugins/metrics_data_access/public/components/infrastructure_node_metrics_tables/host/use_host_metrics_table.test.ts
#	x-pack/solutions/observability/plugins/metrics_data_access/public/components/infrastructure_node_metrics_tables/host/use_host_metrics_table.ts
#	x-pack/solutions/observability/plugins/metrics_data_access/public/components/infrastructure_node_metrics_tables/pod/use_pod_metrics_table.test.ts
#	x-pack/solutions/observability/plugins/metrics_data_access/public/components/infrastructure_node_metrics_tables/pod/use_pod_metrics_table.ts
rmyz added the backport (This PR is a backport of another PR) label Apr 2, 2026
rmyz enabled auto-merge (squash) April 2, 2026 20:18
rmyz requested a review from kibanamachine as a code owner April 2, 2026 20:18
rmyz self-assigned this Apr 2, 2026
rmyz merged commit 77b1e53 into elastic:8.19 Apr 3, 2026
15 checks passed
Labels

backport This PR is a backport of another PR

3 participants