Merged
@@ -345,6 +345,7 @@ attributes:
| `informers_sync_timeout`<br>`OTEL_EBPF_KUBE_INFORMERS_SYNC_TIMEOUT` | Maximum time to wait for Kubernetes metadata before starting. For more information, refer to the [informers sync timeout section](#informers-sync-timeout). | Duration | 30s |
| `reconnect_initial_interval`<br>`OTEL_EBPF_KUBE_RECONNECT_INITIAL_INTERVAL` | Initial delay before reconnecting to the Kubernetes API after connection loss. For more information, refer to the [reconnect initial interval section](#reconnect-initial-interval). | Duration | 5s |
| `informers_resync_period`<br>`OTEL_EBPF_KUBE_INFORMERS_RESYNC_PERIOD` | Periodically resynchronize all Kubernetes metadata. For more information, refer to the [informers resynchronization period section](#informers-resynchronization-period). | Duration | 30m |
| `meta_cache_address`<br>`OTEL_EBPF_KUBE_META_CACHE_ADDRESS` | Address of an external `k8s-cache` service to fetch Kubernetes metadata from. For more information, refer to the [meta cache address section](#meta-cache-address). | string | (empty) |
| `service_name_template`<br>`OTEL_EBPF_SERVICE_NAME_TEMPLATE` | Go template for service names. For more information, refer to the [service name template section](#service-name-template). | string | (empty) |

### Enable Kubernetes
@@ -418,6 +419,13 @@ OBI immediately receives any update on resources' metadata. In addition, OBI
periodically resynchronizes all Kubernetes metadata at the frequency you specify
with this property. Higher values reduce the load on the Kubernetes API service.

### Meta cache address

When set, OBI fetches Kubernetes metadata from an external `k8s-cache` service
over gRPC instead of running its own informers against the Kubernetes API
server. This is recommended on large clusters and DaemonSet deployments to avoid
overloading the Kubernetes API.
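
As a rough illustration, the same property can be set in the OBI YAML configuration under `attributes.kubernetes`. This is a minimal sketch: the address `k8s-cache.default.svc:50055` is an assumed example value for a cache `Service` named `k8s-cache` in the `default` namespace, not a default shipped by OBI.

```yaml
attributes:
  kubernetes:
    # Assumed example address: <service>.<namespace>.svc:<port> of the k8s-cache Service.
    meta_cache_address: k8s-cache.default.svc:50055
```

Leaving the property empty keeps the default behavior, in which OBI runs its own informers.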

### Service name template

You can template service names using Go templates. This lets you create
34 changes: 34 additions & 0 deletions content/en/docs/zero-code/obi/setup/kubernetes-helm.md
@@ -22,6 +22,7 @@ Contents:
- [Deploying OBI from helm](#deploying-obi-from-helm)
- [Configuring OBI](#configuring-obi)
- [Configuring OBI metadata](#configuring-obi-metadata)
- [Centralizing Kubernetes metadata with k8s-cache](#centralizing-kubernetes-metadata-with-k8s-cache)
- [Providing secrets to the Helm configuration](#providing-secrets-to-the-helm-configuration)
<!-- TOC -->

@@ -108,6 +109,39 @@ cluster roles, security contexts, etc. The
[OBI Helm chart documentation](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-ebpf-instrumentation)
describes the diverse configuration options.

## Centralizing Kubernetes metadata with k8s-cache

By default, each OBI Pod opens its own connections to the Kubernetes API server
to watch Pod, Node, and Service metadata for the entire cluster, not only for
its local node. OBI needs cluster-wide metadata to enrich both the source and
the destination of a request: for example, to resolve the service name of an
outbound HTTP request for
[peer](/docs/specs/semconv/registry/attributes/service/#service-attributes-for-peer-services)
attributes, or to label the destination in
[service graph](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/servicegraphconnector)
metrics. On large clusters, having every OBI Pod query the full cluster
metadata can overload the API server and affect the whole cluster.

To avoid that, the OBI Helm chart can deploy a small companion service called
`k8s-cache`. The cache watches the Kubernetes API once on behalf of all OBI Pods
and streams metadata to them over gRPC, which removes OBI's per-Pod informer
traffic to the API server and substantially reduces API load.

To enable it, set `k8sCache.replicas` to a non-zero value in your
`helm-obi.yml`:

```yaml
k8sCache:
  replicas: 1
```

A single replica is usually enough. For high availability or very large
clusters, increase the replica count. OBI Pods load-balance across the replicas
through the cache `Service` and reconnect to a healthy replica on failure.

When `k8sCache.replicas` is `0` (the default), the cache is not deployed and
each OBI Pod uses its own local informers.
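
For the high-availability case described above, a `helm-obi.yml` might look like the following sketch. The replica count of `2` is only an illustrative choice; the right number depends on cluster size and availability requirements.

```yaml
k8sCache:
  # Illustrative value: any non-zero count deploys the cache; more than one adds redundancy.
  replicas: 2
```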

## Providing secrets to the Helm configuration

If you are submitting the metrics and traces directly to your observability
176 changes: 134 additions & 42 deletions content/en/docs/zero-code/obi/setup/kubernetes.md
@@ -358,50 +358,50 @@ spec:
serviceAccount: obi
hostPID: true # <-- Important. Required in Daemonset mode so OBI can discover all monitored processes
containers:
- name: obi
terminationMessagePolicy: FallbackToLogsOnError
image: otel/ebpf-instrument:main
env:
- name: OTEL_EBPF_TRACE_PRINTER
value: "text"
- name: OTEL_EBPF_KUBE_METADATA_ENABLE
value: "autodetect"
- name: KUBE_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
...
securityContext:
runAsUser: 0
readOnlyRootFilesystem: true
capabilities:
add:
- BPF # <-- Important. Required for most eBPF probes to function correctly.
- SYS_PTRACE # <-- Important. Allows OBI to access the container namespaces and inspect executables.
- NET_RAW # <-- Important. Allows OBI to use socket filters for http requests.
- CHECKPOINT_RESTORE # <-- Important. Allows OBI to open ELF files.
- DAC_READ_SEARCH # <-- Important. Allows OBI to open ELF files.
- PERFMON # <-- Important. Allows OBI to load BPF programs.
#- SYS_RESOURCE # <-- pre 5.11 only. Allows OBI to increase the amount of locked memory.
#- SYS_ADMIN # <-- Required for Go application trace context propagation, or if kernel.perf_event_paranoid >= 3 on Debian distributions.
drop:
- ALL
volumeMounts:
- name: var-run-obi
mountPath: /var/run/obi
- name: cgroup
mountPath: /sys/fs/cgroup
- name: obi
terminationMessagePolicy: FallbackToLogsOnError
image: otel/ebpf-instrument:main
env:
- name: OTEL_EBPF_TRACE_PRINTER
value: "text"
- name: OTEL_EBPF_KUBE_METADATA_ENABLE
value: "autodetect"
- name: KUBE_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
...
securityContext:
runAsUser: 0
readOnlyRootFilesystem: true
capabilities:
add:
- BPF # <-- Important. Required for most eBPF probes to function correctly.
- SYS_PTRACE # <-- Important. Allows OBI to access the container namespaces and inspect executables.
- NET_RAW # <-- Important. Allows OBI to use socket filters for http requests.
- CHECKPOINT_RESTORE # <-- Important. Allows OBI to open ELF files.
- DAC_READ_SEARCH # <-- Important. Allows OBI to open ELF files.
- PERFMON # <-- Important. Allows OBI to load BPF programs.
#- SYS_RESOURCE # <-- pre 5.11 only. Allows OBI to increase the amount of locked memory.
#- SYS_ADMIN # <-- Required for Go application trace context propagation, or if kernel.perf_event_paranoid >= 3 on Debian distributions.
drop:
- ALL
volumeMounts:
- name: var-run-obi
mountPath: /var/run/obi
- name: cgroup
mountPath: /sys/fs/cgroup
tolerations:
- effect: NoSchedule
operator: Exists
- effect: NoExecute
operator: Exists
- effect: NoSchedule
operator: Exists
- effect: NoExecute
operator: Exists
volumes:
- name: var-run-obi
emptyDir: {}
- name: cgroup
hostPath:
path: /sys/fs/cgroup
- name: var-run-obi
emptyDir: { }
- name: cgroup
hostPath:
path: /sys/fs/cgroup
---
apiVersion: apps/v1
kind: Deployment
@@ -412,6 +412,98 @@ metadata:
---
```

## Centralizing Kubernetes metadata with k8s-cache

When OBI runs as a DaemonSet, every OBI Pod opens its own `list` and `watch`
connections against the Kubernetes API server to fetch the metadata it needs to
decorate metrics and traces. Each Pod fetches metadata for the entire cluster,
not only for its local node, so that it can enrich telemetry that crosses node
boundaries, for example by adding
[peer](/docs/specs/semconv/registry/attributes/service/#service-attributes-for-peer-services)
attributes to spans for requests between nodes. On large clusters this fan-out
can put significant load on the API server, to the point where it affects the
whole cluster.

To avoid that, OBI ships an optional companion service called `k8s-cache`. It
runs as a small Deployment, watches the Kubernetes API once on behalf of every
OBI Pod, and streams the metadata to OBI instances over gRPC. This removes OBI's
per-Pod informer traffic to the API server and greatly reduces API load, though
OBI may still perform limited direct Kubernetes API lookups for node and cluster
metadata.

Using `k8s-cache` is always recommended, especially if:

- You run OBI as a DaemonSet on a large cluster.
- You run many OBI replicas (large `Deployment`, multiple sidecars, etc.) on the
  same cluster.
- The Kubernetes API server is under pressure or rate-limited.

If you do not configure a cache address, each OBI instance keeps its own local
in-process informers, which is fine for small clusters.

`k8s-cache` is only relevant when running OBI on Kubernetes; it has no effect in
the standalone or Docker setups.

To use the cache, deploy it and point OBI at its `Service` address with the
`OTEL_EBPF_KUBE_META_CACHE_ADDRESS` environment variable (or
`attributes.kubernetes.meta_cache_address` in YAML). The easiest way is to use
the OBI Helm chart, which sets up the `Deployment`, `Service`, and OBI wiring
for you when you set `k8sCache.replicas` to a non-zero value.

If you prefer to deploy it manually, the cache is published as the
`ghcr.io/open-telemetry/opentelemetry-ebpf-instrumentation/opentelemetry-ebpf-k8s-cache`
container image. A minimal manifest looks like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-cache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8s-cache
  template:
    metadata:
      labels:
        app: k8s-cache
    spec:
      serviceAccountName: obi # needs list/watch on pods, nodes, services
      containers:
        - name: k8s-cache
          image: ghcr.io/open-telemetry/opentelemetry-ebpf-instrumentation/opentelemetry-ebpf-k8s-cache:latest
          ports:
            - containerPort: 50055
              name: grpc
---
apiVersion: v1
kind: Service
metadata:
  name: k8s-cache
spec:
  selector:
    app: k8s-cache
  ports:
    - port: 50055
      name: grpc
      protocol: TCP
```
```
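
The comment in the manifest above notes that the `obi` service account needs `list` and `watch` access to pods, nodes, and services. If the service account does not already have those permissions, a sketch of the RBAC objects could look like the following. The name `k8s-cache-reader` is hypothetical, and the `default` namespace is an assumption; the cache may need further resources beyond the three named in the comment.

```yaml
# Hypothetical sketch: names and namespace are assumptions, not taken from the OBI docs.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8s-cache-reader # hypothetical name
rules:
  - apiGroups: [""] # core API group
    resources: ["pods", "nodes", "services"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-cache-reader # hypothetical name
subjects:
  - kind: ServiceAccount
    name: obi
    namespace: default # assumed namespace of the cache Deployment
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k8s-cache-reader
```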

Then point OBI at the cache `Service` from the DaemonSet:

```yaml
env:
  - name: OTEL_EBPF_KUBE_METADATA_ENABLE
    value: 'true'
  - name: OTEL_EBPF_KUBE_META_CACHE_ADDRESS
    value: 'k8s-cache.default.svc:50055'
```

A single replica is usually enough. For high availability, run multiple replicas
behind the same `Service`. Each OBI Pod connects to one replica and reconnects
to another on failure.

## Providing an external configuration file

In the previous examples, OBI was configured via environment variables. However,
4 changes: 4 additions & 0 deletions static/refcache.json
@@ -21479,6 +21479,10 @@
"StatusCode": 206,
"LastSeen": "2026-05-05T10:22:54.45528098Z"
},
"https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-attributes-for-peer-services": {
"StatusCode": 206,
"LastSeen": "2026-04-28T11:12:12.096928+03:00"
},
"https://opentracing.io": {
"StatusCode": 206,
"LastSeen": "2026-05-21T10:49:31.332949977Z"