Document OBI k8s-cache #9753
I found one documentation accuracy issue that should be fixed before merge.
The new k8s-cache section correctly explains the main scalability benefit, but a couple of sentences overstate the behavior by saying that OBI no longer talks to the Kubernetes API server directly. In obi, remote-cache mode replaces each pod’s own informer/watch traffic, but OBI still performs limited direct API lookups for node and cluster metadata. The important point is that k8s-cache removes the per-pod informer fan-out and substantially reduces API load, not that it eliminates all API access. As currently written, the docs could push operators to remove API permissions from OBI pods once they enable k8s-cache, which would break host.id and cluster-name enrichment.
I left inline suggestions on the two affected passages.
To avoid that, OBI ships an optional companion service called `k8s-cache`. It
runs as a small `Deployment`, watches the Kubernetes API once on behalf of every
OBI Pod, and streams the metadata to OBI instances over gRPC. Each OBI Pod opens
a single connection to the cache instead of to the API server.
This is a bit too strong as written. In OBI’s remote-cache mode, the cache replaces each pod’s own informer/watch traffic, but OBI still needs some direct Kubernetes API access for node and cluster metadata lookup. The external-cache integration manifest in obi also keeps a service account on the OBI DaemonSet for that reason.
To avoid that, OBI ships an optional companion service called `k8s-cache`. It
runs as a small `Deployment`, watches the Kubernetes API once on behalf of every
OBI Pod, and streams the metadata to OBI instances over gRPC. This removes OBI's
per-Pod informer traffic to the API server and greatly reduces API load, though
OBI may still perform limited direct Kubernetes API lookups for node and cluster
metadata.
To avoid that, the OBI Helm chart can deploy a small companion service called
`k8s-cache`. The cache watches the Kubernetes API once on behalf of all OBI Pods
and streams metadata to them over gRPC, so OBI no longer talks to the API server
directly. For more background on what `k8s-cache` is and when to use it, see the
Same issue here: enabling k8s-cache does not mean OBI never talks to the API server again. It stops the per-pod informer fan-out, which is the important scalability point, but some direct API lookups still remain.
To avoid that, the OBI Helm chart can deploy a small companion service called
`k8s-cache`. The cache watches the Kubernetes API once on behalf of all OBI Pods
and streams metadata to them over gRPC, which removes OBI's per-Pod informer
traffic to the API server and substantially reduces API load. For more
background on what `k8s-cache` is and when to use it, see the
Kubernetes setup guide.
grcevski left a comment
LGTM! Nice! One thing I would mention is that the reason OBI can overwhelm the API server is that every node watches the whole cluster so that we can figure out the peer names of services. This may not be obvious to folks, and they may think it's not an actual problem, since most DaemonSets watch only the current node.
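To make the fan-out concrete, here is a rough back-of-the-envelope sketch in Python. It is not OBI code: the names `WatchLoad`, `direct_mode`, and `cache_mode` are hypothetical, and the model assumes one cluster-wide watch stream per watcher, ignoring that real informers open one watch per resource kind and that OBI keeps a few direct API lookups in cache mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchLoad:
    """Rough API server load for one cluster-wide watch topology (illustrative only)."""
    watch_streams: int     # long-lived watch connections held open on the API server
    event_deliveries: int  # copies of object updates the API server has to send out


def direct_mode(nodes: int, updates: int) -> WatchLoad:
    # Assumption: one OBI Pod per node, each watching the whole cluster,
    # so every object update is delivered once per node.
    return WatchLoad(watch_streams=nodes, event_deliveries=nodes * updates)


def cache_mode(cache_replicas: int, updates: int) -> WatchLoad:
    # Assumption: only the cache replicas watch the API server;
    # OBI Pods receive the same metadata from the cache over gRPC instead.
    return WatchLoad(
        watch_streams=cache_replicas,
        event_deliveries=cache_replicas * updates,
    )


if __name__ == "__main__":
    # Hypothetical cluster: 500 nodes, 1000 object updates, 2 cache replicas.
    print("direct:", direct_mode(nodes=500, updates=1000))
    print("cached:", cache_mode(cache_replicas=2, updates=1000))
```

In this simplified model the API server's work grows with the node count in direct mode, but only with the number of cache replicas in cache mode, which is the scalability point the docs are trying to make.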
vitorvasc left a comment
Thanks for the contribution!
Left a couple of suggestions regarding the localized content.
Co-authored-by: Vitor Vasconcellos <vvasconcellos1@gmail.com>
@vitorvasc I have committed all suggested changes, but CI still fails.
tiffany76 left a comment
Hi @NimrodAvni78 - I left one copy edit suggestion.
The link checks are failing because of the chicken-or-egg problem. You're linking to page sections that don't yet exist, because you've created them in the same PR. The solution here is to break up this PR into two:
1. Add the two new sections to the `kubernetes-helm` and `kubernetes` setup pages without the cross-reference links to each other.
2. Once that PR is merged, you can create a second PR with the rest of the changes, including adding in the cross-reference links to the two new sections.
Does that make sense?
Per review feedback, ship the new sections without the cross-page links between them so the link checker passes. The cross-references can be added in a follow-up PR once the sections exist on main.
@tiffany76 @vitorvasc I've addressed the comments :)
tiffany76 left a comment
Thanks, @NimrodAvni78! Looks good to me. I will override @vitorvasc's request for changes and merge.
Please go ahead and submit the followup PR to add the cross-references. I've created an issue just so we don't lose track of it.
All requested changes have been made.
Part of open-telemetry/opentelemetry-ebpf-instrumentation#1330
Add documentation for the OBI k8s-cache.
The OBI k8s-cache is a component that takes load off the Kubernetes API when a large fleet of OBI instances runs on the same Kubernetes cluster and all of them need cluster-level Kubernetes metadata.
For translations, I have added a stub anchor to refer to and a link to the English documentation.
Footnotes
Yes, I can answer maintainer questions about the content of this PR, without using AI.