Document OBI k8s-cache #9753

Merged
tiffany76 merged 16 commits into open-telemetry:main from NimrodAvni78:nimrodavni78/obi-k8s-cache-docs
May 28, 2026

Contributor

@NimrodAvni78 NimrodAvni78 commented Apr 26, 2026

  • I have read and followed the Contributing docs, especially the "First-time contributing?" section.
  • This PR has content that I did not fully write myself.
  • I have the experience and knowledge necessary to understand, review, and validate all content in this PR. [1]

Part of open-telemetry/opentelemetry-ebpf-instrumentation#1330

Add documentation for the OBI k8s-cache.
The OBI k8s-cache is a component that takes load off the Kubernetes API when a large fleet of OBI instances runs on the same cluster and every instance needs cluster-level Kubernetes metadata.

For translations, I have added a stub anchor that links to the English documentation.

Footnotes

  1. Yes, I can answer maintainer questions about the content of this PR, without using AI.

@otelbot-docs otelbot-docs Bot requested a review from a team April 26, 2026 12:20
@otelbot-docs otelbot-docs Bot added the missing:docs-approval (Co-owning SIG has provided approval, PR needs approval from docs maintainer) and missing:sig-approval (Co-owning SIG didn't provide an approval) labels Apr 26, 2026
@NimrodAvni78 NimrodAvni78 changed the title from "document obi k8s-cache" to "Document OBI k8s-cache" Apr 26, 2026
@otelbot-docs otelbot-docs Bot requested review from a team and theletterf and removed request for a team April 27, 2026 10:23
@NimrodAvni78 NimrodAvni78 marked this pull request as ready for review April 27, 2026 11:13
@NimrodAvni78 NimrodAvni78 requested a review from a team as a code owner April 27, 2026 11:13
Contributor

@MrAlias MrAlias left a comment

I found one documentation accuracy issue that should be fixed before merge.

The new k8s-cache section correctly explains the main scalability benefit, but a couple of sentences overstate the behavior by saying that OBI no longer talks to the Kubernetes API server directly. In obi, remote-cache mode replaces each pod’s own informer/watch traffic, but OBI still performs limited direct API lookups for node and cluster metadata. The important point is that k8s-cache removes the per-pod informer fan-out and substantially reduces API load, not that it eliminates all API access. As currently written, the docs could push operators to remove API permissions from OBI pods once they enable k8s-cache, which would break host.id and cluster-name enrichment.

I left inline suggestions on the two affected passages.

To avoid that, OBI ships an optional companion service called `k8s-cache`. It
runs as a small `Deployment`, watches the Kubernetes API once on behalf of every
OBI Pod, and streams the metadata to OBI instances over gRPC. Each OBI Pod opens
a single connection to the cache instead of to the API server.
Contributor

@MrAlias MrAlias Apr 27, 2026

This is a bit too strong as written. In OBI’s remote-cache mode, the cache replaces each pod’s own informer/watch traffic, but OBI still needs some direct Kubernetes API access for node and cluster metadata lookup. The external-cache integration manifest in obi also keeps a service account on the OBI DaemonSet for that reason.

To avoid that, OBI ships an optional companion service called k8s-cache. It
runs as a small Deployment, watches the Kubernetes API once on behalf of every
OBI Pod, and streams the metadata to OBI instances over gRPC. This removes OBI's
per-Pod informer traffic to the API server and greatly reduces API load, though
OBI may still perform limited direct Kubernetes API lookups for node and cluster
metadata.

To avoid that, the OBI Helm chart can deploy a small companion service called
`k8s-cache`. The cache watches the Kubernetes API once on behalf of all OBI Pods
and streams metadata to them over gRPC, so OBI no longer talks to the API server
directly. For more background on what `k8s-cache` is and when to use it, see the
Contributor

@MrAlias MrAlias Apr 27, 2026

Same issue here: enabling k8s-cache does not mean OBI never talks to the API server again. It stops the per-pod informer fan-out, which is the important scalability point, but some direct API lookups still remain.

To avoid that, the OBI Helm chart can deploy a small companion service called
k8s-cache. The cache watches the Kubernetes API once on behalf of all OBI Pods
and streams metadata to them over gRPC, which removes OBI's per-Pod informer
traffic to the API server and substantially reduces API load. For more
background on what k8s-cache is and when to use it, see the
Kubernetes setup guide.

Contributor

@grcevski grcevski left a comment

LGTM! Nice! One thing I would mention is that the reason OBI can overwhelm the API server is that every node watches the whole cluster, so that we can figure out the peer names of services. This may not be obvious to folks, who may think it's not an actual problem, since most DaemonSets watch only the current node.
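
A rough back-of-the-envelope sketch of that fan-out, assuming one OBI Pod per node and a hypothetical number of resource kinds watched per instance (the thread does not state the actual set). The function name and parameters are illustrative only and are not part of OBI. The model also ignores the limited direct lookups for node and cluster metadata that OBI still performs when the cache is enabled.

```python
def cluster_wide_watches(nodes: int, kinds_watched: int, cache_replicas: int = 0) -> int:
    """Estimate long-lived cluster-wide watch streams opened against the API server.

    Assumption: one OBI Pod per node. With cache_replicas == 0, every OBI Pod
    opens its own cluster-wide watches. With a cache, only the cache replicas
    watch the API server and OBI Pods read from the cache instead.
    """
    if nodes < 0 or kinds_watched < 0 or cache_replicas < 0:
        raise ValueError("counts must be non-negative")
    watchers = cache_replicas if cache_replicas > 0 else nodes
    return watchers * kinds_watched


if __name__ == "__main__":
    # Hypothetical cluster: 500 nodes, 3 resource kinds watched per watcher.
    print(cluster_wide_watches(500, 3))                    # every node watches the whole cluster
    print(cluster_wide_watches(500, 3, cache_replicas=1))  # a single cache replica watches
```

Under these assumptions the number of watch streams grows with the node count without the cache and stays constant with it, which is the scalability point raised in the review.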

@otelbot-docs otelbot-docs Bot removed the missing:sig-approval label (Co-owning SIG didn't provide an approval) Apr 27, 2026
@otelbot-docs otelbot-docs Bot requested review from a team April 28, 2026 08:21
Comment thread content/en/docs/zero-code/obi/setup/kubernetes.md Outdated
@otelbot-docs otelbot-docs Bot requested a review from a team May 3, 2026 08:49
Member

@vitorvasc vitorvasc left a comment

Thanks for the contribution!

Left a couple of suggestions regarding the localized content.

Comment thread content/es/docs/zero-code/obi/setup/kubernetes.md Outdated
Comment thread content/ja/docs/zero-code/obi/setup/kubernetes.md Outdated
Comment thread content/en/docs/zero-code/obi/configure/metrics-traces-attributes.md Outdated
Comment thread content/en/docs/zero-code/obi/setup/kubernetes-helm.md Outdated
Comment thread content/en/docs/zero-code/obi/setup/kubernetes.md Outdated
@otelbot-docs otelbot-docs Bot requested review from a team May 13, 2026 16:30
NimrodAvni78 and others added 2 commits May 14, 2026 09:35
Co-authored-by: Vitor Vasconcellos <vvasconcellos1@gmail.com>
Contributor Author

@vitorvasc I have committed all suggested changes, but CI still fails.

Member

@tiffany76 tiffany76 left a comment

Hi @NimrodAvni78 - I left one copy edit suggestion.

The link checks are failing because of the chicken-or-egg problem. You're linking to page sections that don't yet exist, because you've created them in the same PR. The solution here is to break up this PR into two:

  1. Add the two new sections to the kubernetes-helm and kubernetes setup pages without the cross-reference links to each other.

  2. Once that PR is merged, you can create a second PR with the rest of the changes, including adding in the cross-reference links to the two new sections.

Does that make sense?

NimrodAvni78 and others added 3 commits May 26, 2026 14:37
Per review feedback, ship the new sections without the cross-page
links between them so the link checker passes. The cross-references
can be added in a follow-up PR once the sections exist on main.
Contributor Author

@tiffany76 @vitorvasc I've addressed the comments :)

Member

@tiffany76 tiffany76 left a comment

Thanks, @NimrodAvni78! Looks good to me. I will override @vitorvasc's request for changes and merge.

Please go ahead and submit the follow-up PR to add the cross-references. I've created an issue just so we don't lose track of it.

@otelbot-docs otelbot-docs Bot added the ready-to-be-merged label (This PR is ready to be merged by a maintainer) and removed the missing:docs-approval label (Co-owning SIG has provided approval, PR needs approval from docs maintainer) May 28, 2026
@tiffany76 tiffany76 dismissed vitorvasc’s stale review May 28, 2026 20:17

All requested changes have been made.

@tiffany76 tiffany76 added this pull request to the merge queue May 28, 2026
Merged via the queue into open-telemetry:main with commit 07668da May 28, 2026
28 checks passed

Labels

lang:es, lang:ja, ready-to-be-merged (This PR is ready to be merged by a maintainer), sig:obi

Projects

Status: Done

6 participants