Skip to content

[extension/bearertokenauth] Fix token file refresh for Kubernetes projected volumes#45953

Closed
grelland wants to merge 5 commits into
open-telemetry:mainfrom
grelland:fix-bearertokenauthextension-refresh-regression
Closed

[extension/bearertokenauth] Fix token file refresh for Kubernetes projected volumes#45953
grelland wants to merge 5 commits into
open-telemetry:mainfrom
grelland:fix-bearertokenauthextension-refresh-regression

Conversation

@grelland

@grelland grelland commented Feb 9, 2026

Copy link
Copy Markdown
Contributor

Description

Fixes a bug which caused token file refresh to break completely for files backed by Kubernetes projected volumes (ConfigMaps, Secrets, serviceAccountTokens).
The bug is a regression introduced in #44104, which removed a crucial and very specifically documented workaround for this particular case. This has resulted in token files backed by projected volumes ("double symlinks") not refreshing at all.

I have encountered this issue in production (v0.144.0), and am also able to reproduce it consistently. The last well-behaving version seems to be v0.138.0. Applying this PR fixes the issue for my reproduction.

The fix is mostly a revert to the previous code, but with a couple of notable changes:

  • The event handler now uses bit masking to decode the fs events (this was also done in the buggy changeset)
  • The error indicating the double removal of the watcher is now ignored in the event handler

Note: as with the previous implementation, on some systems there is the potential for receiving two discrete events in quick succession. This causes the token to be refreshed twice.
Even if this is not-so-nice, it's really just a tiny read and store operation and should have negligible impact. One could perhaps introduce a simple debounce mechanism (with, say, 1s delay) to smooth it over, but I consider that out-of-scope for this PR.

Testing

Added TestBearerTokenFileTokenRotationWithSymlink which simulates the "atomic swap" behaviour of Kubernetes projected volumes on Linux. The test failed before, and passes after the fix is applied. Notably this test is only really applicable on Linux, and will pass in both cases on MacOS (don't know about Windows) due to difference in behaviours between kqueue and inotify.

The bug itself was easily reproducible by running the latest version of otelcollector-contrib in a local kind cluster, configured with reading the token from a ConfigMap-backed volume. On the pre-fix collector the changes to the ConfigMap are not picked up and the token is thus never updated, whilst it works as expected with the fix applied.

@grelland grelland requested a review from a team as a code owner February 9, 2026 22:51
@grelland grelland requested a review from dmitryax February 9, 2026 22:51
@linux-foundation-easycla

linux-foundation-easycla Bot commented Feb 9, 2026

Copy link
Copy Markdown

CLA Signed

The committers listed above are authorized under a signed CLA.

@github-actions github-actions Bot added the first-time contributor PRs made by new contributors label Feb 9, 2026
@github-actions

github-actions Bot commented Feb 9, 2026

Copy link
Copy Markdown
Contributor

Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib.

Important reminders:

A maintainer will review your pull request soon. Thank you for helping make OpenTelemetry better!

@github-actions github-actions Bot requested a review from frzifus February 9, 2026 22:52
@grelland grelland force-pushed the fix-bearertokenauthextension-refresh-regression branch from 4f3c414 to 2f2903d Compare February 9, 2026 23:01
@dmitryax

Copy link
Copy Markdown
Member

@grelland thanks for fixing it. Please take a look at the failing tests.

@pavolloffay can you please review this?

@grelland

grelland commented Feb 10, 2026

Copy link
Copy Markdown
Contributor Author

@grelland thanks for fixing it. Please take a look at the failing tests.

Fixed by skipping the test on Windows. The atomic symlink swap scenario is not possible on Windows (and the kubelet falls back to a non-atomic remove+create strategy).

@grelland grelland force-pushed the fix-bearertokenauthextension-refresh-regression branch from 764dc14 to 69ca2fa Compare February 10, 2026 08:27
@grelland

grelland commented Feb 18, 2026

Copy link
Copy Markdown
Contributor Author

@dmitryax 👋 Can someone approve running the test suite and also have a look at this?
We're currently blocked from upgrading past v0.138.0 due to this bug, so would be great to get it moving. Thanks!

Aneurysm9 added a commit to Aneurysm9/opentelemetry-collector-contrib that referenced this pull request Feb 20, 2026
…d volumes

Watch the file itself instead of the parent directory. Kubernetes
projected volumes (ConfigMaps, Secrets, serviceAccountTokens) use a
double-symlink chain that is rotated via an atomic rename of an
intermediate .data symlink followed by removal of the old timestamped
directory. Watching the parent directory missed these events because
the filename filter never matched the intermediate symlink names.

The fix watches the file directly (fsnotify follows symlinks to the
underlying inode) and on Remove/Chmod events removes and re-adds the
watcher so it follows the new symlink target, then reloads the value.
This matches the approach in open-telemetry#45953.

Co-authored-by: Halvdan Hoem Grelland <grelland@users.noreply.github.com>
Assisted-by: Kiro (Amazon Q Developer)
Signed-off-by: Anthony J Mirabella <a9@aneurysm9.com>
Aneurysm9 added a commit to Aneurysm9/opentelemetry-collector-contrib that referenced this pull request Feb 23, 2026
…d volumes

Watch the file itself instead of the parent directory. Kubernetes
projected volumes (ConfigMaps, Secrets, serviceAccountTokens) use a
double-symlink chain that is rotated via an atomic rename of an
intermediate .data symlink followed by removal of the old timestamped
directory. Watching the parent directory missed these events because
the filename filter never matched the intermediate symlink names.

The fix watches the file directly (fsnotify follows symlinks to the
underlying inode) and on Remove/Chmod events removes and re-adds the
watcher so it follows the new symlink target, then reloads the value.
This matches the approach in open-telemetry#45953.

Co-authored-by: Halvdan Hoem Grelland <grelland@users.noreply.github.com>
Assisted-by: Kiro (Amazon Q Developer)
Signed-off-by: Anthony J Mirabella <a9@aneurysm9.com>
@Aneurysm9

Copy link
Copy Markdown
Member

I believe #46470 should render this PR obsolete as the fix it provides was included in the common helper shipped in #46227. Thanks again @grelland!

@grelland

grelland commented Mar 3, 2026

Copy link
Copy Markdown
Contributor Author

@Aneurysm9 I guess so. @MovieStoreGuy Should I close this one?

@MovieStoreGuy

Copy link
Copy Markdown
Contributor

@Aneurysm9 I guess so. @MovieStoreGuy Should I close this one?

Yeah, looking at the PR @Aneurysm9 raised, this change would not be needed and be consistent experience across all usages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants