[extension/bearertokenauth] Fix token file refresh for Kubernetes projected volumes#45953
Conversation
|
Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib. Important reminders:
A maintainer will review your pull request soon. Thank you for helping make OpenTelemetry better! |
4f3c414 to
2f2903d
Compare
|
@grelland thanks for fixing it. Please take a look at the failing tests. @pavolloffay can you please review this? |
Fixed by skipping the test on Windows. The atomic symlink swap scenario is not possible on Windows (and the kubelet falls back to a non-atomic remove+create strategy). |
…nd. Ignore expected fsnotify.ErrNonExistentWatch.
764dc14 to
69ca2fa
Compare
|
@dmitryax 👋 Can someone approve running the test suite and also have a look at this? |
…d volumes Watch the file itself instead of the parent directory. Kubernetes projected volumes (ConfigMaps, Secrets, serviceAccountTokens) use a double-symlink chain that is rotated via an atomic rename of an intermediate .data symlink followed by removal of the old timestamped directory. Watching the parent directory missed these events because the filename filter never matched the intermediate symlink names. The fix watches the file directly (fsnotify follows symlinks to the underlying inode) and on Remove/Chmod events removes and re-adds the watcher so it follows the new symlink target, then reloads the value. This matches the approach in open-telemetry#45953. Co-authored-by: Halvdan Hoem Grelland <grelland@users.noreply.github.com> Assisted-by: Kiro (Amazon Q Developer) Signed-off-by: Anthony J Mirabella <a9@aneurysm9.com>
…d volumes Watch the file itself instead of the parent directory. Kubernetes projected volumes (ConfigMaps, Secrets, serviceAccountTokens) use a double-symlink chain that is rotated via an atomic rename of an intermediate .data symlink followed by removal of the old timestamped directory. Watching the parent directory missed these events because the filename filter never matched the intermediate symlink names. The fix watches the file directly (fsnotify follows symlinks to the underlying inode) and on Remove/Chmod events removes and re-adds the watcher so it follows the new symlink target, then reloads the value. This matches the approach in open-telemetry#45953. Co-authored-by: Halvdan Hoem Grelland <grelland@users.noreply.github.com> Assisted-by: Kiro (Amazon Q Developer) Signed-off-by: Anthony J Mirabella <a9@aneurysm9.com>
|
@Aneurysm9 I guess so. @MovieStoreGuy Should I close this one? |
Yeah, looking at the PR @Aneurysm9 raised, this change would not be needed and be consistent experience across all usages. |
Description
Fixes a bug which caused token file refresh to break completely for files backed by Kubernetes projected volumes (ConfigMaps, Secrets, serviceAccountTokens).
The bug is a regression introduced in #44104, which removed a crucial and very specifically documented workaround for this particular case. This has resulted in token files backed by projected volumes ("double symlinks") not refreshing at all.
I have encountered this issue in production (
v0.144.0), and am also able to reproduce it consistently. The last well-behaving version seems to bev0.138.0. Applying this PR fixes the issue for my reproduction.The fix is mostly a revert to the previous code, but with a couple of notable changes:
Note: as with the previous implementation, on some systems there is the potential for receiving two discrete events in quick succession. This causes the token to be refreshed twice.
Even if this is not-so-nice, it's really just a tiny read and store operation and should have negligible impact. One could perhaps introduce a simple debounce mechanism (with, say, 1s delay) to smooth it over, but I consider that out-of-scope for this PR.
Testing
Added
TestBearerTokenFileTokenRotationWithSymlinkwhich simulates the "atomic swap" behaviour of Kubernetes projected volumes on Linux. The test failed before, and passes after the fix is applied. Notably this test is only really applicable on Linux, and will pass in both cases on MacOS (don't know about Windows) due to difference in behaviours betweenkqueueandinotify.The bug itself was easily reproducible by running the latest version of
otelcollector-contribin a localkindcluster, configured with reading the token from a ConfigMap-backed volume. On the pre-fix collector the changes to the ConfigMap are not picked up and the token is thus never updated, whilst it works as expected with the fix applied.