[8.15](backport #41453) Fix Node and container resource limit metrics missing intermittently #41483
Proposed commit message
Fix Node and container resource limit metrics missing intermittently.
This bug was introduced very recently by the refactor in #41216. Metadata watchers are responsible not just for updating metadata, but also for updating Node and container metrics. Updating those metrics only as a side effect of metadata requests leads to races, where the values may be missing depending on the order in which metrics are fetched.
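To make the ordering problem concrete, here is a minimal, self-contained sketch of the coupled pattern. It is not the Beats code: `Node`, `limitStore`, `enrichMetadata` and `cpuLimit` are hypothetical names, and the sketch only covers a Node CPU limit (container limits would behave the same way). Because the limit is written only when metadata is requested, a metrics read that happens first finds nothing.

```go
package main

import "fmt"

// Node is a hypothetical, stripped-down view of a Kubernetes Node object.
type Node struct {
	Name        string
	CPUCapacity float64 // cores
}

// limitStore is a hypothetical stand-in for the store that holds node
// resource limits used to compute derived metrics.
type limitStore map[string]Node

// enrichMetadata returns metadata for a node and, as a side effect, records
// its resource limits. This coupling is what makes the result order-dependent.
func enrichMetadata(store limitStore, n Node) map[string]string {
	store[n.Name] = n // limits are only written when metadata is requested
	return map[string]string{"node.name": n.Name}
}

// cpuLimit reads the stored CPU limit for a node; ok is false if it is missing.
func cpuLimit(store limitStore, name string) (float64, bool) {
	n, ok := store[name]
	return n.CPUCapacity, ok
}

func main() {
	node := Node{Name: "node-1", CPUCapacity: 4}

	// Ordering A: metadata enrichment runs first, so the limit is present.
	a := limitStore{}
	enrichMetadata(a, node)
	_, okA := cpuLimit(a, "node-1")

	// Ordering B: metrics are read before any metadata request, so the limit is missing.
	b := limitStore{}
	_, okB := cpuLimit(b, "node-1")
	enrichMetadata(b, node)

	fmt.Println(okA, okB) // prints "true false"
}
```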
This fix decouples metrics calculation from metadata calculation. Metrics now have their own handlers attached to the watcher, and are completely detached from metadata enrichers. I don't like the resulting architecture that much, as it concentrates a lot of logic in the watcher. But it is an improvement over the status quo, and I'd like to fix this bug promptly before we release it to users.
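The decoupled design can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: `watcher`, `eventHandler`, `metricsHandler` and `metadataHandler` are hypothetical names, loosely modeled on the add/update/delete callbacks a Kubernetes watcher exposes. The point it shows is that the metrics handler is attached to the watcher directly, so limits are recorded as soon as the watcher sees a Node, whether or not metadata has been requested.

```go
package main

import "fmt"

// Node is a hypothetical, stripped-down view of a Kubernetes Node object.
type Node struct {
	Name        string
	CPUCapacity float64 // cores
}

// eventHandler mirrors the add/update/delete shape of a watcher callback (hypothetical).
type eventHandler interface {
	OnAdd(n Node)
	OnUpdate(n Node)
	OnDelete(n Node)
}

// watcher fans every event out to all registered handlers.
type watcher struct {
	handlers []eventHandler
}

func (w *watcher) AddHandler(h eventHandler) {
	w.handlers = append(w.handlers, h)
}

func (w *watcher) emitAdd(n Node) {
	for _, h := range w.handlers {
		h.OnAdd(n)
	}
}

func (w *watcher) emitUpdate(n Node) {
	for _, h := range w.handlers {
		h.OnUpdate(n)
	}
}

func (w *watcher) emitDelete(n Node) {
	for _, h := range w.handlers {
		h.OnDelete(n)
	}
}

// metricsHandler keeps node limits current, independently of metadata enrichment.
type metricsHandler struct {
	limits map[string]float64
}

func (m *metricsHandler) OnAdd(n Node)    { m.limits[n.Name] = n.CPUCapacity }
func (m *metricsHandler) OnUpdate(n Node) { m.limits[n.Name] = n.CPUCapacity }
func (m *metricsHandler) OnDelete(n Node) { delete(m.limits, n.Name) }

// metadataHandler only builds metadata; it no longer touches limits.
type metadataHandler struct {
	meta map[string]map[string]string
}

func (d *metadataHandler) OnAdd(n Node)    { d.meta[n.Name] = map[string]string{"node.name": n.Name} }
func (d *metadataHandler) OnUpdate(n Node) { d.OnAdd(n) }
func (d *metadataHandler) OnDelete(n Node) { delete(d.meta, n.Name) }

func main() {
	w := &watcher{}
	metrics := &metricsHandler{limits: map[string]float64{}}
	metadata := &metadataHandler{meta: map[string]map[string]string{}}
	w.AddHandler(metrics)
	w.AddHandler(metadata)

	// The limit is available as soon as the watcher sees the node.
	w.emitAdd(Node{Name: "node-1", CPUCapacity: 4})
	fmt.Println(metrics.limits["node-1"]) // prints "4"
}
```

The trade-off mentioned above is visible here: the watcher becomes the hub that every consumer hangs off, which concentrates logic there, but no consumer depends on another having run first.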
The bug was quite difficult to catch in E2E tests, as it could take some time to appear. I've tested this change much more carefully, and haven't seen any issues after hours of running it in my test cluster.
Checklist
How to test this PR locally
The simplest way is to install elastic-agent standalone and check the default Kubernetes dashboard.
Related issues
This is an automatic backport of pull request #41453 done by [Mergify](https://mergify.com).