reporter: extract container.ID from cgroupv2 path #548
Conversation
This is a follow-up to #535. While the tracer part of the project reports the cgroup v2 path for each sample, the reporter is expected to report the container ID. The container ID can then be used to associate the sample with more detailed resource information. In the context of the OTel Collector, the container ID can be used like this:

```yaml
k8sattributes:
  auth_type: "serviceAccount"
  passthrough: false
  filter:
    node_from_env_var: KUBERNETES_NODE_NAME
  extract:
    metadata:
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.deployment.name
      - k8s.namespace.name
      - k8s.node.name
      - k8s.pod.start_time
      - service.namespace
      - service.name
      - service.version
      - service.instance.id
    labels:
      - tag_name: app.label.component
        key: app.kubernetes.io/component
        from: pod
    otel_annotations: true
  pod_association:
    - sources:
        - from: resource_attribute
          name: container.id
```

As the cgroupv2 path contains further information that could be beneficial for other reporters, the extraction of the container ID happens in the reporter.

Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
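The extraction the PR describes can be sketched roughly like this. The helper name `extractContainerID` is hypothetical (the PR moved the real function into `reporter/util.go` as a package-private function); the regex mirrors the pattern discussed later in this review:

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern discussed in this review: the container ID is the last
// 64-character hex string in the cgroup v2 path, optionally followed
// by ".scope".
var cgroupv2ContainerIDPattern = regexp.MustCompile(
	`0:.*?:.*?([0-9a-fA-F]{64})(?:\.scope)?$`)

// extractContainerID is a hypothetical helper that pulls the container ID
// out of a /proc/<pid>/cgroup line; it returns "" if no ID is present.
func extractContainerID(cgroupLine string) string {
	m := cgroupv2ContainerIDPattern.FindStringSubmatch(cgroupLine)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	line := "0::/system.slice/docker-" +
		"0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef" +
		".scope"
	fmt.Println(extractContainerID(line))
}
```

For a process outside a container (e.g. `0::/init.scope`), the regex does not match and the helper returns an empty string.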
```go
)

// LookupCgroupv2 returns the cgroupv2 ID for pid.
func LookupCgroupv2(cgrouplru *lru.SyncedLRU[PID, string], pid PID) (string, error) {
```
Moved the functionality of this function to reporter/util.go and made it a package private function.
```diff
 }
 // Set a lifetime to reduce the risk of invalid data in case of PID reuse.
-cgroupv2ID.SetLifetime(90 * time.Second)
+pidToContainerID.SetLifetime(90 * time.Second)
```
Not for this PR, but we can and should do better here (e.g. ProcessManager pub-sub scheme where various subsystems can get notified about a PID exit) since the ProcessManager will receive notifications for every exited PID.
```go
)

var (
	cgroupv2ContainerIDPattern = regexp.MustCompile(`0:.*?:.*?([0-9a-fA-F]{64})(?:\.scope)?$`)
```
We can either relax the regexp as shown below (I'm not sure if `.scope` is always present, so I took it out), or make the matching more fine-grained: first match on the general `/proc/pid/cgroup` format (`0:.*?:(.*)`, as specified here) and then match again on the specific container runtimes that we support.
```diff
-cgroupv2ContainerIDPattern = regexp.MustCompile(`0:.*?:.*?([0-9a-fA-F]{64})(?:\.scope)?$`)
+cgroupv2ContainerIDPattern = regexp.MustCompile(`0:.*?:.*?([0-9a-fA-F]{64})`)
```
Fixed with daa9dc6.
I didn't apply the suggested regex, as it would match the pod ID rather than the container ID. For the container ID we need the last occurrence of the 64-character hex ID; with the suggested regex it is possible to match the first 64-character hex ID, which is the pod ID.
I went with a single regex, as I expected pushback against applying two regexes in sequential order.
Replaced the return statement with a log statement in 7d42302
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
2ced1d4 to daa9dc6
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
```go
if err != nil {
	log.Debugf("Failed to get a cgroupv2 ID as container ID for PID %d: %v",
		meta.PID, err)
	return err
```
Now we're losing traces on lookupContainerID errors (e.g. if a PID dies). If we continue we'd be reporting traces without container ID.
Thinking about this some more, it might be better to get rid of this LRU completely and fetch the container ID when we first parse a PID in updatePIDInformation.
We'd then store the container ID inside ProcessMeta in the process registry ProcessManager.pidToProcessInfo. The wins are two-fold:
- No repeated, wasteful container ID extraction on expiration: a container ID will only be fetched once for any tracked PID.
- We don't throw away traces and there are no race conditions with PID exits.
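A minimal sketch of that alternative. The names here (`ProcessMeta`, `processManager`, `updatePIDInformation`) stand in for the real types in the agent and the lookup is injected as a function so the sketch stays self-contained; the point is that the container ID is resolved once when the PID is first seen and then travels with the process record:

```go
package main

import "fmt"

// ProcessMeta is a hypothetical stand-in for the process registry entry;
// the proposal is to add the container ID as a field here.
type ProcessMeta struct {
	PID         int
	ContainerID string
}

// processManager mimics ProcessManager.pidToProcessInfo: a registry keyed by PID.
type processManager struct {
	pidToProcessInfo map[int]ProcessMeta
	// lookupContainerID is injected for the sketch; in the agent this
	// would read and parse /proc/<pid>/cgroup.
	lookupContainerID func(pid int) string
}

// updatePIDInformation resolves the container ID exactly once per tracked PID.
func (pm *processManager) updatePIDInformation(pid int) ProcessMeta {
	if meta, ok := pm.pidToProcessInfo[pid]; ok {
		return meta // no repeated extraction, no expiry races
	}
	meta := ProcessMeta{PID: pid, ContainerID: pm.lookupContainerID(pid)}
	pm.pidToProcessInfo[pid] = meta
	return meta
}

func main() {
	calls := 0
	pm := &processManager{
		pidToProcessInfo: make(map[int]ProcessMeta),
		lookupContainerID: func(pid int) string {
			calls++
			return "deadbeef"
		},
	}
	pm.updatePIDInformation(42)
	pm.updatePIDInformation(42)
	fmt.Println(calls) // extraction ran only once despite two updates
}
```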
I agree with this earlier comment in #548 that changes to the reporter API to bring it closer to ProcessManager should be a separate change.
My earlier comment would only replace a small part of this PR. With my current comment, I propose replacing this entire PR with something else (we'd keep the containerID extraction logic but nothing else). If you agree that this is the way to go, then it's faster to just do that instead of merging this PR and then creating another PR that will remove most of it.
Also I don't propose any changes to the reporter API, we just need to add an extra field to samples.TraceEventMeta.
I don't think #577 should be the way forward. Profiling capabilities should not be mixed with such functionality.
This PR introduces basic support for container IDs. But there are use cases where the pod ID, cgroup v1 data, or other information is relevant to the reporter.
Therefore, I have the following suggestion:
- merge this PR
- introduce a generic hook in processmanager, so it can be configured to extract any relevant information and report this via TraceEventMeta.
To me processmanager should focus on properly handling processes rather than extracting every possible process data.
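The suggested generic hook could look roughly like this. The interface and names are entirely hypothetical (the discussion left the design open); the idea is that processmanager invokes registered hooks when it first tracks a PID and merges the returned key/value pairs into the per-sample metadata:

```go
package main

import "fmt"

// MetadataHook is a hypothetical extension point: processmanager would call
// registered hooks when it first tracks a PID, and merge the returned
// key/value pairs into the trace event metadata handed to reporters.
type MetadataHook interface {
	OnNewProcess(pid int, cgroupPath string) map[string]string
}

// containerIDHook is an example hook deriving container.id; the actual
// cgroup v2 parsing is stubbed out to keep the sketch self-contained.
type containerIDHook struct{}

func (containerIDHook) OnNewProcess(pid int, cgroupPath string) map[string]string {
	// Real code would apply the cgroup v2 container ID regex here.
	return map[string]string{"container.id": "example-id"}
}

func main() {
	hooks := []MetadataHook{containerIDHook{}}
	meta := make(map[string]string)
	for _, h := range hooks {
		for k, v := range h.OnNewProcess(1234, "0::/kubepods/...") {
			meta[k] = v
		}
	}
	fmt.Println(meta["container.id"])
}
```

A pod ID or cgroup v1 hook would slot into the same list without touching processmanager itself, which is the extensibility argument made above.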
I agree that processmanager should have a more plugin/hook-based system to collect the metadata.
After a brief 15-minute look, I would prefer #577 as a first step, with the following rationale:
- `reporter` indicates it just reports things; this change makes it collect data too
- `reporter` might be too late to collect the container ID in some short-lived process cases, whereas `processmanager` would be more suitable to do it
- `reporter` needs extra LRUs with memory overhead (and CPU for cleaning them?); `processmanager` can stash the information in existing data structures
- `processmanager` already collects and caches process metadata
- the container ID is also a piece of data that other reporters likely want
- every time we have LRUs containing amending data, there is a possibility of going out of sync, where the LRU no longer holds the data we still need (or alternatively it needs to be excessively large and becomes a memory hog)
Additionally, the proposal to add processmanager hooks indicates that the natural place to collect this information is in the context of processmanager.
To me, just collecting the metadata in processmanager makes perfect sense. It would be a later step to convert it to a more plugin-based approach. But even then, processmanager should contain the data in attributes or similar. The reasoning is the same as in #384 for symbol data: the data collection should happen early.
> To me processmanager should focus on properly handling processes rather than extracting every possible process data.
Processmanager is responsible for the process metadata too, and this is such data. #384 proposes to make processmanager also responsible for the symbols. I think processmanager is the central piece for everything. I suppose the fear is that it becomes too monolithic? So, agreeably, it needs to become more plugin-based in the future.
But I also think that the container ID is a pretty commonly wanted thing: it is a fundamental piece of process metadata. I would not try to abstract it away.
So I propose the processmanager role should be extended to include this piece of the process metadata.
I agree with @fabled, I think there's no good reason to go with an LRU now that I've written the code in #577, it's a semantically wrong solution, open to race conditions and less performant (it keeps expiring and extracting container IDs for each PID). In addition, we already extract race-sensitive process metadata in ProcessManager so it's ugly from a consistency POV to split this logic and do that in the reporter too.
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Closing in favor of #577