Add k8s.container.restart_count
Resource attribute
#1945
Please justify why this is not a "metric". It seems to be a metric, not a resource; even the name suggests that it is a "counter" :)
@bogdandrutu it's not a metric of a k8s pod or something like that, it's rather a metadata/identifier of particular container instance which is a resource itself I believe. It can be also called In k8s logs collection, we want to use the following set of attributes to identify a particular container: |
I think I agree with @bogdandrutu that it can also be a metric, but right now the source of this data is a log file, so to capture it as a metric we are essentially talking about signal conversion, which is a valid approach, but it does not prohibit us from logging this data as a log record attribute. |
@tigrannajaryan thanks for replying.
If a container is a source of logs/traces/metrics, it does seem to be a resource. Why do you think span/log record attribute will be better place? Meaning of this attribute is very similar to
We can call it |
We don't have this formally defined anywhere in Otel, but it is clear that some Resource attributes identify the Resource and some others are non-identifying. This is obviously a non-identifying attribute of a Resource (unless it is somehow possible for different runs to co-exist, which I think is not possible since this is about restarts). I think the name is what causes a bit of confusion and friction here. However, if we are certain that this attribute is indeed associated with the Resource (Container instance) and does not change during the lifetime of the Resource, then it needs to be a Resource attribute. I think that is true in this case, so we should accept the convention.
To be clear: I can't think of a better name.
I don't think this is only about restarts. I took the name from the k8s API, where it is referred to as RestartCount, but it actually seems to be an identifying resource attribute for a particular container instance. From a container orchestrator perspective, there can be other containers with different
Yes, I believe this attribute does not change during the lifetime of a container as a Resource. Once a container gets restarted (
I see, this is important. In that case I think a name like If you could find other (non-K8s) container orchestration systems which exhibit a similar behavior and see what they call this concept, it may be easier to arrive at the right name that is more universal (since
I disagree. I think the restart ID, or whatever we want to call it, is even required if you want to retrieve the logs of a particular container restart cycle (e.g. to find out why it restarted).
I don't think this has much to do with it. A timestamp could in theory be used together with a POD ID + container name as an alternative to the restart ID. But how does this make the restart ID non-identifying? Is the timestamp an identifying attribute? We don't even have a "process start time" or "container start time" attribute on the resource (though that would maybe make sense). Also, when it comes to timestamps very close to restarts, it might depend on the clock from which they came to which restart cycle they would map due to inexactness. |
I did not say it is unnecessary data. :-) I said it is not required to uniquely identify the Container Instance (and perhaps I am wrong). Since there can ever be only one instance of that container instance the restart counter is superfluous for the purposes of identifying which container instance it was. It is of course not superfluous as you correctly point out for the purposes of understanding how many times the instance restarted and which "numbered start" of the instance the particular log belongs. So, I guess it depends on how we define the Container Instance entity. Is it the same instance if it restarts or a different instance? Depending on that the restart counter will be either a non-identifying or an identifying attribute of the instance. We can debate about this (I am not sure what's right approach), but I think for the purposes of this particular PR it does not matter, since we agree that one way or another the "restart counter" is an attribute of the Resource. |
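The distinction debated above can be made concrete with a small sketch. It is only an illustration: OpenTelemetry does not formally define identifying attributes (as noted earlier in the thread), so the `identity` helper and the two key sets are hypothetical, and the attribute name `k8s.container.restart_count` is the one from this PR's title.

```python
# Hypothetical helper: the identity of a resource is the tuple of its
# identifying attribute values. Which keys count as identifying is exactly
# what the thread is debating, so it is passed in as a parameter.
def identity(resource: dict, identifying_keys: tuple) -> tuple:
    return tuple(resource.get(key) for key in identifying_keys)


# Two telemetry sources that differ only in the restart counter.
first_run = {
    "k8s.pod.uid": "pod-uid-1",
    "k8s.container.name": "app",
    "k8s.container.restart_count": 0,
}
second_run = dict(first_run, **{"k8s.container.restart_count": 1})

# Reading 1: a restarted container is the same instance, so the counter is
# a non-identifying attribute.
without_counter = ("k8s.pod.uid", "k8s.container.name")

# Reading 2: every restart is a new instance, so the counter is identifying.
with_counter = without_counter + ("k8s.container.restart_count",)

same_instance = identity(first_run, without_counter) == identity(second_run, without_counter)
different_instance = identity(first_run, with_counter) != identity(second_run, with_counter)
```

Under either reading the counter is still an attribute of the Resource, which is the point the comment above makes for the purposes of this PR.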
I didn't find any insights on a name for this attribute looking at other container orchestrators. Some other options that come to my mind are |
@bogdandrutu are you OK with the explanation in this thread of why this is not a metric?
This change provides an option to fetch container metadata from the k8s API in addition to k8s pod metadata. The following attributes can now be automatically added by the k8sattributes processor:
- container.image.name
- container.image.tag
- container.id

`container.image.name` and `container.image.tag` require an additional container identifier to be present in resource attributes: `container.name`. `container.id` requires additional container run identifiers to be present in resource attributes: `container.name` and `run_id`. The `run_id` identifier is subject to change, see open-telemetry/opentelemetry-specification#1945
Please add an explanation of why the restart_count needs to identify the "running" instance of a container.
* [k8sattributes processor] Add optional container metadata

This change provides an option to fetch container metadata from the k8s API in addition to k8s pod metadata. The following attributes can now be automatically added by the k8sattributes processor:
- container.image.name
- container.image.tag
- container.id

`container.image.name` and `container.image.tag` require an additional container identifier to be present in resource attributes: `container.name`. `container.id` requires additional container run identifiers to be present in resource attributes: `container.name` and `run_id`. The `run_id` identifier is subject to change, see open-telemetry/opentelemetry-specification#1945

* Make linter happy
* Make container attributes enabled by default
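To illustrate why `container.id` needs both `container.name` and `run_id`, here is a minimal sketch of such a lookup. It is not the k8sattributes processor's actual code: the helper names are hypothetical, and the input is assumed to be the `containerStatuses` list of a pod status as returned by the k8s API (fields `name`, `restartCount`, `containerID`, and `lastState.terminated.containerID`).

```python
from typing import Optional


def build_container_id_index(container_statuses: list) -> dict:
    """Map (container name, run id) to the container ID of that run."""
    index = {}
    for status in container_statuses:
        name = status["name"]
        run_id = status["restartCount"]
        # The currently reported container belongs to the latest run.
        index[(name, run_id)] = status["containerID"]
        # The previous run's ID is kept too when the API still reports it.
        previous = status.get("lastState", {}).get("terminated", {})
        if run_id > 0 and "containerID" in previous:
            index[(name, run_id - 1)] = previous["containerID"]
    return index


def lookup_container_id(index: dict, container_name: str, run_id: int) -> Optional[str]:
    """Return the container ID for the given run, or None if it is unknown."""
    return index.get((container_name, run_id))


# Example input with made-up IDs: the "app" container has restarted twice.
statuses = [
    {
        "name": "app",
        "restartCount": 2,
        "containerID": "containerd://bbb",
        "lastState": {"terminated": {"containerID": "containerd://aaa"}},
    }
]
index = build_container_id_index(statuses)
```

With the container name alone, the lookup could not tell the current container from the one it replaced, since k8s gives each restarted container a new ID.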
@bogdandrutu I added an additional description to the attribute. Please let me know if it works, or if you want something else.
@tigrannajaryan @Oberon00 what do you think about the name? Maybe something like Or we can keep |
I think restart_count works, and is the only existing name for the concept I could find. I just wonder: If this is actually a k8s-specific concept, should the attribute be moved to the k8s semantic conventions, e.g. k8s.container.restart_count (the k8s.container group already exists in the semantic conventions, though I have no idea what the difference between container.name and k8s.container.name would be) |
@Oberon00 In regard to restart_count, is it available in Docker? I found this moby/moby#25859 which seems to suggest that docker also has a concept of "restarts". |
+1 for |
@bogdandrutu thanks for posting the link. Looks like docker engine uses the same terminology, but the meaning is a bit different. RestartCount is a non-identifying attribute of the container in docker engine; for example, ContainerID stays the same between restarts. On the other hand, when a container is restarted by k8s, the container is different: it gets another ContainerID. Given that, I think we should use @Oberon00 @tigrannajaryan @bogdandrutu I'm going to move it to the k8s namespace if you agree.
@dmitryax Is this a significant difference that warrants placing the counter in the k8s namespace specifically? If in the future we decide that we also want to have a convention for docker's restart count, will we introduce Do we expect that If there are such differences then yes, let's make it a separate attribute for k8s. However, if the only difference is the behavior of whether the container id changes or not when
I don't think the difference is significant, but they do have different meaning. My concern is that
No, they are both incremental decimal numbers.
Yes, this is the only difference that I observed: k8s restarts containers by replacing them, docker reuses existing containers. And I don't think that there is a use case where both approaches are applicable at the same time, as opposed to
I am leaning towards having separate attributes. From a theoretical (!) standpoint, if I want to get the logs of a container at a certain restart cycle I would have to know either: container.name + container.restart_count (since different k8s-level container restarts have different container.names) or k8s.pod.name + k8s.container.name + k8s.container.restart_count + container.restart_count (for the theoretical case that a container is restarted by the runtime within a pod). OK, k8s does not use any container runtime-provided restart mechanism today. But what if we have another orchestrator that does? Or k8s starts using it? Even now the semantics seem to be:
I don't like "except if" semantics much, if we can avoid them.
It seems slightly safer and more future-proof to keep the attribute specific to k8s, since we know its exact semantics, while we are not sure (and can only speculate) whether other orchestration runtimes will have a similar attribute.
This change adds a Resource attribute to represent the number of container restarts in Kubernetes. It can be used in k8s logs collection to identify a particular container instance, where the number of container restarts is a part of the log file path.
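As a rough sketch of the log collection use case, the restart count can be read from the log file path together with the other identifiers. The path layout `/var/log/pods/<namespace>_<pod name>_<pod uid>/<container name>/<restart count>.log` is an assumption about how the kubelet lays out pod logs on a node, and the function name is hypothetical.

```python
import re
from typing import Optional

# Assumed layout:
# /var/log/pods/<namespace>_<pod name>_<pod uid>/<container name>/<restart count>.log
_POD_LOG_PATH = re.compile(
    r"^/var/log/pods/"
    r"(?P<namespace>[^_/]+)_(?P<pod_name>[^_/]+)_(?P<pod_uid>[^/]+)/"
    r"(?P<container_name>[^/]+)/"
    r"(?P<restart_count>\d+)\.log$"
)


def resource_attributes_from_log_path(path: str) -> Optional[dict]:
    """Derive resource attributes from a pod log path, or None if it does not match."""
    match = _POD_LOG_PATH.match(path)
    if match is None:
        return None
    return {
        "k8s.namespace.name": match["namespace"],
        "k8s.pod.name": match["pod_name"],
        "k8s.pod.uid": match["pod_uid"],
        "k8s.container.name": match["container_name"],
        # The file name is the number of restarts of this container.
        "k8s.container.restart_count": int(match["restart_count"]),
    }


# Example with a made-up pod UID.
attrs = resource_attributes_from_log_path(
    "/var/log/pods/default_web-0_0f1e2d3c/nginx/2.log"
)
```

Two log files of the same container differ only in the last path segment, which is why the restart count is needed to tell their sources apart.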
Title changed: container.restart_count Resource attribute → k8s.container.restart_count Resource attribute
Moved to k8s namespace
Change name of the attribute according to agreement in open-telemetry/opentelemetry-specification#1945. It's better if the actual change can be merged before the next release to avoid breaking changes in k8sattributes processor.
container.name and k8s.container.name are the same for all container runtimes except docker when using kubernetes. The name difference comes from the "dockershim", which makes docker implement the CRI. Dockershim is also deprecated, and will be removed from upstream in the next 6 months. So the name difference isn't really k8s vs others, but docker vs others, and will become less common. But assuming we are using separate names for container runtime name vs kubernetes name, I think it is correct to separate the restart count as well, since it is the number of times the container has restarted for each name. |
I think we have enough approvals, but let's keep this open for another day before merging.
…t` (#5572) Change name of the attribute according to agreement in open-telemetry/opentelemetry-specification#1945. It's better if the actual change can be merged before the next release to avoid breaking changes in k8sattributes processor.
…1945) This change adds a Resource attribute to represent the number of container restarts in Kubernetes. It can be used in k8s logs collection to identify a particular container instance, where the number of container restarts is a part of the log file path. Co-authored-by: Tigran Najaryan <[email protected]>
This change adds a Resource attribute to represent the number of container restarts. It can be used in k8s logs collection to identify a particular container instance, where the number of container restarts is provided as a part of the log file path.