-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add k8s.pod.ip
Resource Attribute
#1841
Conversation
474ba80
to
55f238d
Compare
`k8s.pod.ip` can be used to uniquely identify a pod at any given moment of time. For example, OpenTelemetry collector's k8sprocessor uses it to make automatic correlation happen.
55f238d
to
2ba906c
Compare
b3ee56b
to
15f64a5
Compare
@@ -69,6 +69,7 @@ containers on your cluster. | |||
|---|---|---|---|---| | |||
| `k8s.pod.uid` | string | The UID of the Pod. | `275ecb36-5aa8-4c2a-9c47-d8bb681b9aff` | No | | |||
| `k8s.pod.name` | string | The name of the Pod. | `opentelemetry-pod-autoconf` | No | | |||
| `k8s.pod.ip` | string | The IP address assigned to the Pod. | `172.17.0.5` | No | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How specific to k8s is this? Could it also be added to the general host semantic conventions at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/host.md as just host.ip
?
Is there only ever a single k8s host IP, or should it be an array? Can the IP change (not supported by resources)? Does the pod only have an IPv4 or also v6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does a host (as described by the semantic conventions) map to a k8s pod (I think it does) or more to a k8s node/host? I think we can add it to the host section.
Is there only ever a single k8s host IP, or should it be an array? Can the IP change (not supported by resources)? Does the pod only have an IPv4 or also v6?
Generally, the value is extracted by a service that receives telemetry data from a pod/host such as the Otel collector so in practice it is always a single IP. IP can change but in that case, I think we should treat it as a separate resource for overall simplicity. I don't think we should assume it can be any valid IP (4 and 6).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does a host (as described by the semantic conventions) map to a k8s pod (I think it does) or more to a k8s node/host? I think we can add it to the host section.
Good point, this is currently a bit unclear to me anyway. We may have an k8s pod that runs inside a k8s node which is a VM which runs on a physical host, all of which might have different "host" information. I assume we always want the "innermost" host info, but I don't think this is stated anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, the value is extracted by a service that receives telemetry data from a pod/host such as the Otel collector so in practice it is always a single IP. IP can change but in that case, I think we should treat it as a separate resource for overall simplicity.
Good idea. In that case I think you should add a note that this is is not something reported by the agent. Also, since who reports the IP matters (a lot, if there is something like a NAT in-between), I would suggest to not use k8s.pod.ip
, but telemetry.received_from_ip
(or maybe a new group telemetry.received_from
would make sense if there are similar attributes that only make sense to add on the collector, in which case it would become telemetry.received_from.ip
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should assume it can be any valid IP (4 and 6).
Did you mean "I do think"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do all you think about having two attributes? k8s.pod.ipv4
and k8s.pod.ipv6
? If I read that comment correctly, it seems pod can have only one of each ipv4 and ipv6. The reason I'm a bit reluctant on using primary is that one of the use cases for this is the k8s processor which just reads the source IP of the sender pod. Could the collector see data coming in from the "secondary" IP based on network configuration? If so, in that case we'll end up recording the non-primary IP as the primary one. All functionality would still work as far as k8s processor is concerned but the metadata we record could be misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this attribute can be useful outside of k8s processor like use cases as well but thinking if a generic data_source_ip
attribute makes more sense to solve more specific use cases for all platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do all you think about having two attributes? k8s.pod.ipv4 and k8s.pod.ipv6?
I was thinking about that as well, but since one of these is in the pod_ip field and the other is only in the pod_ips array in that data structure linked by @seh, I would say that in doubt order matters.
So I would prefer a single attribute k8s.pod.primary_ips
of type string array. Maybe assigned_ips
instead of primary IPs.
if a generic data_source_ip attribute makes more sense to solve more specific use cases for all platforms
I agree to use a separate attribute for that use case, see my previous comment above: #1841 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, while a pod can have more than one IP address, at present it can have at most one IPv4 address and one IPv6 address. It cannot have two or more IPv4 addresses, and cannot have two or more IPv6 addresses. The validation asserts that the pod either has one IP address or it's dual-stacked and has one of each kind.
Given that, perhaps we can have something like k8s.pod.primary_ip_address and k8s.pod.secondary_ip_address, where the latter is only occasionally populated, but if it is, the former must be populated too.
The Kubernetes API allows more IP addresses by type (the "status.podIPs" field being an array), but constrains the use and interpretation by the comments. I take the comments as binding.
Also, though not documented yet, the downward API does allow a pod author to capture the "status.podIPs" field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more detail, speaking of the downward API: Since the values injected as either environment variables or file content are of type string, Kubernetes has to flatten the "status.podIPs" array in order to inject it.
It turns out that Kubernetes first sorts the bare value of the "status.podIPs" array in order to place the IP address with the same family as the hosting node's IP address, then supplies the array as a comma-separated string. It's interesting that the downward API does not trust the prescribed relationship between the "status.podIP" and "status.podIPs" fields; not only does it not assume that the former is the first element of the latter, but it also may flip the latter around, such that what the downward IP presents does not match the pod's status fields.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of the attribute, but think it needs to be described more precisely before merging.
See inline comment.
@@ -69,6 +69,7 @@ containers on your cluster. | |||
|---|---|---|---|---| | |||
| `k8s.pod.uid` | string | The UID of the Pod. | `275ecb36-5aa8-4c2a-9c47-d8bb681b9aff` | No | | |||
| `k8s.pod.name` | string | The name of the Pod. | `opentelemetry-pod-autoconf` | No | | |||
| `k8s.pod.ip` | string | The IP address assigned to the Pod. | `172.17.0.5` | No | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood your earlier comments correctly, you wanted to set this semantic attribute on the collector or backend to the sender IP. If I understood @jmacd correctly, this should be set to a "well defined" "primary" IP (by whom? How do you know which is the primary)?
This leaves too many questions open, especially considering that none of the discussion has yet made it back to the markdown 🙂
efb3a31
to
34daa50
Compare
34daa50
to
1e52394
Compare
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
@owais please address the feedback. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
k8s.pod.ip
can be used to uniquely identify a pod at any given moment of time. For example, OpenTelemetry collector's k8sprocessor uses it to make automatic correlation happen.