Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add k8s.pod.ip Resource Attribute #1841

Closed
wants to merge 3 commits into from

Conversation

owais
Copy link
Contributor

@owais owais commented Aug 2, 2021

k8s.pod.ip can be used to uniquely identify a pod at any given moment of time. For example, OpenTelemetry collector's k8sprocessor uses it to make automatic correlation happen.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Aug 2, 2021

CLA Signed

The committers are authorized under a signed CLA.

@owais owais force-pushed the k8s-pod-ip-resource-attr branch from 474ba80 to 55f238d Compare August 2, 2021 00:12
`k8s.pod.ip` can be used to uniquely identify a pod at any given moment of time. For example, OpenTelemetry collector's k8sprocessor uses it to make automatic correlation happen.
@owais owais force-pushed the k8s-pod-ip-resource-attr branch from 55f238d to 2ba906c Compare August 2, 2021 00:12
@owais owais marked this pull request as ready for review August 2, 2021 00:13
@owais owais requested review from a team August 2, 2021 00:13
@owais owais force-pushed the k8s-pod-ip-resource-attr branch 2 times, most recently from b3ee56b to 15f64a5 Compare August 2, 2021 00:46
@@ -69,6 +69,7 @@ containers on your cluster.
|---|---|---|---|---|
| `k8s.pod.uid` | string | The UID of the Pod. | `275ecb36-5aa8-4c2a-9c47-d8bb681b9aff` | No |
| `k8s.pod.name` | string | The name of the Pod. | `opentelemetry-pod-autoconf` | No |
| `k8s.pod.ip` | string | The IP address assigned to the Pod. | `172.17.0.5` | No |
Copy link
Member

@Oberon00 Oberon00 Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How specific to k8s is this? Could it also be added to the general host semantic conventions at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/host.md as just host.ip?

Is there only ever a single k8s host IP, or should it be an array? Can the IP change (not supported by resources)? Does the pod only have an IPv4 or also v6?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a host (as described by the semantic conventions) map to a k8s pod (I think it does) or more to a k8s node/host? I think we can add it to the host section.

Is there only ever a single k8s host IP, or should it be an array? Can the IP change (not supported by resources)? Does the pod only have an IPv4 or also v6?

Generally, the value is extracted by a service that receives telemetry data from a pod/host such as the Otel collector so in practice it is always a single IP. IP can change but in that case, I think we should treat it as a separate resource for overall simplicity. I don't think we should assume it can be any valid IP (4 and 6).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a host (as described by the semantic conventions) map to a k8s pod (I think it does) or more to a k8s node/host? I think we can add it to the host section.

Good point, this is currently a bit unclear to me anyway. We may have an k8s pod that runs inside a k8s node which is a VM which runs on a physical host, all of which might have different "host" information. I assume we always want the "innermost" host info, but I don't think this is stated anywhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, the value is extracted by a service that receives telemetry data from a pod/host such as the Otel collector so in practice it is always a single IP. IP can change but in that case, I think we should treat it as a separate resource for overall simplicity.

Good idea. In that case I think you should add a note that this is is not something reported by the agent. Also, since who reports the IP matters (a lot, if there is something like a NAT in-between), I would suggest to not use k8s.pod.ip, but telemetry.received_from_ip (or maybe a new group telemetry.received_from would make sense if there are similar attributes that only make sense to add on the collector, in which case it would become telemetry.received_from.ip).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should assume it can be any valid IP (4 and 6).

Did you mean "I do think"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do all you think about having two attributes? k8s.pod.ipv4 and k8s.pod.ipv6? If I read that comment correctly, it seems pod can have only one of each ipv4 and ipv6. The reason I'm a bit reluctant on using primary is that one of the use cases for this is the k8s processor which just reads the source IP of the sender pod. Could the collector see data coming in from the "secondary" IP based on network configuration? If so, in that case we'll end up recording the non-primary IP as the primary one. All functionality would still work as far as k8s processor is concerned but the metadata we record could be misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this attribute can be useful outside of k8s processor like use cases as well but thinking if a generic data_source_ip attribute makes more sense to solve more specific use cases for all platforms.

Copy link
Member

@Oberon00 Oberon00 Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do all you think about having two attributes? k8s.pod.ipv4 and k8s.pod.ipv6?

I was thinking about that as well, but since one of these is in the pod_ip field and the other is only in the pod_ips array in that data structure linked by @seh, I would say that in doubt order matters.

So I would prefer a single attribute k8s.pod.primary_ips of type string array. Maybe assigned_ips instead of primary IPs.

if a generic data_source_ip attribute makes more sense to solve more specific use cases for all platforms

I agree to use a separate attribute for that use case, see my previous comment above: #1841 (comment)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, while a pod can have more than one IP address, at present it can have at most one IPv4 address and one IPv6 address. It cannot have two or more IPv4 addresses, and cannot have two or more IPv6 addresses. The validation asserts that the pod either has one IP address or it's dual-stacked and has one of each kind.

Given that, perhaps we can have something like k8s.pod.primary_ip_address and k8s.pod.secondary_ip_address, where the latter is only occasionally populated, but if it is, the former must be populated too.

The Kubernetes API allows more IP addresses by type (the "status.podIPs" field being an array), but constrains the use and interpretation by the comments. I take the comments as binding.

Also, though not documented yet, the downward API does allow a pod author to capture the "status.podIPs" field.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more detail, speaking of the downward API: Since the values injected as either environment variables or file content are of type string, Kubernetes has to flatten the "status.podIPs" array in order to inject it.

It turns out that Kubernetes first sorts the bare value of the "status.podIPs" array in order to place the IP address with the same family as the hosting node's IP address, then supplies the array as a comma-separated string. It's interesting that the downward API does not trust the prescribed relationship between the "status.podIP" and "status.podIPs" fields; not only does it not assume that the former is the first element of the latter, but it also may flip the latter around, such that what the downward IP presents does not match the pod's status fields.

@github-actions

This comment has been minimized.

@github-actions github-actions bot added the Stale label Aug 14, 2021
Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of the attribute, but think it needs to be described more precisely before merging.

See inline comment.

@@ -69,6 +69,7 @@ containers on your cluster.
|---|---|---|---|---|
| `k8s.pod.uid` | string | The UID of the Pod. | `275ecb36-5aa8-4c2a-9c47-d8bb681b9aff` | No |
| `k8s.pod.name` | string | The name of the Pod. | `opentelemetry-pod-autoconf` | No |
| `k8s.pod.ip` | string | The IP address assigned to the Pod. | `172.17.0.5` | No |
Copy link
Member

@Oberon00 Oberon00 Aug 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood your earlier comments correctly, you wanted to set this semantic attribute on the collector or backend to the sender IP. If I understood @jmacd correctly, this should be set to a "well defined" "primary" IP (by whom? How do you know which is the primary)?

This leaves too many questions open, especially considering that none of the discussion has yet made it back to the markdown 🙂

@Oberon00 Oberon00 added area:semantic-conventions Related to semantic conventions spec:resource Related to the specification/resource directory and removed Stale labels Aug 16, 2021
@owais owais force-pushed the k8s-pod-ip-resource-attr branch 2 times, most recently from efb3a31 to 34daa50 Compare August 16, 2021 13:33
@owais owais force-pushed the k8s-pod-ip-resource-attr branch from 34daa50 to 1e52394 Compare August 16, 2021 13:49
@owais owais requested review from jmacd and Oberon00 August 18, 2021 14:36
@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@tigrannajaryan
Copy link
Member

@owais please address the feedback.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 23, 2021
@github-actions
Copy link

github-actions bot commented Oct 1, 2021

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Oct 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions spec:resource Related to the specification/resource directory Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants