"container.id" resource attribute is wrong #1114

Closed
liurui-1 opened this issue Sep 13, 2023 · 12 comments · Fixed by open-telemetry/opentelemetry-php-contrib#191
Labels
bug Something isn't working

Comments

liurui-1 commented Sep 13, 2023


Steps to reproduce
We are using the official OpenTelemetry demo in K8S:
https://opentelemetry.io/docs/demo/kubernetes-deployment/
The quoteservice uses PHP instrumentation.

What is the expected behavior?
We want the "container.id" resource attribute to be the container id of the quoteservice pod, which in my environment is 78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.

This is the id reported by the command "kubectl describe pod my-otel-demo-quoteservice-xxxx" and by other APM tools.

What is the actual behavior?
The "container.id" resource attribute reported by PHP OTel SDK is "c3e91a49d35e91f5ef2f672307a9ea830dbb80c488501ce2a6acb1cec7ee7b17"

Additional context
There are two system files that can be used to get the container id from within a container: "/proc/self/cgroup" and "/proc/self/mountinfo". I think the PHP library is just reading the ID from the wrong place.
Here is the data from my environment:

# kubectl exec my-otel-demo-quoteservice-6f6b978794-547fj -- cat /proc/self/cgroup
13:devices:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
12:pids:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
11:perf_event:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
10:freezer:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
9:cpuset:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
8:hugetlb:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
7:rdma:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
6:misc:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
5:memory:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
4:cpu,cpuacct:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
3:blkio:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
2:net_cls,net_prio:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope
1:name=systemd:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope


# kubectl exec my-otel-demo-quoteservice-6f6b978794-547fj -- cat /proc/self/mountinfo
4129 4679 0:362 / / rw,relatime - overlay overlay rw,context="system_u:object_r:container_file_t:s0:c741,c758",lowerdir=/var/lib/containers/storage/overlay/l/CAURQBLP57ZJVKW2XK6RBHT2XT:/var/lib/containers/storage/overlay/l/BQSRYGKVUSPJYIXAYN3GRE7XV7:/var/lib/containers/storage/overlay/l/CLWTUDGVBTZKFU4Z675GXFXPMP:/var/lib/containers/storage/overlay/l/2U5ZCXUA56ANPDPHPEB6VYJPR4:/var/lib/containers/storage/overlay/l/M6E2CTY762WRFGWH3TS2P6ONMD:/var/lib/containers/storage/overlay/l/4PEGLBUJXYLUCBREJI6GVUYQY6:/var/lib/containers/storage/overlay/l/HO67AHLDHKU4BGVDQQIRHMWEIH:/var/lib/containers/storage/overlay/l/NPFKF3ZU7XJ5LU6RHADCRHERJ2:/var/lib/containers/storage/overlay/l/ENQKBKRWF333MJDVWX3QHU5BHE:/var/lib/containers/storage/overlay/l/KK6VKRFTFFOXJVKY3FY4QCE6TK:/var/lib/containers/storage/overlay/l/7UKV7RCONDCIVQ5R4KH7WF3EPX:/var/lib/containers/storage/overlay/l/S7FQJ3B7RMYLQBRMPF7F7OJ6IH:/var/lib/containers/storage/overlay/l/EP6A4UINXQEEWUQBHQVSQAW2OQ:/var/lib/containers/storage/overlay/l/ZROVLOOKVI4CLXUR6S3KTNT4JU,upperdir=/var/lib/containers/storage/overlay/713fceffbb83b19b98c6cd59996bed7719bddc1b86f9b7e6d85ceb0c2d519da4/diff,workdir=/var/lib/containers/storage/overlay/713fceffbb83b19b98c6cd59996bed7719bddc1b86f9b7e6d85ceb0c2d519da4/work,volatile
4130 4129 0:383 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
4131 4129 0:384 / /dev rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c741,c758",size=65536k,mode=755,inode64
4132 4131 0:385 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,context="system_u:object_r:container_file_t:s0:c741,c758",gid=5,mode=620,ptmxmode=666
4133 4131 0:316 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw,seclabel
4134 4129 0:386 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs ro,seclabel
4135 4134 0:387 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c741,c758",mode=755,inode64
4136 4135 0:27 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime master:9 - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
4137 4135 0:30 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime master:10 - cgroup cgroup rw,seclabel,net_cls,net_prio
4138 4135 0:31 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,seclabel,blkio
4139 4135 0:32 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:12 - cgroup cgroup rw,seclabel,cpu,cpuacct
4140 4135 0:33 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:13 - cgroup cgroup rw,seclabel,memory
4141 4135 0:34 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/misc ro,nosuid,nodev,noexec,relatime master:14 - cgroup cgroup rw,seclabel,misc
4142 4135 0:35 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/rdma ro,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,seclabel,rdma
4143 4135 0:36 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,seclabel,hugetlb
4144 4135 0:37 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,seclabel,cpuset
4145 4135 0:38 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,seclabel,freezer
4146 4135 0:39 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,seclabel,perf_event
4147 4135 0:40 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime master:20 - cgroup cgroup rw,seclabel,pids
4148 4135 0:41 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod542985b6_0726_4d82_bf85_36e327b167b3.slice/crio-78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075.scope /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime master:21 - cgroup cgroup rw,seclabel,devices
4149 4131 0:315 / /dev/shm rw,nosuid,nodev,noexec,relatime master:1067 - tmpfs shm rw,context="system_u:object_r:container_file_t:s0:c741,c758",size=65536k,inode64
4150 4129 0:25 /containers/storage/overlay-containers/c3e91a49d35e91f5ef2f672307a9ea830dbb80c488501ce2a6acb1cec7ee7b17/userdata/resolv.conf /etc/resolv.conf rw,nosuid,nodev,noexec master:28 - tmpfs tmpfs rw,seclabel,size=6418664k,nr_inodes=819200,mode=755,inode64
4151 4129 0:25 /containers/storage/overlay-containers/c3e91a49d35e91f5ef2f672307a9ea830dbb80c488501ce2a6acb1cec7ee7b17/userdata/hostname /etc/hostname rw,nosuid,nodev master:28 - tmpfs tmpfs rw,seclabel,size=6418664k,nr_inodes=819200,mode=755,inode64
4152 4129 0:25 /containers/storage/overlay-containers/c3e91a49d35e91f5ef2f672307a9ea830dbb80c488501ce2a6acb1cec7ee7b17/userdata/.containerenv /run/.containerenv rw,nosuid,nodev master:28 - tmpfs tmpfs rw,seclabel,size=6418664k,nr_inodes=819200,mode=755,inode64
4153 4129 252:4 /ostree/deploy/rhcos/var/lib/kubelet/pods/542985b6-0726-4d82-bf85-36e327b167b3/etc-hosts /etc/hosts rw,relatime - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
4154 4131 252:4 /ostree/deploy/rhcos/var/lib/kubelet/pods/542985b6-0726-4d82-bf85-36e327b167b3/containers/quoteservice/5e513aa6 /dev/termination-log rw,relatime - xfs /dev/vda4 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
4155 4129 0:25 /containers/storage/overlay-containers/78ea929aa43e7b71f7c36583d82038d92a76800bf5da9b8850e8bd7b514bc075/userdata/run/secrets /run/secrets rw,nosuid,nodev - tmpfs tmpfs rw,seclabel,size=6418664k,nr_inodes=819200,mode=755,inode64
4156 4155 0:310 / /run/secrets/kubernetes.io/serviceaccount ro,relatime - tmpfs tmpfs rw,seclabel,size=40960k,inode64
4157 4130 0:383 /bus /proc/bus ro,nosuid,nodev,noexec,relatime - proc proc rw
4158 4130 0:383 /fs /proc/fs ro,nosuid,nodev,noexec,relatime - proc proc rw
4159 4130 0:383 /irq /proc/irq ro,nosuid,nodev,noexec,relatime - proc proc rw
4160 4130 0:383 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw
4161 4130 0:383 /sysrq-trigger /proc/sysrq-trigger ro,nosuid,nodev,noexec,relatime - proc proc rw
4162 4130 0:388 / /proc/acpi ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c741,c758",inode64
4163 4130 0:384 /null /proc/kcore rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c741,c758",size=65536k,mode=755,inode64
4164 4130 0:384 /null /proc/keys rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c741,c758",size=65536k,mode=755,inode64
4165 4130 0:384 /null /proc/timer_list rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c741,c758",size=65536k,mode=755,inode64
4166 4130 0:389 / /proc/scsi ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c741,c758",inode64
4167 4134 0:390 / /sys/firmware ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c741,c758",inode64
liurui-1 added the bug label on Sep 13, 2023
brettmc (Collaborator) commented Sep 13, 2023

I think if it found a container id, it was by accident. Container detection is only supposed to work for docker containers: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.25.0/specification/resource/semantic_conventions/container.md and so it's probably a mistake that it's enabled in the demo at all. I can see that it's gotten the value from mountinfo, because it found a pattern that matches how docker does it.

Looking at the output you've provided, it does look like we could work it out though. Is there any prior art for how other SIGs/APM implementations work it out? How reliable is crio-{containerid}.scope? What about other container runtime implementations? And, can we tell if we're in k8s vs docker from examining available info in the system?

My thoughts: the detector should be removed from the demo, or we should try to improve the detector so that it works in k8s.
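
For what it's worth, something like the following would cover both patterns. This is only a rough sketch of the idea, not the current Container.php code: scan the cgroup v1 file and take the first 64-character hex id, whether the runtime writes it as a bare id or wraps it in a crio-<id>.scope / docker-<id>.scope segment.

<?php
// Rough sketch only, not the actual Container.php implementation.
// Scans /proc/self/cgroup (cgroup v1) and returns the first 64-character
// hex container id it finds, covering both "<id>" and "crio-<id>.scope" /
// "docker-<id>.scope" style entries.
function containerIdFromCgroupV1(string $path = '/proc/self/cgroup'): ?string
{
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return null;
    }
    foreach ($lines as $line) {
        if (preg_match('/([0-9a-f]{64})(?:\.scope)?$/', $line, $matches)) {
            return $matches[1];
        }
    }
    return null;
}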

liurui-1 (Author) commented Sep 14, 2023

Hi @brettmc ,
I am working on a monitoring tool that reads the OTel data. I tried "https://github.com/open-telemetry/opentelemetry-demo/blob/main/kubernetes/opentelemetry-demo.yaml". For the OTel language SDKs that can report the "container.id" resource attribute:

  • java, dotnet, and golang are doing well.
  • nodejs has a small defect.
  • php has a defect and reports a wrong container id.

There are two versions of cgroup, which use the two system files I mentioned in my previous post. For example, Java handles this well with the following code, which you can refer to:
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/ContainerResource.java#L54-L60
I am not a PHP developer, so I cannot make a PR for this.

brettmc (Collaborator) commented Sep 14, 2023

Our container detector was originally based on Java's implementation, so that makes it a bit easier.
What do you make of this:
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/test/java/io/opentelemetry/instrumentation/resources/CgroupV2ContainerIdExtractorTest.java#L55 (expected container id)
vs
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/test/resources/podman_proc_self_mountinfo#L14

In this case, the "correct" container id according to their tests comes from the line in mountinfo containing hostname, and applying that logic to the data you provided would mean that the correct container id is c3e91a49d35e91f5ef2f672307a9ea830dbb80c488501ce2a6acb1cec7ee7b17 ??
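
In other words, roughly this heuristic (a sketch based on my reading of the Java tests, not a definitive implementation). Applied to the mountinfo output you posted, it would indeed pick the id out of the .../userdata/hostname entry:

<?php
// Sketch of the cgroup v2 / mountinfo heuristic: take the 64-character hex id
// from the mount entry that backs /etc/hostname inside the container.
function containerIdFromMountinfo(string $path = '/proc/self/mountinfo'): ?string
{
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return null;
    }
    foreach ($lines as $line) {
        if (str_contains($line, 'hostname') && preg_match('/([0-9a-f]{64})/', $line, $matches)) {
            return $matches[1];
        }
    }
    return null;
}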

liurui-1 (Author) commented Sep 14, 2023

Hi @brettmc ,
Getting container.id from within the container is really just a workaround. The official way to get the correct container.id is from outside the container, and that is what K8S uses.
All the code implementations I found that get container.id from within the container, and that match what K8S reports, check V1 (/proc/self/cgroup) first and then fall back to V2 (/proc/self/mountinfo).
The java, dotnet, golang, and nodejs SDKs all work this way.
So, if you follow https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/ContainerResource.java#L54-L60, which has these 2 steps, you will get the right container.id, at least in all my environments.

brettmc (Collaborator) commented Sep 14, 2023

I understand now. A few things have changed since I originally copied Java's implementation - we were checking V2 then V1, but that's since been reversed. I've got a PR in to fix this; do you know how to test it?
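
Conceptually, the fix boils down to the ordering below. A simplified, self-contained sketch of that order (not the actual code in the PR):

<?php
// Simplified sketch of the corrected v1-then-v2 ordering, not the PR itself.
function detectContainerId(): ?string
{
    // 1. cgroup v1: /proc/self/cgroup carries the id, e.g. ".../crio-<id>.scope".
    foreach (@file('/proc/self/cgroup', FILE_IGNORE_NEW_LINES) ?: [] as $line) {
        if (preg_match('/([0-9a-f]{64})(?:\.scope)?$/', $line, $m)) {
            return $m[1];
        }
    }
    // 2. cgroup v2 fallback: use the /etc/hostname entry in /proc/self/mountinfo.
    foreach (@file('/proc/self/mountinfo', FILE_IGNORE_NEW_LINES) ?: [] as $line) {
        if (str_contains($line, 'hostname') && preg_match('/[0-9a-f]{64}/', $line, $m)) {
            return $m[0];
        }
    }
    return null;
}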

liurui-1 (Author) commented Sep 14, 2023

@brettmc If there is no need to compile and build, I can test the code change in my environments.

brettmc (Collaborator) commented Sep 14, 2023

Hmm, yeah. If you could work out how to replace Container.php with the source from the linked PR/branch, that would do it I think.

liurui-1 (Author) commented Sep 14, 2023

Hi @brettmc , where can I find the PR you mentioned to replace Container.php? I can test it in my K8S environment to see if it works.
Is it this PR: open-telemetry/opentelemetry-php-contrib#191 ?
I see there are 10 files changed. Do I just need to change "Container.php"?

brettmc (Collaborator) commented Sep 14, 2023

This file: https://github.com/brettmc/opentelemetry-php-contrib/blob/container-detector-k8s/src/ResourceDetectors/Container/src/Container.php

liurui-1 (Author) commented Sep 15, 2023

Hi @brettmc ,
I am sorry for the late response; I was tied up with some other matters.
I have verified that the code change in the above file generates the right "container.id" resource attribute in my K8S environments.
I tested with "https://opentelemetry.io/docs/demo/kubernetes-deployment/". I just replaced the Container.php file in the image and found that it worked well.

Note: In the OTel demo at "https://opentelemetry.io/docs/demo/kubernetes-deployment/", the Container.php file is located at "/var/www/vendor/open-telemetry/sdk/Resource/Detectors/Container.php", so the namespace there is

namespace OpenTelemetry\SDK\Resource\Detectors;

In " https://github.com/brettmc/opentelemetry-php-contrib/blob/container-detector-k8s/src/ResourceDetectors/Container/src/Container.php", the namespace is:

namespace OpenTelemetry\Contrib\Resource\Detector\Container;

So, I just kept the demo's namespace and copied all of the other code.

brettmc (Collaborator) commented Sep 15, 2023

That's great news! Yes, sorry, the container detector was moved out of the SDK to be spec-compliant, which is why the namespace was different.
Thanks for testing and confirming the fix, I'll merge it and get it out to the demo app.

brettmc (Collaborator) commented Sep 16, 2023

@liurui-1 this has been added to the demo app via open-telemetry/opentelemetry-demo#1114
