
pull images failed #80

Closed
haozi4263 opened this issue Dec 2, 2022 · 8 comments
Labels
bug Something isn't working

@haozi4263

finch pull --platform=amd64 xxx

FATA[1167] failed to extract layer sha256:9cc8d31519b533c03cd8347147f9ea0b9bfbda4650200d388a1495a34812283f: mount callback failed on /var/lib/containerd/tmpmounts/containerd-mount3705620677: failed to Lchown "/var/lib/containerd/tmpmounts/containerd-mount3705620677/kubeflow/src" for UID 29511686, GID 1085706827: lchown /var/lib/containerd/tmpmounts/containerd-mount3705620677/kubeflow/src: invalid argument (Hint: try increasing the number of subordinate IDs in /etc/subuid and /etc/subgid): unknown
FATA[1168] exit status 1

@haozi4263 haozi4263 added the bug Something isn't working label Dec 2, 2022
@estesp
Contributor

estesp commented Dec 2, 2022

Is this image public/shareable? It looks like an image that uses extremely large UIDs and/or GIDs, which, when running rootless (or simply via a runtime with user namespaces enabled), exhausts the standard ~65k (2^16) range of UIDs/GIDs used to map filesystem ownership. I expect this image will not run on any rootless or user-namespace-enabled container runtime unless the /etc/sub{u,g}id files are set up to allow a significantly larger range of subordinate IDs within containers.

I'm not quite sure what the value of using IDs in the very high range is (that UID is somewhere above 2^24?; the GID is even larger!), but if you own the image, I would be curious why the owner and group need such extremely large integers.
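To illustrate the mismatch, here is a minimal sketch comparing the IDs from the error message above against the default 65536-entry subordinate range that useradd creates on most distributions (the variable names are illustrative, not from any tool):

```shell
# Default subordinate ID range created by useradd on most distributions
default_range=65536           # 2^16
image_uid=29511686            # UID from the lchown error above (> 2^24)
image_gid=1085706827          # GID from the lchown error above (> 2^30)

# Rootless mode can only map container IDs that fall inside the
# subordinate range; anything beyond it makes lchown fail with EINVAL.
for id in "$image_uid" "$image_gid"; do
  if [ "$id" -ge "$default_range" ]; then
    echo "$id exceeds the default range of $default_range"
  fi
done
```

Both IDs are hundreds to thousands of times larger than the default range, which is why the layer extraction fails only under user namespaces.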

@haozi4263
Author

The image ccr.ccs.tencentyun.com/cube-studio/kubeflow-dashboard:2022.09.01 is public.
Pulling it with docker pull ccr.ccs.tencentyun.com/cube-studio/kubeflow-dashboard:2022.09.01 works fine.

@ningziwen
Member

ningziwen commented Dec 5, 2022

Reproduced in Finch.

FATA[0125] failed to extract layer sha256:9cc8d31519b533c03cd8347147f9ea0b9bfbda4650200d388a1495a34812283f: mount callback failed on /var/lib/containerd/tmpmounts/containerd-mount3084210000: failed to Lchown "/var/lib/containerd/tmpmounts/containerd-mount3084210000/kubeflow/src" for UID 29511686, GID 1085706827: lchown /var/lib/containerd/tmpmounts/containerd-mount3084210000/kubeflow/src: invalid argument (Hint: try increasing the number of subordinate IDs in /etc/subuid and /etc/subgid): unknown
FATA[0114] exit status 1

However, it worked with the nerdctl built from the v1.0.0 tag, which is what we are using in Finch. Will continue the investigation.

@estesp
Contributor

estesp commented Dec 5, 2022

It's important to compare nerdctl (or any other runtime tool) running the same way it runs inside Finch, which, based on the output, is inside a user namespace ("rootless" mode, specifically). The container shown will probably work on any container runtime that does not run containers within a user namespace (either "rootless" mode or simply a root-created user namespace with a specific range of subordinate UIDs and GIDs). If you use the nerdctl install that sets up rootless mode on a Linux system, you should be able to reproduce the same issue, unless you use an extremely large subordinate mapping for the ID ranges.

@ningziwen
Member

Reproduced with nerdctl in the Finch VM shell.

FATA[0139] failed to extract layer sha256:9cc8d31519b533c03cd8347147f9ea0b9bfbda4650200d388a1495a34812283f: mount callback failed on /var/lib/containerd/tmpmounts/containerd-mount1146161846: failed to Lchown "/var/lib/containerd/tmpmounts/containerd-mount1146161846/kubeflow/src" for UID 29511686, GID 1085706827: lchown /var/lib/containerd/tmpmounts/containerd-mount1146161846/kubeflow/src: invalid argument (Hint: try increasing the number of subordinate IDs in /etc/subuid and /etc/subgid): unknown

@ningziwen
Member

ningziwen commented Dec 6, 2022

Validated that the pull works after extending the subuid and subgid ranges.

[ningziwe@lima-finch ningziwe]$ cat /etc/subuid
ningziwe:100000:29700000
[ningziwe@lima-finch ningziwe]$ cat /etc/subgid
ningziwe:100000:1085800000
[ningziwe@lima-finch ningziwe]$
logout
➜  ~ finch pull ccr.ccs.tencentyun.com/cube-studio/kubeflow-dashboard:2022.09.01
...
elapsed: 339.7s                                                                   total:  942.4  (2.8 MiB/s)
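As a quick sanity check on the ranges above (a sketch using the values from the error message, not output from any Finch tool): a subordinate range of length N can map container IDs up to roughly N, so each range must exceed the corresponding image ID.

```shell
# Range lengths from the /etc/subuid and /etc/subgid entries above
subuid_count=29700000
subgid_count=1085800000
# IDs required by the image (from the lchown error)
image_uid=29511686
image_gid=1085706827

# A subordinate range of length N covers container IDs up to roughly N,
# so both checks below must pass for the layer extraction to succeed.
[ "$subuid_count" -gt "$image_uid" ] && echo "subuid range covers UID"
[ "$subgid_count" -gt "$image_gid" ] && echo "subgid range covers GID"
```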

Workaround:

# Log in to the VM shell
LIMA_HOME=/Applications/Finch/lima/data /Applications/Finch/lima/bin/limactl shell finch

# In the VM shell, increase the subordinate ID ranges in /etc/subuid and /etc/subgid
sudo vi /etc/subuid
sudo vi /etc/subgid

# Log out of the VM shell and restart the Finch VM
finch vm stop
finch vm start

# Try to pull the image again
finch pull ccr.ccs.tencentyun.com/cube-studio/kubeflow-dashboard:2022.09.01

@ningziwen
Member

As @estesp mentioned, the root cause is that the image uses extremely large UID/GID values, while the default subordinate ID range in Finch is 65536.

I found a relevant issue in Kubernetes. According to that issue, 65536 is the default subordinate UID/GID range on most distributions, and the issue tracks fixing the extremely large UID/GID on the image side.

I suggest referring to that issue and checking whether the UID/GID of your image should or could be adjusted.

If you find it necessary to use images with extremely large UID/GID values, please elaborate on the use case here. We can discuss making subuid/subgid configurable if the use case can be justified.

@ningziwen
Member

The large UID/GID issue was resolved by switching to a rootful container runtime inside the VM. #196
