
Use default containerd address in pull-sandbox-image.sh #823

Merged 1 commit into awslabs:master on Dec 6, 2021

Conversation

@cartermckinnon (Member)

Issue #, if available:

Fixes #784.

Description of changes:

Remove the explicit containerd address in pull-sandbox-image.sh so that the default, /run/containerd/containerd.sock, is used. For future reference, the address is also configurable via $CONTAINERD_ADDRESS.
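The fallback behavior the change relies on can be sketched as follows. This is an illustrative fragment, not the script itself; the function name is hypothetical, while the default path and the $CONTAINERD_ADDRESS override come from ctr's own help text:

```shell
#!/usr/bin/env sh
# Illustrative sketch of the address resolution ctr applies when --address
# is omitted: $CONTAINERD_ADDRESS if set, otherwise the built-in default.
# The function name is hypothetical; the paths are from ctr's help text.
resolve_containerd_address() {
  echo "${CONTAINERD_ADDRESS:-/run/containerd/containerd.sock}"
}

# With no override in the environment, the default socket path is printed.
resolve_containerd_address
```

Dropping the hard-coded --address flag means the script keeps working even if the socket path changes again, as long as ctr's default (or $CONTAINERD_ADDRESS) is kept in sync.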

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@cartermckinnon cartermckinnon changed the title Use default containerd address. Use default containerd address in pull-sandbox-image.sh Dec 2, 2021
@ravisinha0506 (Contributor) left a comment:

Could you add details on how this change is validated?

@cartermckinnon (Member Author) commented Dec 3, 2021:

I reproduced the issue by creating a node on the latest AMI, configured to use containerd via the --container-runtime flag of bootstrap.sh, then rebooting it. Kubelet doesn't start because sandbox-image is in a crash loop. Modifying /etc/eks/containerd/pull-sandbox-image.sh to either use --address=/run/containerd/containerd.sock or no --address flag at all resolves the issue because, per ctr's help text:

--address value, -a value address for containerd's GRPC server (default: "/run/containerd/containerd.sock") [$CONTAINERD_ADDRESS]

This PR introduced the symlink in question. The author mentions that aws-cni-plugin still expects the dockershim socket to exist (the docs agree). So, it would be affected by a reboot in the same way. I believe we could move the symlink to /var/run/dockershim.sock to resolve that, but I don't know how to verify the CNI piece.
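The symlink relocation suggested above could be sketched like this. It is a minimal sketch only: a temp directory stands in for the real /run and /var/run paths, and a plain file stands in for the containerd socket, so nothing on a real node is touched:

```shell
#!/usr/bin/env sh
# Sketch only: recreate the dockershim compatibility symlink so it points at
# the containerd socket. A temp directory stands in for /run and /var/run;
# a placeholder file stands in for the real containerd socket.
RUN_DIR="$(mktemp -d)"
: > "${RUN_DIR}/containerd.sock"   # placeholder for the real socket
ln -sf "${RUN_DIR}/containerd.sock" "${RUN_DIR}/dockershim.sock"
readlink "${RUN_DIR}/dockershim.sock"
```

Using ln -sf makes the operation idempotent across reboots: an existing (possibly stale) dockershim.sock link is replaced rather than causing a failure.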

@cartermckinnon (Member Author):

A bit more background:

  • When pull-sandbox-image.sh was created, containerd ran on /run/dockershim.sock.
  • Later, when the symlink was created, containerd was switched to /run/containerd/containerd.sock, and pull-sandbox-image.sh should have been updated to match.
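Given that history, one way to confirm which socket a node actually serves is a small check like the following. The helper is hypothetical; the candidate paths are the ones discussed above:

```shell
#!/usr/bin/env sh
# Hypothetical helper: report which of the given socket paths exist as
# unix sockets on this machine. Prints nothing for paths that are absent.
check_sockets() {
  for s in "$@"; do
    [ -S "$s" ] && echo "socket present: $s"
  done
  return 0
}

# On a real node one would run:
#   check_sockets /run/containerd/containerd.sock /run/dockershim.sock
```

This makes it easy to spot the post-reboot failure mode: the expected socket path missing while the stale one (or neither) is present.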

@ravisinha0506 (Contributor):

Change lgtm.

@cartermckinnon (Member Author):

As an additional sanity check, I created a simple nginx pod with:

kubectl apply -f https://k8s.io/examples/pods/simple-pod.yaml

and it was scheduled and started successfully. I was able to port-forward it and curl the nginx default page.

@cartermckinnon cartermckinnon removed the request for review from prasad0896 December 6, 2021 21:52
@ravisinha0506 ravisinha0506 merged commit 077fb18 into awslabs:master Dec 6, 2021
@cartermckinnon cartermckinnon deleted the sandbox-image/sh-bug branch December 6, 2021 21:53
Successfully merging this pull request may close these issues.

ContainerD - Kubelet service failed to start when Node is rebooted