
kubeadm 1.30.1 init fails in kubelet-check with timeout #3069

Closed
modcritical opened this issue May 31, 2024 · 10 comments
Labels: area/kubelet, kind/bug, priority/awaiting-more-evidence

modcritical commented May 31, 2024

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version: &version.Info{Major:"1", Minor:"30", GitVersion:"v1.30.1", GitCommit:"6911225c3f747e1cd9d109c305436d08b668f086", GitTreeState:"clean", BuildDate:"2024-05-14T10:49:05Z", GoVersion:"go1.22.2", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version): 1.30.1
  • Cloud provider or hardware configuration: Self-hosted VM, Proxmox VE 8.1, Intel Skylake
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 12 (bookworm)
  • Kernel (e.g. uname -a): Linux k8ctl1 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux
  • Container runtime (CRI) (e.g. containerd, cri-o): containerd containerd.io 1.6.32 8b3b7ca2e5ce38e8f31a34f35b2b68ceb8470d89
  • Container networking plugin (CNI) (e.g. Calico, Cilium): N/A
  • Others:

What happened?

I attempted to initialize the cluster, but the init command failed, indicating a timeout connecting to http://localhost:10248/healthz. Notably, I can immediately run the provided curl command, which returns an instant response of "ok".

root@k8ctl1:~# kubeadm init --config /root/kubeadm-config.yaml
[init] Using Kubernetes version: v1.30.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8ctl1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [fdff:ff71:2fc4:0:ffff::1 fdbc:6a5c:a49a:1302::b654:10f5]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8ctl1 localhost] and IPs [fdbc:6a5c:a49a:1302::b654:10f5 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8ctl1 localhost] and IPs [fdbc:6a5c:a49a:1302::b654:10f5 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is not healthy after 4m0.000220067s

Unfortunately, an error has occurred:
        The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' returned error: Get "http://localhost:10248/healthz": context deadline exceeded


This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
root@k8ctl1:~# curl -sSL http://localhost:10248/healthz
ok
root@k8ctl1:~#

What you expected to happen?

Successful cluster initialization ending in a join command for the worker nodes.

How to reproduce it (as minimally and precisely as possible)?

The VM for the first control node of this cluster was created from a Debian 12 template that is mostly a minimal Debian install plus cloud-init and some common utilities. The VM was prepared for K8s using Ansible plays developed from the initial docs (install containerd and configure it for the systemd cgroup driver, set kernel params, add the overlay and br_netfilter modules, etc.).

Containerd is installed from docker apt repo (containerd.io package)
Kubernetes components are installed from pkgs.k8s.io apt repo

VM is IPv6 single-stack, IPv4 Internet access via NAT64.

Rebooted VM and validated modules were loaded and kernel params were set.

Using config file:

kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.29.5
networking:
  podSubnet: fdff:ff71:2fc4:0::/64
  serviceSubnet: fdff:ff71:2fc4:0:ffff::/112
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd

Ran command kubeadm init --config /root/kubeadm-config.yaml

Anything else we need to know?

Possible Regression from 1.29.x

The issue is repeatable, and does not occur with kubeadm 1.29.5.

Other Details

Expected containers are running after the failure, and none are flapping:

CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
cf9ff0332663b       3861cfcd7c04c       2 hours ago         Running             etcd                      0                   b6087285962fb       etcd-k8ctl1.dev.omt.cx
66313e60d2481       2242ad7f7c41a       2 hours ago         Running             kube-controller-manager   0                   445a9b82697a2       kube-controller-manager-k8ctl1.dev.omt.cx
aec58504ab54e       b36112597a5f1       2 hours ago         Running             kube-apiserver            0                   175b79f0a972f       kube-apiserver-k8ctl1.dev.omt.cx
0fd56823578c4       e579eb50f57be       2 hours ago         Running             kube-scheduler            0                   405173d4752df       kube-scheduler-k8ctl1.dev.omt.cx

Cluster seems to be working after failed init

I was able to push kubeadm through the remainder of the phases and ended up with what I think is a normally working control node.

Finishing after healthz error

kubeadm init phase upload-config all
kubeadm init phase upload-certs --upload-certs
kubeadm init phase mark-control-plane
kubeadm init phase bootstrap-token
kubeadm init phase kubelet-finalize
kubeadm init phase addon all --config /root/kubeadm-config.yaml

Then setting up and testing kubectl:

mkdir ~/.kube
cp /etc/kubernetes/admin.conf ~/.kube/config

root@k8ctl1:~# kubectl cluster-info
Kubernetes control plane is running at https://[fdbc:6a5c:a49a:1302::b654:10f5]:6443
CoreDNS is running at https://[fdbc:6a5c:a49a:1302::b654:10f5]:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Get a token to join other nodes:

kubeadm token create --print-join-command

Attempted to join a worker node, where kubeadm also failed with the same error as it did on the control node. But, the worker appeared to be joined successfully despite the error.

System log from boot through attempted cluster init on control node:

kubeadm-init-fail4-system-journal.log

neolit123 added the priority/awaiting-more-evidence, kind/regression, and area/kubelet labels on May 31, 2024
neolit123 commented May 31, 2024

thanks for the report

yes, there were some refactors in the area of the kubelet health check in 1.30, however they are straightforward and only you have reported the problem. i cannot reproduce it; we also have an extensive e2e test suite, and there are many projects built on top of kubeadm that have not reported such an issue either. my point is, we need to understand why it is failing in your setup.

the location of the failure is here:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/apiclient/wait.go#L271-L275

that is a standard go HTTP client, but it may have some differences from how curl handles transport, WRT timeouts and proxy for example.
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/net/http.go#L107-L127

you can install wireshark and see what the differences are between the two requests.
my first thought was the -L flag, which tells curl to follow a localhost redirect (if you have that for some reason...), but the go client also follows redirects by default. and a redirect on localhost would be odd..
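
For a quick way to see which address the go client actually dials for "localhost" without capturing packets, here is a minimal sketch (not from this issue; the URL and port are just the kubelet healthz defaults) that wraps the transport's DialContext and prints the remote address of each connection:

package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

func main() {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	transport := &http.Transport{
		// Print the address each connection actually lands on, e.g.
		// "[::1]:10248" when localhost resolves to ::1. On a failed dial the
		// error message normally includes the resolved address as well.
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			conn, err := dialer.DialContext(ctx, network, addr)
			if err != nil {
				fmt.Printf("dial %s %s failed: %v\n", network, addr, err)
				return nil, err
			}
			fmt.Printf("dial %s %s -> %s\n", network, addr, conn.RemoteAddr())
			return conn, nil
		},
	}
	client := &http.Client{Transport: transport, Timeout: 10 * time.Second}

	resp, err := client.Get("http://localhost:10248/healthz")
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}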

neolit123 commented May 31, 2024

if you are able to build kubeadm from source you can try the following:

neolit123 added this to the v1.30 milestone on May 31, 2024
modcritical commented May 31, 2024

Thanks for the direction on where to look in the source; I was able to determine what was causing the issue. Unfortunately, I don't understand why it is working in the previous release.

This is failing in my setup because of how http.Client is resolving localhost. In wait.go WaitForKubelet(), http.Client.Do() is resolving the localhost part of the URL http://localhost:10248/healthz to ::1. The address of localhost is mapped to ::1 in /etc/hosts.

kubelet is only listening on 127.0.0.1:10248. Curl works because, as I just discovered, it treats localhost specially and uses an internally hard-coded value of 127.0.0.1 for it.

The KubeletConfiguration.HealthzBindAddress default value is the literal string "127.0.0.1", but wait.go line 250 creates the endpoint URL with the template string "http://localhost:%d/healthz". This causes the health check to never succeed on systems where localhost does not resolve to 127.0.0.1.

The only relevant difference I see in this logic between 1.29.5 and 1.30.1 is using http.Client.Do() instead of http.Client.Get(). But, I wrote a trivial test that used both and the two functions had the same behavior: They resolved localhost as ::1 and failed to connect to a service listening on 127.0.0.1. I also tried older versions of go and observed the same behavior. So, currently I have no idea why the packaged kubeadm 1.29.5 worked on my reference build when 1.30.1 did not.
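
A minimal sketch of that kind of trivial test (an approximation, not the exact test or the kubeadm code), probing the kubelet healthz endpoint via both localhost and 127.0.0.1 with both client.Get and client.Do:

package main

import (
	"fmt"
	"net/http"
	"time"
)

// probe hits the URL with both http.Client.Get (the 1.29-style call) and
// http.Client.Do (the 1.30-style call) and prints the outcome of each.
func probe(client *http.Client, url string) {
	if resp, err := client.Get(url); err != nil {
		fmt.Printf("Get %s: %v\n", url, err)
	} else {
		resp.Body.Close()
		fmt.Printf("Get %s: %s\n", url, resp.Status)
	}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		fmt.Printf("NewRequest %s: %v\n", url, err)
		return
	}
	if resp, err := client.Do(req); err != nil {
		fmt.Printf("Do  %s: %v\n", url, err)
	} else {
		resp.Body.Close()
		fmt.Printf("Do  %s: %s\n", url, resp.Status)
	}
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	// With /etc/hosts mapping localhost only to ::1 and the kubelet listening
	// on 127.0.0.1:10248, the localhost probes fail while the 127.0.0.1
	// probes return 200 OK, for Get and Do alike.
	probe(client, "http://localhost:10248/healthz")
	probe(client, "http://127.0.0.1:10248/healthz")
}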

Three different fixes worked in my setup:

  • Change wait.go line 250 to use the string 127.0.0.1 in the URL instead of localhost, to match the default value from KubeletConfiguration.
  • Modify /etc/hosts to map localhost to 127.0.0.1
  • Add healthzBindAddress: '::1' to the KubeletConfiguration passed to kubeadm init

If kubeadm is not going to build the healthz URL from the KubeletConfiguration that it is deploying, it should probably at least use the same values that KubeletConfiguration uses for its defaults.
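
A rough sketch of that suggestion (the kubeletHealthz struct below is a hypothetical stand-in for the relevant KubeletConfiguration fields, not the real API type): build the URL from the configured bind address and port, falling back to the kubelet defaults of 127.0.0.1:10248 instead of hard-coding localhost:

package main

import (
	"fmt"
	"net"
	"strconv"
)

// kubeletHealthz stands in for KubeletConfiguration.healthzBindAddress and
// KubeletConfiguration.healthzPort.
type kubeletHealthz struct {
	BindAddress string
	Port        int
}

// healthzURL builds the health-check URL from the configured values, using
// the kubelet defaults when they are unset.
func healthzURL(cfg kubeletHealthz) string {
	addr := cfg.BindAddress
	if addr == "" {
		addr = "127.0.0.1" // kubelet default for healthzBindAddress
	}
	port := cfg.Port
	if port == 0 {
		port = 10248 // kubelet default for healthzPort
	}
	// net.JoinHostPort bracket-wraps IPv6 addresses such as "::1".
	return "http://" + net.JoinHostPort(addr, strconv.Itoa(port)) + "/healthz"
}

func main() {
	fmt.Println(healthzURL(kubeletHealthz{}))                   // http://127.0.0.1:10248/healthz
	fmt.Println(healthzURL(kubeletHealthz{BindAddress: "::1"})) // http://[::1]:10248/healthz
}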

I updated my setup to resolve localhost to 127.0.0.1 and everything is working well like that. Thanks again for the pointers!

neolit123 commented Jun 1, 2024

thanks for testing and confirming the source of the problem.
here is a PR for 1.31:
kubernetes/kubernetes#125265

if that passes on CI the fix can be backported to 1.30.latest

neolit123 added the kind/bug label and removed the kind/regression label on Jun 1, 2024
@neolit123

i will mark it as bug instead of regression, because we are not sure about this part.

the old code in 1.29 also hardcoded localhost
kubernetes/kubernetes@5571188#diff-6635417848a0e9a8b64420d4cf6671bf2ab6f7b27e6f2773749b16c9d2b24521L164
but used a client.Get(),
kubernetes/kubernetes@5571188#diff-6635417848a0e9a8b64420d4cf6671bf2ab6f7b27e6f2773749b16c9d2b24521L142
it's odd that it worked for 1.29 on your end.

@neolit123

fix will be available in 1.30.2
kubernetes/kubernetes#125286

@franklin-gaoxy

hello

I also encountered a similar issue; it is at: kubernetes/kubernetes#125275

I checked my DNS, and this is the result:

root@kmaster1:~# nslookup localhost 
Server:		223.5.5.5
Address:	223.5.5.5#53

Name:	localhost
Address: 127.0.0.1
Name:	localhost
Address: ::1

At the same time, I also added the corresponding entry to the /etc/hosts file:

root@kmaster1:~# cat /etc/hosts
10.0.0.5	debian

# The following lines are desirable for IPv6 capable hosts
#::1     localhost ip6-localhost ip6-loopback
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
10.0.0.21 kmaster1
10.0.0.22 knode1
10.0.0.23 knode2
127.0.0.1	localhost

But executing kubeadm init still fails with:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
	- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
	cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:108
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:259
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
	cmd/kubeadm/app/cmd/init.go:111
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1598
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
	cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
	cmd/kubeadm/app/cmd/init.go:111
github.com/spf13/cobra.(*Command).execute
	vendor/github.com/spf13/cobra/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
	vendor/github.com/spf13/cobra/command.go:1068
github.com/spf13/cobra.(*Command).Execute
	vendor/github.com/spf13/cobra/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
	cmd/kubeadm/app/kubeadm.go:50
main.main
	cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1598

I also checked kubelet

root@kmaster1:~# curl http://localhost:10248/healthz 
okroot@kmaster1:~# 

Hope to get your advice, thank you!

@neolit123

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

that seems like a separate problem.
you need to check kubelet logs with journalctl -xeu kubelet and find any errors (log lines starting with E0...).

https://github.com/kubernetes/kubeadm?tab=readme-ov-file#support

@franklin-gaoxy

I've placed some relevant logs in the issues I raised; the general content is as follows.

Jun 02 13:46:00 kmaster1 kubelet[658]: I0602 13:46:00.418850     658 server.go:469] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Jun 02 13:46:00 kmaster1 kubelet[658]: I0602 13:46:00.439107     658 container_manager_linux.go:270] "Creating Container Manager object based on Node Config" nodeConfig={"RuntimeCgroupsName":"","SystemCgroupsName":"","KubeletCgroupsName":"","KubeletOOMScoreAdj":-999,"ContainerRuntime":"","CgroupsPerQOS":true,"CgroupRoot":"/","CgroupDriver":"cgroupfs","KubeletRootDir":"/var/lib/kubelet","ProtectKernelDefaults":false,"KubeReservedCgroupName":"","SystemReservedCgroupName":"","ReservedSystemCPUs":{},"EnforceNodeAllocatable":{"pods":{}},"KubeReserved":null,"SystemReserved":null,"HardEvictionThresholds":[],"QOSReserved":{},"CPUManagerPolicy":"none","CPUManagerPolicyOptions":null,"TopologyManagerScope":"container","CPUManagerReconcilePeriod":10000000000,"ExperimentalMemoryManagerPolicy":"None","ExperimentalMemoryManagerReservedMemory":null,"PodPidsLimit":-1,"EnforceCPULimits":true,"CPUCFSQuotaPeriod":100000000,"TopologyManagerPolicy":"none","TopologyManagerPolicyOptions":null}
Jun 02 13:46:00 kmaster1 kubelet[658]: E0602 13:46:00.453782     658 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Jun 02 13:46:00 kmaster1 kubelet[658]: E0602 13:46:00.453866     658 kubelet.go:1431] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Jun 02 13:46:00 kmaster1 kubelet[658]: I0602 13:46:00.506290     658 manager.go:471] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Jun 02 13:46:00 kmaster1 kubelet[658]: E0602 13:46:00.513551     658 kubelet.go:2327] "Skipping pod synchronization" err="PLEG is not healthy: pleg has yet to be successful"
Jun 02 14:03:39 kmaster1 kubelet[1120]: I0602 14:03:39.312424    1120 server.go:469] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Jun 02 14:03:39 kmaster1 kubelet[1120]: I0602 14:03:39.316470    1120 container_manager_linux.go:270] "Creating Container Manager object based on Node Config" nodeConfig={"RuntimeCgroupsName":"","SystemCgroupsName":"","KubeletCgroupsName":"","KubeletOOMScoreAdj":-999,"ContainerRuntime":"","CgroupsPerQOS":true,"CgroupRoot":"/","CgroupDriver":"cgroupfs","KubeletRootDir":"/var/lib/kubelet","ProtectKernelDefaults":false,"KubeReservedCgroupName":"","SystemReservedCgroupName":"","ReservedSystemCPUs":{},"EnforceNodeAllocatable":{"pods":{}},"KubeReserved":null,"SystemReserved":null,"HardEvictionThresholds":[],"QOSReserved":{},"CPUManagerPolicy":"none","CPUManagerPolicyOptions":null,"TopologyManagerScope":"container","CPUManagerReconcilePeriod":10000000000,"ExperimentalMemoryManagerPolicy":"None","ExperimentalMemoryManagerReservedMemory":null,"PodPidsLimit":-1,"EnforceCPULimits":true,"CPUCFSQuotaPeriod":100000000,"TopologyManagerPolicy":"none","TopologyManagerPolicyOptions":null}
Jun 02 14:03:39 kmaster1 kubelet[1120]: E0602 14:03:39.319191    1120 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Jun 02 14:03:39 kmaster1 kubelet[1120]: E0602 14:03:39.319211    1120 kubelet.go:1431] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Jun 02 14:03:39 kmaster1 kubelet[1120]: E0602 14:03:39.335951    1120 kubelet.go:2327] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Jun 02 14:03:39 kmaster1 kubelet[1120]: I0602 14:03:39.346537    1120 manager.go:471] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Jun 02 14:26:58 kmaster1 kubelet[3511]: I0602 14:26:58.865189    3511 server.go:469] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Jun 02 14:26:58 kmaster1 kubelet[3511]: I0602 14:26:58.868954    3511 container_manager_linux.go:270] "Creating Container Manager object based on Node Config" nodeConfig={"RuntimeCgroupsName":"","SystemCgroupsName":"","KubeletCgroupsName":"","KubeletOOMScoreAdj":-999,"ContainerRuntime":"","CgroupsPerQOS":true,"CgroupRoot":"/","CgroupDriver":"cgroupfs","KubeletRootDir":"/var/lib/kubelet","ProtectKernelDefaults":false,"KubeReservedCgroupName":"","SystemReservedCgroupName":"","ReservedSystemCPUs":{},"EnforceNodeAllocatable":{"pods":{}},"KubeReserved":null,"SystemReserved":null,"HardEvictionThresholds":[],"QOSReserved":{},"CPUManagerPolicy":"none","CPUManagerPolicyOptions":null,"TopologyManagerScope":"container","CPUManagerReconcilePeriod":10000000000,"ExperimentalMemoryManagerPolicy":"None","ExperimentalMemoryManagerReservedMemory":null,"PodPidsLimit":-1,"EnforceCPULimits":true,"CPUCFSQuotaPeriod":100000000,"TopologyManagerPolicy":"none","TopologyManagerPolicyOptions":null}
Jun 02 14:26:58 kmaster1 kubelet[3511]: E0602 14:26:58.871007    3511 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Jun 02 14:26:58 kmaster1 kubelet[3511]: E0602 14:26:58.871131    3511 kubelet.go:1431] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Jun 02 14:26:58 kmaster1 kubelet[3511]: I0602 14:26:58.883078    3511 manager.go:471] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Jun 02 14:26:58 kmaster1 kubelet[3511]: E0602 14:26:58.887446    3511 kubelet.go:2327] "Skipping pod synchronization" err="PLEG is not healthy: pleg has yet to be successful"
Jun 02 14:26:59 kmaster1 kubelet[3555]: I0602 14:26:59.084609    3555 server.go:469] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Jun 02 14:26:59 kmaster1 kubelet[3555]: I0602 14:26:59.090511    3555 container_manager_linux.go:270] "Creating Container Manager object based on Node Config" nodeConfig={"RuntimeCgroupsName":"","SystemCgroupsName":"","KubeletCgroupsName":"","KubeletOOMScoreAdj":-999,"ContainerRuntime":"","CgroupsPerQOS":true,"CgroupRoot":"/","CgroupDriver":"cgroupfs","KubeletRootDir":"/var/lib/kubelet","ProtectKernelDefaults":false,"KubeReservedCgroupName":"","SystemReservedCgroupName":"","ReservedSystemCPUs":{},"EnforceNodeAllocatable":{"pods":{}},"KubeReserved":null,"SystemReserved":null,"HardEvictionThresholds":[],"QOSReserved":{},"CPUManagerPolicy":"none","CPUManagerPolicyOptions":null,"TopologyManagerScope":"container","CPUManagerReconcilePeriod":10000000000,"ExperimentalMemoryManagerPolicy":"None","ExperimentalMemoryManagerReservedMemory":null,"PodPidsLimit":-1,"EnforceCPULimits":true,"CPUCFSQuotaPeriod":100000000,"TopologyManagerPolicy":"none","TopologyManagerPolicyOptions":null}
Jun 02 14:26:59 kmaster1 kubelet[3555]: E0602 14:26:59.091925    3555 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Jun 02 14:26:59 kmaster1 kubelet[3555]: E0602 14:26:59.091981    3555 kubelet.go:1431] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Jun 02 14:26:59 kmaster1 kubelet[3555]: E0602 14:26:59.102172    3555 kubelet.go:2327] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Jun 02 14:26:59 kmaster1 kubelet[3555]: I0602 14:26:59.103552    3555 manager.go:471] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Jun 02 14:28:03 kmaster1 kubelet[7909]: I0602 14:28:03.967366    7909 server.go:469] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Jun 02 14:28:03 kmaster1 kubelet[7909]: I0602 14:28:03.973456    7909 container_manager_linux.go:270] "Creating Container Manager object based on Node Config" nodeConfig={"RuntimeCgroupsName":"","SystemCgroupsName":"","KubeletCgroupsName":"","KubeletOOMScoreAdj":-999,"ContainerRuntime":"","CgroupsPerQOS":true,"CgroupRoot":"/","CgroupDriver":"cgroupfs","KubeletRootDir":"/var/lib/kubelet","ProtectKernelDefaults":false,"KubeReservedCgroupName":"","SystemReservedCgroupName":"","ReservedSystemCPUs":{},"EnforceNodeAllocatable":{"pods":{}},"KubeReserved":null,"SystemReserved":null,"HardEvictionThresholds":[],"QOSReserved":{},"CPUManagerPolicy":"none","CPUManagerPolicyOptions":null,"TopologyManagerScope":"container","CPUManagerReconcilePeriod":10000000000,"ExperimentalMemoryManagerPolicy":"None","ExperimentalMemoryManagerReservedMemory":null,"PodPidsLimit":-1,"EnforceCPULimits":true,"CPUCFSQuotaPeriod":100000000,"TopologyManagerPolicy":"none","TopologyManagerPolicyOptions":null}
Jun 02 14:28:03 kmaster1 kubelet[7909]: E0602 14:28:03.974820    7909 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Jun 02 14:28:03 kmaster1 kubelet[7909]: E0602 14:28:03.974835    7909 kubelet.go:1431] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Jun 02 14:28:03 kmaster1 kubelet[7909]: E0602 14:28:03.979881    7909 kubelet.go:2327] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Jun 02 14:28:03 kmaster1 kubelet[7909]: I0602 14:28:03.987221    7909 manager.go:471] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"

Should I reopen my issues and continue asking questions under them?

neolit123 commented Jun 5, 2024

we don't provide support on github, @gy528909, so please don't open more issues in the kubernetes github org.

also this is not a kubeadm bug.
it seems the kubelet is not happy with the state of containerd. try a different k8s / containerd. use a cgroup v2 enabled os.

post in slack or discuss.k8s.io
https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md

ask in #sig-node or #containerd too.
