
Multi-node cluster creation fails at "Joining worker nodes" on Fedora/RHEL if aardvark-dns is not installed #3450

neverpanic opened this issue Dec 13, 2023 · 1 comment
Multi-node kind clusters rely on working container name resolution so that worker nodes can reach the control-plane node. On Fedora 39 and RHEL 9.2, installing podman does not automatically install the aardvark-dns package, which is required for DNS resolution of other containers on the same podman network.

Podman does emit a warning for this, but does not fail container creation:

[root@rhel-9-2-0-eus ~]# podman run --rm --network kind --name node1 -it registry.fedoraproject.org/fedora:39
WARN[0000] aardvark-dns binary not found, container dns will not be enabled
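
A quick way to confirm that the missing aardvark-dns binary is the cause is to test container-to-container name resolution directly. This is a minimal sketch, assuming the kind podman network already exists; the container name node1 is just for illustration:

# keep one named container running on the kind network
podman run -d --rm --network kind --name node1 registry.fedoraproject.org/fedora:39 sleep 300
# without aardvark-dns this lookup fails; with it installed, it returns node1's address
podman run --rm --network kind registry.fedoraproject.org/fedora:39 getent hosts node1
# whether the package is present at all can be checked with: rpm -q aardvark-dns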

This warning is not visible when starting a cluster with kind:

[root@rhel-9-2-0-eus kind]# ./kind create cluster --config ~/multinode.yml --name multi18
enabling experimental podman provider
Creating cluster "multi18" ...
 ✓ Ensuring node image (kindest/node:v1.26.3) 🖼
 ✓ Preparing nodes 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✗ Joining worker nodes 🚜
Deleted nodes: ["multi18-worker" "multi18-control-plane" "multi18-worker2"]
ERROR: failed to create cluster: failed to join node with kubeadm: command "podman exec --privileged multi18-worker kubeadm join --config /kind/kubeadm.conf --skip-phases=preflight --v=6" failed with error: exit status 1
Command Output: I1212 15:09:22.836662     117 join.go:408] [preflight] found NodeName empty; using OS hostname as NodeName
I1212 15:09:22.836774     117 joinconfiguration.go:76] loading configuration from "/kind/kubeadm.conf"
I1212 15:09:22.837409     117 controlplaneprepare.go:225] [download-certs] Skipping certs download
I1212 15:09:22.837421     117 join.go:525] [preflight] Discovering cluster-info
I1212 15:09:22.837428     117 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "multi18-control-plane:6443"
I1212 15:09:30.871760     117 round_trippers.go:553] GET https://multi18-control-plane:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s  in 8031 milliseconds
I1212 15:09:30.875868     117 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://multi18-control-plane:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp: lookup multi18-control-plane on 10.0.2.3:53: server misbehaving
[…]
Get "https://multi18-control-plane:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp: lookup multi18-control-plane on 10.0.2.3:53: server misbehaving
couldn't validate the identity of the API Server
k8s.io/kubernetes/cmd/kubeadm/app/discovery.For
        cmd/kubeadm/app/discovery/discovery.go:45
k8s.io/kubernetes/cmd/kubeadm/app/cmd.(*joinData).TLSBootstrapCfg
        cmd/kubeadm/app/cmd/join.go:526
k8s.io/kubernetes/cmd/kubeadm/app/cmd.(*joinData).InitCfg
        cmd/kubeadm/app/cmd/join.go:536
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.getKubeletStartJoinData
        cmd/kubeadm/app/cmd/phases/join/kubelet.go:91
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.runKubeletStartJoinPhase
        cmd/kubeadm/app/cmd/phases/join/kubelet.go:106
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:259
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
        cmd/kubeadm/app/cmd/join.go:180
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:1040
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:968
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:250
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1594
error execution phase kubelet-start
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
        cmd/kubeadm/app/cmd/join.go:180
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:1040
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:968
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:250
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1594

multinode.yml is the example from the docs:

[root@rhel-9-2-0-eus kind]# cat ~/multinode.yml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker

Installing the aardvark-dns package fixes joining the worker nodes. This may be worth documenting in the known issues at https://kind.sigs.k8s.io/docs/user/known-issues.
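
The workaround, as a minimal sketch assuming dnf is available and reusing the cluster name from above:

# install the missing DNS component for podman networks
dnf install -y aardvark-dns
# recreate the cluster; kind already cleaned up the failed nodes
./kind create cluster --config ~/multinode.yml --name multi18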

aojea (Contributor) commented Dec 18, 2023
These kinds of things are the reason the podman provider is still in experimental mode. #1778
