aws-cni in chaining mode with cilium on ipv6 EKS stuck in pending state #2391
Comments
@sdomme configuring an IPv6 address on the node's primary ENI is not something that the VPC CNI is responsible for; that should only happen when you create an IPv6 EKS cluster, and it looks like this is an IPv6 cluster.
As a side note, chaining Cilium with the AWS VPC CNI is not something that EKS supports or tests, so you are operating in somewhat uncharted territory here. Can you point to more documentation on this setup?
@jdn5126 Bottlerocket uses wicked to manage the network. I had never heard of wicked before, but this seems to be it. I will provide the sysctl information when I get another non-working node. I do not understand how the VPC CNI is not responsible for the IP addresses. Or do you mean that setting the node's (Bottlerocket) IP address is already done beforehand, and the VPC CNI IPAM just manages the ENIs, subnets, and IPs for the pods? Do you also mean that the IP addresses (IPv4, IPv6) on the node's eth0 interface should already be set before the VPC CNI comes into play?
I compared the sysctl values; they are the same on a working and a non-working node.
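A minimal sketch of how such a comparison can be done (file paths are illustrative):

```sh
# On each node (via the admin container / sheltie), dump all sysctl
# values sorted, then diff the two dumps on one machine.
sysctl -a 2>/dev/null | sort > /tmp/sysctl-good.txt   # on the working node
sysctl -a 2>/dev/null | sort > /tmp/sysctl-bad.txt    # on the broken node
diff /tmp/sysctl-good.txt /tmp/sysctl-bad.txt
```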
@sdomme the VPC CNI (actually IPAM) is responsible for making EC2 API calls to attach secondary ENIs, and an IPv4 or IPv6 subnet will get attached to the secondary ENI based on the mode. The primary ENI's addresses are configured by the OS networking stack before the VPC CNI comes into play. I checked existing and closed Bottlerocket issues and did not find any that matched this. I will follow the issue you file and comment if needed.
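For completeness, one way to verify the ENI attachments and their IPv6 assignments from the EC2 side is a describe call like the sketch below (the instance ID is a placeholder):

```sh
# List ENIs attached to an instance, their device index, and any
# assigned IPv6 addresses. i-0123456789abcdef0 is illustrative.
aws ec2 describe-network-interfaces \
  --filters "Name=attachment.instance-id,Values=i-0123456789abcdef0" \
  --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Device:Attachment.DeviceIndex,IPv6:Ipv6Addresses[].Ipv6Address}'
```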
@jdn5126 Meanwhile I dug deeper and was able to pinpoint the issue to the wicked-dhcp6 service.
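A sketch of how to inspect that service from the Bottlerocket admin container; the exact unit name is an assumption (it may be wickedd-dhcp6 rather than wicked-dhcp6 depending on the image):

```sh
# Enter a root shell in the host namespace from the admin container:
sudo sheltie

# Inside that shell, check the DHCPv6 supplicant's state and recent logs.
systemctl status wickedd-dhcp6.service
journalctl -u wickedd-dhcp6.service --no-pager | tail -n 50
```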
Oh interesting, yeah, I am hoping they are aware of this, as it sounds like it would be a pretty common issue.
Hello @jdn5126, since this issue has been confirmed at Bottlerocket, it can be closed here. Thanks for the support.
Environment:
- Kubernetes version (`kubectl version`): {Major:"1", Minor:"24+", GitVersion:"v1.24.13-eks-0a21954"}
- OS (`cat /etc/os-release`): Bottlerocket OS 1.14.0 (aws-k8s-1.24)
- Kernel (`uname -a`): 5.15.108

What happened:
We sometimes encounter an issue in our EKS clusters where the Cilium pod is stuck in a pending state. We start our nodes with Karpenter, and it doesn't happen all the time. The node itself has a Ready condition. Because the Cilium pod doesn't come up properly, the whole network stack is broken. We were able to pinpoint the issue to the following circumstances by comparing a healthy node with a broken one:
Cilium is not able to initialize its configuration due to an i/o timeout when calling the EKS API's IPv6 address. Looking into the environment with sheltie, we can see that the IPv6 address on the node's eth0 interface is not properly configured.

Bad node: there is no `addr: ipv6 ...` entry.

Good node: the proper IPv6 address is configured:
`addr: ipv6 2a05:d014:dad:dc05::9cc/64 [dhcp]`
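For anyone debugging the same symptom, the check itself is a plain ip(8) query run from the host namespace (via sheltie):

```sh
# Show IPv6 addresses on eth0; on a broken node the global-scope
# dynamic address is missing from this output.
ip -6 addr show dev eth0
```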
From the logs of ipamd we see that we get an IPv6 address:
We don't know which component sets the interface addresses in this whole chain, so we kindly ask for support. Thanks!
BTW: restarting wicked on the node resolves the issue. It finally gets an IPv6 address on the eth0 interface.
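A sketch of that workaround from the admin container (unit name assumed to be wicked.service):

```sh
# Enter the host namespace, restart wicked, then confirm eth0 now
# has a global IPv6 address.
sudo sheltie
systemctl restart wicked.service
ip -6 addr show dev eth0
```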