aws-cni in chaining mode with cilium on ipv6 EKS stuck in pending state #2391

Closed
sdomme opened this issue May 23, 2023 · 8 comments

@sdomme

sdomme commented May 23, 2023

Environment:

  • Kubernetes version (use kubectl version): {Major:"1", Minor:"24+", GitVersion:"v1.24.13-eks-0a21954"
  • CNI Version: v1.12.6-eksbuild.1
  • OS (e.g: cat /etc/os-release): Bottlerocket OS 1.14.0 (aws-k8s-1.24)
  • Kernel (e.g. uname -a): 5.15.108

What happened:
We sometimes encounter an issue in our EKS clusters where the Cilium pod is stuck in a pending state. We start our nodes with Karpenter, and it doesn't happen every time. The node itself has a Ready condition. Because the Cilium pod doesn't come up properly, the whole network stack is broken. Comparing a healthy node with a broken one, we were able to narrow the issue down to the following:
Cilium is not able to initialize its configuration because of an i/o timeout when calling the EKS API's IPv6 address. Looking into the environment with sheltie, we can see that the IPv6 address on the node's eth0 interface is not properly configured:
bad node:

bash-5.1# wicked show all
lo              device-unconfigured
      link:     #1, state up
      type:     loopback
      addr:     ipv4 127.0.0.1/8
      addr:     ipv6 ::1/128

eth0            up
      link:     #2, state up, mtu 9001
      type:     ethernet, hwaddr 06:da:df:b6:a7:c6
      config:   wicked:xml:/etc/wicked/ifconfig/eth0.xml
      leases:   ipv4 dhcp granted
      leases:   ipv6 dhcp requesting
      addr:     ipv4 10.33.253.169/24 [dhcp]
      route:    ipv4 default via 10.33.253.1 [dhcp]
      route:    ipv6 default via fe80::4c3:eaff:fea5:d09e metric 1024 proto ra

There is no addr: ipv6 ... entry for eth0, and the ipv6 dhcp lease is stuck in requesting.

good node:

bash-5.1# wicked show all
lo              device-unconfigured
      link:     #1, state up
      type:     loopback
      addr:     ipv4 127.0.0.1/8
      addr:     ipv6 ::1/128

eth0            up
      link:     #2, state up, mtu 9001
      type:     ethernet, hwaddr 0a:ff:ec:87:a6:e8
      config:   wicked:xml:/etc/wicked/ifconfig/eth0.xml
      leases:   ipv4 dhcp granted
      leases:   ipv6 dhcp granted
      addr:     ipv4 10.33.254.4/24 [dhcp]
      addr:     ipv6 2a05:d014:dad:dc05::9cc/64 [dhcp]
      route:    ipv4 default via 10.33.254.1 [dhcp]
      route:    ipv6 default via fe80::850:32ff:fe76:84 metric 1024 proto ra

A proper IPv6 address is configured: addr: ipv6 2a05:d014:dad:dc05::9cc/64 [dhcp]
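The comparison above can be automated. The following is a minimal sketch, assuming the `wicked show all` layout shown in the two outputs (an unindented interface line followed by indented `key: value` lines). The function names and the abridged sample strings are illustrative and not part of wicked or Bottlerocket.

```python
def parse_wicked_show(text):
    """Parse `wicked show all` output into {ifname: {"leases": [...], "addrs": [...]}}."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new interface block, e.g. "eth0            up".
            current = line.split()[0]
            interfaces[current] = {"leases": [], "addrs": []}
        elif current is not None:
            key, _, value = line.strip().partition(":")
            value = value.strip()
            if key == "leases":
                interfaces[current]["leases"].append(value)
            elif key == "addr":
                interfaces[current]["addrs"].append(value)
    return interfaces


def has_ipv6_address(interfaces, ifname):
    """True if the interface lists at least one `addr: ipv6 ...` entry."""
    addrs = interfaces.get(ifname, {}).get("addrs", [])
    return any(a.startswith("ipv6 ") for a in addrs)


# Abridged copies of the eth0 blocks from the bad and good node above.
BAD_NODE = """\
eth0            up
      link:     #2, state up, mtu 9001
      leases:   ipv4 dhcp granted
      leases:   ipv6 dhcp requesting
      addr:     ipv4 10.33.253.169/24 [dhcp]
"""

GOOD_NODE = """\
eth0            up
      link:     #2, state up, mtu 9001
      leases:   ipv4 dhcp granted
      leases:   ipv6 dhcp granted
      addr:     ipv4 10.33.254.4/24 [dhcp]
      addr:     ipv6 2a05:d014:dad:dc05::9cc/64 [dhcp]
"""

for name, sample in (("bad", BAD_NODE), ("good", GOOD_NODE)):
    parsed = parse_wicked_show(sample)
    print(name, "eth0 has IPv6 address:", has_ipv6_address(parsed, "eth0"))
```

The parser only looks at the `leases` and `addr` keys, since those are the two lines that differ between the nodes.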

From the ipamd logs we can see that an IPv6 prefix is allocated and that pods are assigned IPv6 addresses from it:

bash-5.1# cat /var/log/aws-routed-eni/ipamd.log
{"level":"info","ts":"2023-05-23T03:22:24.990Z","caller":"logger/logger.go:52","msg":"Constructed new logger instance"}
{"level":"info","ts":"2023-05-23T03:22:24.990Z","caller":"eniconfig/eniconfig.go:61","msg":"Initialized new logger as an existing instance was not found"}
{"level":"info","ts":"2023-05-23T03:22:24.996Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD   ..."}
{"level":"info","ts":"2023-05-23T03:22:24.996Z","caller":"aws-k8s-agent/main.go:39","msg":"Testing communication with server"}
{"level":"info","ts":"2023-05-23T03:22:25.005Z","caller":"wait/wait.go:222","msg":"Successful communication with the Cluster! Cluster Version is: v1.24+. git version: v1.24.13-eks-0a21954. git tree state: clean. commit: 6305d65c340554ad8b4d7a5f21391c9fa34932cb. platform: linux/amd64"}
{"level":"warn","ts":"2023-05-23T03:22:25.030Z","caller":"awssession/session.go:64","msg":"HTTP_TIMEOUT env is not set or set to less than 10 seconds, defaulting to httpTimeout to 10sec"}
{"level":"debug","ts":"2023-05-23T03:22:25.032Z","caller":"ipamd/ipamd.go:392","msg":"Discovered region: eu-central-1"}
{"level":"info","ts":"2023-05-23T03:22:25.032Z","caller":"ipamd/ipamd.go:392","msg":"Custom networking enabled false"}
{"level":"debug","ts":"2023-05-23T03:22:25.033Z","caller":"awsutils/awsutils.go:431","msg":"Found availability zone: eu-central-1b "}
{"level":"debug","ts":"2023-05-23T03:22:25.034Z","caller":"awsutils/awsutils.go:431","msg":"Discovered the instance primary IPv4 address: 10.33.253.169"}
{"level":"debug","ts":"2023-05-23T03:22:25.034Z","caller":"awsutils/awsutils.go:431","msg":"Found instance-id: i-003b328e1d3d7022a "}
{"level":"debug","ts":"2023-05-23T03:22:25.035Z","caller":"awsutils/awsutils.go:431","msg":"Found instance-type: c6in.xlarge "}
{"level":"debug","ts":"2023-05-23T03:22:25.036Z","caller":"awsutils/awsutils.go:431","msg":"Found primary interface's MAC address: 06:da:df:b6:a7:c6"}
{"level":"debug","ts":"2023-05-23T03:22:25.036Z","caller":"awsutils/awsutils.go:431","msg":"eni-061d9f7ae8501b772 is the primary ENI of this instance"}
{"level":"debug","ts":"2023-05-23T03:22:25.037Z","caller":"awsutils/awsutils.go:431","msg":"Found subnet-id: subnet-01d241d9ad90c12b0 "}
{"level":"debug","ts":"2023-05-23T03:22:25.037Z","caller":"ipamd/ipamd.go:401","msg":"Using WARM_ENI_TARGET 1"}
{"level":"debug","ts":"2023-05-23T03:22:25.037Z","caller":"ipamd/ipamd.go:404","msg":"Using WARM_PREFIX_TARGET 1"}
{"level":"debug","ts":"2023-05-23T03:22:25.037Z","caller":"ipamd/ipamd.go:2285","msg":"Check if instance supports Prefix Delegation"}
{"level":"debug","ts":"2023-05-23T03:22:25.037Z","caller":"awsutils/awsutils.go:1472","msg":"Instance hypervisor family nitro"}
{"level":"debug","ts":"2023-05-23T03:22:25.037Z","caller":"ipamd/ipamd.go:2285","msg":"Instance supports Prefix Delegation"}
{"level":"info","ts":"2023-05-23T03:22:25.037Z","caller":"ipamd/ipamd.go:422","msg":"Prefix Delegation enabled true"}
{"level":"debug","ts":"2023-05-23T03:22:25.037Z","caller":"ipamd/ipamd.go:427","msg":"Start node init"}
{"level":"info","ts":"2023-05-23T03:22:25.037Z","caller":"ipamd/ipamd.go:469","msg":"Setting up host network... "}
{"level":"info","ts":"2023-05-23T03:22:25.037Z","caller":"awsutils/awsutils.go:1684","msg":"Will attempt to clean up AWS CNI leaked ENIs after waiting 1m11s."}
{"level":"debug","ts":"2023-05-23T03:22:25.037Z","caller":"networkutils/network.go:281","msg":"Found the Link that uses mac address 06:da:df:b6:a7:c6 and its index is 2 (attempt 1/5)"}
{"level":"debug","ts":"2023-05-23T03:22:25.044Z","caller":"networkutils/network.go:357","msg":"Trying to find primary interface that has mac : 06:da:df:b6:a7:c6"}
{"level":"debug","ts":"2023-05-23T03:22:25.044Z","caller":"networkutils/network.go:357","msg":"Discovered interface: lo, mac: "}
{"level":"debug","ts":"2023-05-23T03:22:25.044Z","caller":"networkutils/network.go:357","msg":"Discovered interface: eth0, mac: 06:da:df:b6:a7:c6"}
{"level":"info","ts":"2023-05-23T03:22:25.044Z","caller":"networkutils/network.go:357","msg":"Discovered primary interface: eth0"}
{"level":"debug","ts":"2023-05-23T03:22:25.046Z","caller":"awsutils/awsutils.go:1172","msg":"Total number of interfaces found: 1 "}
{"level":"debug","ts":"2023-05-23T03:22:25.046Z","caller":"awsutils/awsutils.go:610","msg":"Found ENI MAC address: 06:da:df:b6:a7:c6"}
{"level":"debug","ts":"2023-05-23T03:22:25.048Z","caller":"awsutils/awsutils.go:610","msg":"Found ENI: eni-061d9f7ae8501b772, MAC 06:da:df:b6:a7:c6, device 0"}
{"level":"info","ts":"2023-05-23T03:22:25.263Z","caller":"ipamd/ipamd.go:475","msg":"Got network cardindex 0 for ENI eni-061d9f7ae8501b772"}
{"level":"info","ts":"2023-05-23T03:22:25.263Z","caller":"ipamd/ipamd.go:475","msg":"eni-061d9f7ae8501b772 is of type: interface"}
{"level":"debug","ts":"2023-05-23T03:22:25.263Z","caller":"ipamd/ipamd.go:427","msg":"DescribeAllENIs success: ENIs: 1, tagged: 1"}
{"level":"debug","ts":"2023-05-23T03:22:25.263Z","caller":"ipamd/ipamd.go:427","msg":"Discovered ENI eni-061d9f7ae8501b772, trying to set it up"}
{"level":"debug","ts":"2023-05-23T03:22:25.263Z","caller":"ipamd/ipamd.go:491","msg":"Tagging ENI eni-061d9f7ae8501b772 with missing tags: map[cluster.k8s.amazonaws.com/name:datacore-ingestion-poc node.k8s.amazonaws.com/instance_id:i-003b328e1d3d7022a]"}
{"level":"debug","ts":"2023-05-23T03:22:25.355Z","caller":"retry/retry.go:70","msg":"Successfully tagged ENI: eni-061d9f7ae8501b772"}
{"level":"debug","ts":"2023-05-23T03:22:25.355Z","caller":"ipamd/ipamd.go:1094","msg":"DataStore Add an ENI eni-061d9f7ae8501b772"}
{"level":"debug","ts":"2023-05-23T03:22:25.355Z","caller":"ipamd/ipamd.go:1105","msg":"Assigning an IPv6Prefix for ENI: eni-061d9f7ae8501b772"}
{"level":"debug","ts":"2023-05-23T03:22:25.407Z","caller":"ipamd/ipamd.go:1105","msg":"ENI eni-061d9f7ae8501b772 has 0 prefixe(s) attached"}
{"level":"debug","ts":"2023-05-23T03:22:25.407Z","caller":"ipamd/ipamd.go:1105","msg":"No IPv6 Prefix(es) found for ENI: eni-061d9f7ae8501b772"}
{"level":"debug","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:1028","msg":"Allocated 1 private IPv6 prefix(es)"}
{"level":"debug","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:1105","msg":"Successfully allocated an IPv6Prefix for ENI: eni-061d9f7ae8501b772"}
{"level":"debug","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:1042","msg":"Updating datastore with IPv6Prefix(es) for ENI: eni-061d9f7ae8501b772, count: 1"}
{"level":"debug","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:1183","msg":"Adding 2a05:d014:dad:dc04:1f09::/80 to DS for eni-061d9f7ae8501b772"}
{"level":"debug","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:1183","msg":"ENI in pool %!s(bool=true)"}
{"level":"debug","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:1183","msg":"IP not in  DS"}
{"level":"debug","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:1183","msg":"Assigning IPv6CIDRs"}
{"level":"debug","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:1183","msg":"Added ENI(eni-061d9f7ae8501b772)'s IP/Prefix 2a05:d014:dad:dc04:1f09::/80 to datastore"}
{"level":"debug","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:1190","msg":"Prefix pool stats: Total IPs/Prefixes = 281474976710656/1, AssignedIPs/CooldownIPs: 0/0, c.maxIPsPerENI = 0"}
{"level":"info","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:427","msg":"ENI eni-061d9f7ae8501b772 set up."}
{"level":"info","ts":"2023-05-23T03:22:25.682Z","caller":"ipamd/ipamd.go:522","msg":"Begin ipam state recovery from backing store"}
{"level":"debug","ts":"2023-05-23T03:22:25.683Z","caller":"ipamd/ipamd.go:522","msg":"backing store doesn't exists, assuming bootstrap on a new node"}
{"level":"info","ts":"2023-05-23T03:22:25.683Z","caller":"aws-k8s-agent/main.go:80","msg":"Serving RPC Handler version  on 127.0.0.1:50051"}
{"level":"info","ts":"2023-05-23T03:22:25.683Z","caller":"runtime/asm_amd64.s:1594","msg":"Serving metrics on port 61678"}
{"level":"info","ts":"2023-05-23T03:22:25.683Z","caller":"ipamd/introspect.go:62","msg":"Serving introspection endpoints on 127.0.0.1:61679"}
{"level":"info","ts":"2023-05-23T03:22:25.683Z","caller":"runtime/asm_amd64.s:1594","msg":"Setting up shutdown hook."}
{"level":"info","ts":"2023-05-23T03:22:26.615Z","caller":"rpc/rpc.pb.go:713","msg":"Received AddNetwork for NS /var/run/netns/cni-93db0d15-9d4b-1f88-2a19-afe836077447, Sandbox bdcb6a3aac4b9e2f0f098e55831506f56e5d5ef590b0fd966eb7b12b93133ac5, ifname eth0"}
{"level":"debug","ts":"2023-05-23T03:22:26.615Z","caller":"rpc/rpc.pb.go:713","msg":"AddNetworkRequest: K8S_POD_NAME:\"istio-cni-node-5sjpv\"  K8S_POD_NAMESPACE:\"kube-system\"  K8S_POD_INFRA_CONTAINER_ID:\"bdcb6a3aac4b9e2f0f098e55831506f56e5d5ef590b0fd966eb7b12b93133ac5\"  ContainerID:\"bdcb6a3aac4b9e2f0f098e55831506f56e5d5ef590b0fd966eb7b12b93133ac5\"  IfName:\"eth0\"  NetworkName:\"aws-cni\"  Netns:\"/var/run/netns/cni-93db0d15-9d4b-1f88-2a19-afe836077447\""}
{"level":"debug","ts":"2023-05-23T03:22:26.615Z","caller":"datastore/data_store.go:648","msg":"AssignIPv6Address: IPv6 address pool stats: assigned 0"}
{"level":"debug","ts":"2023-05-23T03:22:26.615Z","caller":"datastore/data_store.go:1333","msg":"Found a free IP not in DB - 2a05:d014:dad:dc04:1f09::"}
{"level":"debug","ts":"2023-05-23T03:22:26.615Z","caller":"datastore/data_store.go:677","msg":"Returning Free IP 2a05:d014:dad:dc04:1f09::"}
{"level":"debug","ts":"2023-05-23T03:22:26.615Z","caller":"datastore/data_store.go:648","msg":"New v6 IP from PD pool- 2a05:d014:dad:dc04:1f09::"}
{"level":"info","ts":"2023-05-23T03:22:26.615Z","caller":"datastore/data_store.go:688","msg":"AssignPodIPv4Address: Assign IP 2a05:d014:dad:dc04:1f09:: to sandbox aws-cni/bdcb6a3aac4b9e2f0f098e55831506f56e5d5ef590b0fd966eb7b12b93133ac5/eth0"}
{"level":"info","ts":"2023-05-23T03:22:26.616Z","caller":"rpc/rpc.pb.go:713","msg":"Received AddNetwork for NS /var/run/netns/cni-c406cb60-2de8-7cdf-1565-26353d008b8d, Sandbox c1cd5a364bbdeb8ea83d6994f94757ac8d69b0aa05862e314a6c19a4421bcc6c, ifname eth0"}
{"level":"debug","ts":"2023-05-23T03:22:26.616Z","caller":"rpc/rpc.pb.go:713","msg":"AddNetworkRequest: K8S_POD_NAME:\"ebs-csi-node-wwsmk\"  K8S_POD_NAMESPACE:\"kube-system\"  K8S_POD_INFRA_CONTAINER_ID:\"c1cd5a364bbdeb8ea83d6994f94757ac8d69b0aa05862e314a6c19a4421bcc6c\"  ContainerID:\"c1cd5a364bbdeb8ea83d6994f94757ac8d69b0aa05862e314a6c19a4421bcc6c\"  IfName:\"eth0\"  NetworkName:\"aws-cni\"  Netns:\"/var/run/netns/cni-c406cb60-2de8-7cdf-1565-26353d008b8d\""}
{"level":"debug","ts":"2023-05-23T03:22:26.616Z","caller":"datastore/data_store.go:648","msg":"AssignIPv6Address: IPv6 address pool stats: assigned 1"}
{"level":"debug","ts":"2023-05-23T03:22:26.616Z","caller":"datastore/data_store.go:1333","msg":"Found a free IP not in DB - 2a05:d014:dad:dc04:1f09::1"}
{"level":"debug","ts":"2023-05-23T03:22:26.616Z","caller":"datastore/data_store.go:677","msg":"Returning Free IP 2a05:d014:dad:dc04:1f09::1"}
{"level":"debug","ts":"2023-05-23T03:22:26.616Z","caller":"datastore/data_store.go:648","msg":"New v6 IP from PD pool- 2a05:d014:dad:dc04:1f09::1"}
{"level":"info","ts":"2023-05-23T03:22:26.616Z","caller":"datastore/data_store.go:688","msg":"AssignPodIPv4Address: Assign IP 2a05:d014:dad:dc04:1f09::1 to sandbox aws-cni/c1cd5a364bbdeb8ea83d6994f94757ac8d69b0aa05862e314a6c19a4421bcc6c/eth0"}
{"level":"debug","ts":"2023-05-23T03:22:26.616Z","caller":"rpc/rpc.pb.go:713","msg":"VPC V6 CIDR 2a05:d014:dad:dc00::/56"}
{"level":"info","ts":"2023-05-23T03:22:26.616Z","caller":"rpc/rpc.pb.go:713","msg":"Send AddNetworkReply: IPv4Addr , IPv6Addr: 2a05:d014:dad:dc04:1f09::, DeviceNumber: 0, err: <nil>"}
{"level":"debug","ts":"2023-05-23T03:22:26.617Z","caller":"rpc/rpc.pb.go:713","msg":"VPC V6 CIDR 2a05:d014:dad:dc00::/56"}
{"level":"info","ts":"2023-05-23T03:22:26.617Z","caller":"rpc/rpc.pb.go:713","msg":"Send AddNetworkReply: IPv4Addr , IPv6Addr: 2a05:d014:dad:dc04:1f09::1, DeviceNumber: 0, err: <nil>"}
{"level":"debug","ts":"2023-05-23T03:23:36.040Z","caller":"awsutils/awsutils.go:1684","msg":"Checking for leaked AWS CNI ENIs."}
{"level":"debug","ts":"2023-05-23T03:23:36.115Z","caller":"ec2/api.go:24068","msg":"EC2 DescribeNetworkInterfaces succeeded with 0 results on page 1"}
{"level":"debug","ts":"2023-05-23T03:23:36.115Z","caller":"awsutils/awsutils.go:1693","msg":"No AWS CNI leaked ENIs found."}
{"level":"info","ts":"2023-05-23T04:23:36.120Z","caller":"awsutils/awsutils.go:1684","msg":"Will attempt to clean up AWS CNI leaked ENIs after waiting 1m13s."}
{"level":"debug","ts":"2023-05-23T04:24:49.122Z","caller":"awsutils/awsutils.go:1684","msg":"Checking for leaked AWS CNI ENIs."}
{"level":"debug","ts":"2023-05-23T04:24:49.216Z","caller":"ec2/api.go:24068","msg":"EC2 DescribeNetworkInterfaces succeeded with 0 results on page 1"}
{"level":"debug","ts":"2023-05-23T04:24:49.216Z","caller":"awsutils/awsutils.go:1693","msg":"No AWS CNI leaked ENIs found."}
{"level":"info","ts":"2023-05-23T05:24:49.217Z","caller":"awsutils/awsutils.go:1684","msg":"Will attempt to clean up AWS CNI leaked ENIs after waiting 3m11s."}
{"level":"debug","ts":"2023-05-23T05:28:00.218Z","caller":"awsutils/awsutils.go:1684","msg":"Checking for leaked AWS CNI ENIs."}
{"level":"debug","ts":"2023-05-23T05:28:00.319Z","caller":"ec2/api.go:24068","msg":"EC2 DescribeNetworkInterfaces succeeded with 0 results on page 1"}
{"level":"debug","ts":"2023-05-23T05:28:00.319Z","caller":"awsutils/awsutils.go:1693","msg":"No AWS CNI leaked ENIs found."}
{"level":"info","ts":"2023-05-23T06:28:00.319Z","caller":"awsutils/awsutils.go:1684","msg":"Will attempt to clean up AWS CNI leaked ENIs after waiting 1m28s."}
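To see at a glance which pods received addresses, the JSON log lines can be filtered. This is a rough sketch that assumes the message formats visible in the log above (`AddNetworkRequest: ...` and `Assign IP ... to sandbox ...`); the regular expressions and the function name are illustrative, and the two sample lines are abridged copies with some fields dropped.

```python
import json
import re

# Patterns modeled on the "msg" fields in the log above (an assumption, not a stable format).
REQUEST_RE = re.compile(r'K8S_POD_NAME:"([^"]+)".*?ContainerID:"([0-9a-f]+)"')
ASSIGN_RE = re.compile(r"Assign IP (\S+) to sandbox [^/]+/([0-9a-f]+)/")


def pod_ip_assignments(lines):
    """Return {pod name: assigned IP} from ipamd JSON log lines."""
    pods = {}      # container ID -> pod name
    assigned = {}  # pod name (or container ID if unknown) -> IP
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line).get("msg", "")
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        match = REQUEST_RE.search(msg)
        if match:
            pods[match.group(2)] = match.group(1)
            continue
        match = ASSIGN_RE.search(msg)
        if match:
            ip, container = match.groups()
            assigned[pods.get(container, container)] = ip
    return assigned


SAMPLE = [
    r'{"level":"debug","msg":"AddNetworkRequest: K8S_POD_NAME:\"istio-cni-node-5sjpv\"  K8S_POD_NAMESPACE:\"kube-system\"  ContainerID:\"bdcb6a3aac4b9e2f0f098e55831506f56e5d5ef590b0fd966eb7b12b93133ac5\"  IfName:\"eth0\""}',
    r'{"level":"info","msg":"AssignPodIPv4Address: Assign IP 2a05:d014:dad:dc04:1f09:: to sandbox aws-cni/bdcb6a3aac4b9e2f0f098e55831506f56e5d5ef590b0fd966eb7b12b93133ac5/eth0"}',
]

print(pod_ip_assignments(SAMPLE))
```

Note that these are pod addresses taken from the delegated prefix; the log contains no entry about an IPv6 address on the node's own eth0.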

We don't know which component in this chain sets the interface addresses, so we kindly ask for support. Thanks.

BTW: restarting wicked on the node resolves the issue. eth0 finally gets an IPv6 address:

bash-5.1# systemctl restart wicked
bash-5.1# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 06:da:df:b6:a7:c6 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    altname ens5
    inet 10.33.253.169/24 brd 10.33.253.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2a05:d014:dad:dc04::5eb2/64 scope global tentative dynamic noprefixroute
       valid_lft 449sec preferred_lft 139sec
    inet6 fe80::4da:dfff:feb6:a7c6/64 scope link
       valid_lft forever preferred_lft forever
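The same check can be scripted against `ip -j addr show`, which prints the address list as JSON. The sketch below assumes that JSON layout (`ifname`, `addr_info`, `family`, `scope`, `local`, `prefixlen`). The sample document is hand-written from the output above rather than captured from a node, and whether Python is available in the environment where you run it is also an assumption.

```python
import json


def global_ipv6_addresses(ip_json_text, ifname="eth0"):
    """Return global-scope IPv6 addresses of ifname from `ip -j addr show` output."""
    found = []
    for link in json.loads(ip_json_text):
        if link.get("ifname") != ifname:
            continue
        for addr in link.get("addr_info", []):
            if addr.get("family") == "inet6" and addr.get("scope") == "global":
                found.append(f'{addr["local"]}/{addr["prefixlen"]}')
    return found


# Hand-written sample modeled on the `ip a` output above (not real captured JSON).
SAMPLE_JSON = json.dumps([
    {
        "ifname": "eth0",
        "addr_info": [
            {"family": "inet", "local": "10.33.253.169", "prefixlen": 24, "scope": "global"},
            {"family": "inet6", "local": "2a05:d014:dad:dc04::5eb2", "prefixlen": 64, "scope": "global"},
            {"family": "inet6", "local": "fe80::4da:dfff:feb6:a7c6", "prefixlen": 64, "scope": "link"},
        ],
    }
])

print(global_ipv6_addresses(SAMPLE_JSON))
```

An empty list for eth0 would correspond to the broken state, where only the link-local address is present.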
@jdn5126
Contributor

jdn5126 commented May 23, 2023

@sdomme configuring an IPv6 address on the node's primary ENI is not something that the VPC CNI is responsible for. That should only happen when you create an IPv6 EKS cluster, unless this wicked tool is doing it. The fact that restarting wicked fixes this seems to point to it being the source of the issue.

It looks like this is an IPv6 cluster, so if you cannot point to wicked as the issue, you can also check the following sysctls on the working and non-working node:

net/ipv6/conf/eth0/disable_ipv6
net/ipv6/conf/eth0/forwarding
net/ipv6/conf/eth0/accept_ra

As a side note, chaining Cilium with the AWS VPC CNI is not something that EKS supports or tests, so you are operating in somewhat uncharted territory here.

Can you point to more documentation on wicked? I cannot find any through a Google search.

@jdn5126 jdn5126 added the CNI chaining CNI plugin chaining label May 23, 2023
@sdomme
Author

sdomme commented May 24, 2023

@jdn5126 Bottlerocket uses wicked to manage networking. I had never heard of wicked before, but this seems to be it:
https://github.com/openSUSE/wicked
Bottlerocket is also working on replacing it: bottlerocket-os/bottlerocket#2449

I will provide the sysctl information when I get another non-working node.

I do not understand how the VPC CNI is not responsible for the IP addresses. Or do you mean that the node's (Bottlerocket) IP addresses are already set beforehand, and the VPC CNI's IPAM only manages ENIs, subnets, and IPs for the pods? Do you also mean that the IP addresses (IPv4, IPv6) on the node's eth0 interface should already be set before the VPC CNI comes into play?

@sdomme
Author

sdomme commented May 24, 2023

I compared the sysctl values. They are the same on a working and non-working node:

net.ipv6.conf.eth0.disable_ipv6 = 0
net.ipv6.conf.eth0.forwarding = 1
net.ipv6.conf.eth0.accept_ra = 2
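For repeated comparisons across nodes, the three values can be read straight from `/proc/sys`. This is a small sketch: the function names are made up for illustration, and `ra_accepted` encodes the kernel's documented `accept_ra` semantics (1 accepts router advertisements only when forwarding is off, 2 accepts them even when forwarding is on).

```python
from pathlib import Path

SYSCTLS = ("disable_ipv6", "forwarding", "accept_ra")


def read_ipv6_sysctls(ifname="eth0", proc_root="/proc/sys"):
    """Read the per-interface IPv6 sysctls; a missing file is reported as None."""
    base = Path(proc_root) / "net" / "ipv6" / "conf" / ifname
    values = {}
    for name in SYSCTLS:
        path = base / name
        values[name] = path.read_text().strip() if path.exists() else None
    return values


def ra_accepted(values):
    """True if these settings let the kernel process router advertisements."""
    if values["disable_ipv6"] != "0":
        return False
    if values["accept_ra"] == "2":
        return True  # 2 overrides the forwarding check
    return values["accept_ra"] == "1" and values["forwarding"] == "0"


# Values posted above, identical on the working and the non-working node.
posted = {"disable_ipv6": "0", "forwarding": "1", "accept_ra": "2"}
print("router advertisements accepted:", ra_accepted(posted))
```

With the posted values router advertisements are accepted, which matches the `proto ra` default route present on both nodes, so these sysctls do not explain the difference.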

@jdn5126
Contributor

jdn5126 commented May 24, 2023

@sdomme the VPC CNI (actually IPAM) is responsible for making EC2 API calls to attach secondary ENIs, and an IPv4 or IPv6 subnet will get attached to the secondary ENI based on the mode. The primary ENI, eth0, is already configured before the aws-node pod runs. So if the primary ENI is sometimes missing an IPv6 address, and restarting wicked resolves it, I think we need to get some help from the bottlerocket team. Can you file an issue at https://github.com/bottlerocket-os/bottlerocket/issues and link this one?
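The split between node addressing and pod addressing can be checked against the values in this thread with Python's `ipaddress` module. This is only an illustration using addresses copied from the log and the `ip a` output above; the variable names are arbitrary.

```python
import ipaddress

# Values copied from the ipamd log and the `ip a` output in this thread.
vpc = ipaddress.ip_network("2a05:d014:dad:dc00::/56")            # "VPC V6 CIDR"
node = ipaddress.ip_interface("2a05:d014:dad:dc04::5eb2/64")     # eth0 after restarting wicked
prefix = ipaddress.ip_network("2a05:d014:dad:dc04:1f09::/80")    # prefix allocated by ipamd
pod_ip = ipaddress.ip_address("2a05:d014:dad:dc04:1f09::1")      # assigned to ebs-csi-node-wwsmk

subnet = node.network  # the /64 that eth0 lives in
node_ip = node.ip

print("subnet inside VPC CIDR:      ", subnet.subnet_of(vpc))
print("pod prefix inside subnet:    ", prefix.subnet_of(subnet))
print("pod IP inside pod prefix:    ", pod_ip in prefix)
print("node IP inside pod prefix:   ", node_ip in prefix)
print("addresses in the /80 prefix: ", prefix.num_addresses)
```

The /80 prefix holds 2^48 addresses, which matches the "Total IPs/Prefixes = 281474976710656/1" line in the log, and the node's own address sits outside that prefix: it comes from the DHCPv6 lease, not from ipamd.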

I checked existing and closed bottlerocket issues and did not find any that matched this. I will follow the issue you file and comment if needed.

@sdomme
Author

sdomme commented May 24, 2023

@jdn5126 Meanwhile I dug deeper and was able to pinpoint the issue to the wicked-dhcp6 service.
I have created an issue on the Bottlerocket side:
bottlerocket-os/bottlerocket#3143

@jdn5126
Contributor

jdn5126 commented May 24, 2023

> @jdn5126 Meanwhile I dug deeper and was able to pinpoint the issue to the wicked-dhcp6 service. I have created an issue on the Bottlerocket side: bottlerocket-os/bottlerocket#3143

Oh interesting. Yeah, I am hoping they are aware of this, as it sounds like it would be a pretty common issue.

@sdomme
Author

sdomme commented Jun 5, 2023

Hello @jdn5126, since this issue has been confirmed on the Bottlerocket side, it can be closed here. Thanks for the support.

@sdomme sdomme closed this as completed Jun 5, 2023
@github-actions

github-actions bot commented Jun 5, 2023

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
