What are the consequences of disabling the CNI plugin in EKS? #176

Closed
iamakulov opened this issue Sep 11, 2018 · 29 comments

@iamakulov

I need to overcome the secondary IP address limit that EC2 worker nodes have.

I tried disabling the CNI plugin on my worker node:

sudo sed -i '/--network-plugin=cni/d' /etc/systemd/system/kubelet.service
sudo systemctl daemon-reload
sudo systemctl restart kubelet.service

and it appears that everything is working fine (pods seem to work as intended, ClusterIP and LoadBalancer services successfully resolve to pods).

Are there any non-obvious bad consequences of disabling the CNI plugin in EKS?

@TigerC10

@rorychatt wrote a great article explaining what benefits Amazon VPC CNI provides: https://www.contino.io/insights/kubernetes-is-hard-why-eks-makes-it-easier-for-network-and-security-architects

In short, without the AWS VPC CNI plugin:

  • AWS VPC Flow Logs won't capture pod traffic (they will still show traffic between EC2 worker nodes, but not between individual pods)
  • AWS Security Groups won't apply at the pod level (you won't be able to assign security policies to pods directly)
  • Using an overlay network means having that extra bit of "hidden" infrastructure to debug in the event of failure

If at all possible, upgrade your worker node instance type to increase your capacity. Arguably the best option, since you want to run more containers per worker node anyway.
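
For a rough sense of what "upgrade your instance type" buys you here (not from this thread; the figures come from the standard EC2 ENI/IP limits, so double-check your instance type), the AWS VPC CNI caps pods per node at roughly:

  max pods per node = ENIs × (IPv4 addresses per ENI − 1) + 2

e.g. an m5.large (3 ENIs × 10 IPs each) tops out at 3 × (10 − 1) + 2 = 29 pods, while an m5.4xlarge (8 ENIs × 30 IPs each) allows 8 × (30 − 1) + 2 = 234.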

But if that's not possible, I suggest you use the CNI-Genie plugin to work around your issue. This plugin lets you annotate pods with the CNI of your choice, effectively letting you pick which pods allocate an IP in your VPC and which allocate one through some other overlay network.
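
A minimal sketch of what that annotation looks like in practice (hypothetical pod; the network name, "flannel" here, is just an illustration and has to match a CNI config actually installed on the node):

# CNI-Genie reads the "cni" annotation on the pod and delegates IP allocation
# for that pod to the named plugin instead of the node's default CNI.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: demo-overlay-pod
  annotations:
    cni: "flannel"
spec:
  containers:
  - name: app
    image: nginx
EOF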

@iamakulov
Author

iamakulov commented Sep 26, 2018

Thanks a lot!

If at all possible, upgrade your worker node instance type to increase your capacity. Arguably the best option, since you want to run more containers per worker node anyway.

The issue for us is pricing. We run a large number of similar resource-light apps (think app hosting, for example), and we want to keep each pod within a specific budget. Larger nodes can host more pods but are also more expensive, so the price per pod stays the same (too high).


In fact, during further experiments, it turned out that disabling the CNI plugin works perfectly for a single node, but makes Kubernetes behave weirdly when there are multiple nodes:

  • Kubernetes starts assigning the same IPs to different pods on different nodes (so pod A running on node 1 and pod B running on node 2 might both have the IP 172.17.0.3)

  • DNS stops working on all nodes except the first (it looks like the DNS server IP is only reachable from the first node)

This is weird, because it looks like Kubernetes should fall back to the default network plugin, which should ensure IPs are unique per cluster (I might be wrong though, I'm not experienced with Kubernetes):

The kubelet has a single default network plugin, and a default network common to the entire cluster.
Docs

Have you ever heard of this? Might a different non-default CNI plugin (probably even picked with CNI-Genie) solve this?

@TigerC10

TigerC10 commented Sep 26, 2018

I know that kubenet only works well for single-node clusters; it has no cross-node capabilities because it relies exclusively on host-local IPAM.

Network plugins in Kubernetes come in a few flavors:

  • CNI plugins: adhere to the appc/CNI specification, designed for interoperability.
  • Kubenet plugin: implements basic cbr0 using the bridge and host-local CNI plugins
    Docs

And yeah, if you want to install flannel, Calico, or Weave then you'll create an overlay network instead. Nothing wrong with that, you just lose the ease of use that AWS VPC CNI provides.
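
For reference, and hedged because the manifest URLs have likely moved since this thread was written (flannel now lives under the flannel-io org, for example), installing one of those overlays was essentially a single manifest apply, plus removing the aws-node DaemonSet if you are replacing the AWS CNI entirely:

# Only if fully replacing the AWS VPC CNI on the nodes:
kubectl delete daemonset aws-node -n kube-system
# Then install the overlay's DaemonSet, e.g. flannel (URL as of 2018/2019):
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Note that later comments in this thread cover the EKS-specific wrinkles (pod CIDR allocation, the API server not recognizing the overlay range).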

EDIT:
This is an interesting article about running Calico alongside the AWS VPC CNI... though it only discusses deploying Calico for its network policy capabilities.
https://aws.amazon.com/blogs/opensource/networking-foundation-eks-aws-cni-calico/
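
For anyone reading along, "Calico for network policy only" means pods keep their VPC IPs from the AWS CNI and Calico just enforces standard NetworkPolicy objects. A hypothetical example (namespace and labels made up):

# Only pods labelled app=frontend may reach pods labelled app=backend in the
# demo namespace; Calico is what actually enforces this policy.
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
EOF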

@seeruk

seeruk commented Oct 26, 2018

Will there be a way of automating opting out of the AWS VPC CNI plugin? I've been setting up an EKS cluster and was surprised at how quickly I hit the IP limit. I have 2 applications running in my cluster, just 2, after installing the daemon sets for other cluster-wide bits and bats (Istio, Cert Manager, Fluentd for Elasticsearch, etc.).

This IP limit is quite restrictive on smaller machines, and I'd rather just be able to avoid it and focus on CPU / memory limits being the thing I have to worry about in general.

Edit: I think a big part of my issue may have been down to using t3 instances; t2 ones seem to be working better, and I have been able to schedule more pods. I think there was an issue where K8s thought the nodes had more pod capacity available, but the CNI plugin was unable to assign an address for some reason.

@TigerC10

@seeruk the CloudFormation template (or Terraform, or kops, or whatever) you used to create your EKS cluster specifies the worker node AMI to use for nodes in the cluster. If you build your own AMI, you can install whatever CNI you want to "opt out" and create an overlay network instead.

@iamakulov
Author

iamakulov commented Nov 1, 2018

To update on my situation: in the end, I switched from EKS to a cluster created with kops.

I tried to make EKS work with kube-router (as a simple solution), and it worked, but kube-router depends on kube-apiserver running with --allow-privileged=true, and I can't control this in EKS. This means the solution could have broken at any update of the EKS masters.

Also, using kops instead of EKS allowed us to make our staging cluster cheaper (by using a single t2.micro for the master instead of the whole high-availability EKS solution).

kops has built-in support for kube-router, flannel, Calico, and a bunch of other networking solutions.
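
For example (a sketch; cluster name, state store and zone below are placeholders):

# kops picks the CNI at cluster-creation time via --networking.
kops create cluster \
  --name=example.k8s.local \
  --state=s3://example-kops-state-store \
  --zones=eu-west-1a \
  --master-size=t2.micro \
  --networking=kube-router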

@VladimirSmogitel

@iamakulov
Sorry to bother you:
Could you confirm that if we create a cluster with kops, we do not need to use the AWS CNI plugin?
I am new to Kubernetes, and we deploy our cluster on AWS using a Terraform script written by someone else, not me. Could you give any details with regard to how you handled container networking after switching from EKS to kops? Thank you.
It looks like our script deploys the cluster using kops but still uses the AWS CNI plugin (I am assuming that because we have aws-node pods running in the kube-system namespace when the cluster comes up).

@angelbarrera92

Hi @VladimirSmogitel
Creating a Kubernetes cluster on AWS does not require installing the AWS CNI. That network interface implementation comes from AWS and has some good capabilities; other CNI implementations like flannel or Calico have other good capabilities. You have to know them and choose the one that meets your requirements :)

I have deployed multiple Kubernetes clusters on AWS using kops with Weave as the CNI.

Thanks

@lgg42

lgg42 commented Feb 19, 2019

@TigerC10 have you installed CNI-Genie in EKS? It looks like a very good solution to this problem, even a nice feature to have! But after seeing in the deployment YAML that the genie pods will be running on master nodes... I have my doubts...

@TigerC10

Yes, I eventually got it working by using cni-genie and installing/configuring it on the master node before creating the EC2 worker nodes. It's not the best solution (not as easily automatable), but it's functional.

@lgg42

lgg42 commented Feb 19, 2019

OK, my mind is blown, I thought EKS's master nodes were a black box.
How exactly did you proceed? After getting a running EKS cluster and a configured .kube/config, did you run kubectl apply -f https://raw.githubusercontent.com/Huawei-PaaS/CNI-Genie/master/conf/1.8/genie-plugin.yaml or... something else?

@TigerC10

Yeah, even when there are no nodes registered to the cluster, you can still run kubectl commands. So, if you apply the genie plugin, it'll create the 00-genie CNI config, which will take precedence on the worker nodes. I would still recommend making a custom worker node image with the additional CNI plugins pre-installed, so as to minimize the risk of failure. But the most important part is making sure that the master node has the plugin before joining the worker nodes to the cluster.

If the worker nodes join the cluster before the master node has the additional CNI plugins, then you will have to delete the worker nodes and add them back in for the alternate CNI plugins to be recognized on the worker nodes.
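
A rough sketch of that ordering (the genie manifest URL is the one from earlier in this thread):

# 1. With only the EKS control plane up (no worker nodes registered yet),
#    install genie plus whichever additional CNI(s) it should delegate to.
kubectl apply -f https://raw.githubusercontent.com/Huawei-PaaS/CNI-Genie/master/conf/1.8/genie-plugin.yaml
# 2. ...install the additional overlay CNI of your choice...
# 3. Only then create (or re-create) the worker node group, so the nodes come
#    up with the genie config already in place.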

@lgg42

lgg42 commented Feb 19, 2019

Man, this is GOLD! Thank you so much! I'm definitely gonna give it a try now!

@tabern tabern added the question label Mar 5, 2019
@tabern

tabern commented Mar 5, 2019

Closing - question answered

@tabern tabern closed this as completed Mar 5, 2019
@lgg42

lgg42 commented Mar 6, 2019

@TigerC10 hey man, sorry to bug you with this again...
What CNI plugin are you using with genie? I tried Calico first, but I ran into:
"CNI Genie Add IP internal error: Error from cni: no podCidr for node node-private-dns" during ContainerCreating.
I've looked into passing this pod CIDR at creation time (I'm automating the whole thing) but haven't found a clue yet.
I'll try Weave next, but I'd like to know which one worked out for you as well.

@samaddico

Anybody had success with this?

@prateekmadhikar

prateekmadhikar commented May 15, 2019

@lgg42 hey any luck on your side in doing that?

@samaddico

samaddico commented May 15, 2019 via email

@prateekmadhikar

@samaddico Thank you I just followed it and it works like a charm!!

@Pluies

Pluies commented May 15, 2019

I've had success using Calico CNI as well, without needing to use CNI Genie – with a caveat.

I used the following steps:

When following these steps, all workloads get IPs from the 192.168.0.0/16 range and can happily talk to other services over these IPs. Kubernetes Services use the new Pod IPs as endpoints, and inter-service communication using in-cluster DNS is fine as well (e.g. curl http://some-service.ns.svc from another Pod on another Node resolves correctly and is available).

However... we ran into an issue when trying to install Istio on top of this cluster, where we now get an error message when trying to apply configurations. I've managed to reduce it to a problem with calling validating admission webhooks, which can be replicated using the example from BanzaiCloud at https://banzaicloud.com/blog/k8s-admission-webhooks/ : after creating the Deployment, Service and ValidatingWebhookConfiguration, trying to create a new Deployment that would trigger the webhook results in the following error:

[16:02:53]~/ $ kubectl apply -f deployment/sleep.yaml
Error from server (InternalError): error when creating "deployment/sleep.yaml": Internal error occurred: failed calling admission webhook "required-labels.banzaicloud.com": Post https://admission-webhook-example-svc.hook-test.svc:443/validate?timeout=30s: Address is not allowed

This error comes from the APIServer, according to the source, and is related to the new IP range not being nicely handled by the EKS masters.

My working theory, which has since been confirmed by AWS support:

  • Calico installs its CNI on the worker nodes, updates iptables, etc: new IP range now works between worker nodes
  • EKS masters, not being accessible by a normal DaemonSet, cannot be updated to understand the new IP range, and consider these addresses out of cluster, resulting in the "Address is not allowed" error

So if your use-case is basic, things should be fine, but if you plan to use Admission Webhooks, Calico CNI in EKS currently will not work.

@samaddico

samaddico commented May 15, 2019 via email

@universam1

This error comes from the APIServer, according to the source, and is related to the new IP range not being nicely handled by the EKS masters.

Thanks for the elaboration @Pluies - I came across what is most probably the same issue while debugging why kubectl proxy errors out with EKS & Calico CNI:

'Address is not allowed'
Trying to reach: 'http://192.168.x.y:9090/'

I believe this is due to the same issue, EKS not accepting that traffic as internal. Can you confirm the proxy issue?
Is there any feedback from AWS on how to work around this?

@jaydipdave

(quoting @Pluies' comment above in full, about the "Address is not allowed" admission webhook error with Calico CNI on EKS)

Stuck with the same thing here: "Address is not allowed". Is there any hack available out there?

@taylorludwig

I've managed to get admission controllers working by installing cni-genie, calico, and keeping aws-cni.

I then set any admission controller pods to use the aws cni while everything else defaults to calico.

It did require a couple of tweaks to get everything to play nicely together.

  1. Set the AWS_VPC_K8S_CNI_EXTERNALSNAT option to true on the aws-cni. Without it, aws-cni changes some host routes in a way that breaks communication between calico pods.
    This does require nodes to be in a private subnet (which wasn't a problem for me). There is another option, AWS_VPC_K8S_CNI_EXCLUDE_SNAT_CIDRS, that will be available in the next release of aws-cni - setting that to Calico's CIDR range should result in the same fix without requiring all nodes to be in private subnets.

  2. Set IP_AUTODETECTION_METHOD on calico to interface=eth0. Without that, I noticed that calico would use the wrong host IP when setting up the host-to-host networking, since aws-cni adds multiple IPs to the nodes.

The only side effect I've noticed is that pods running on the aws cni couldn't directly communicate with pods on the calico network. Calico-to-aws pods would work just fine, as traffic would go out to the VPC as needed. I felt this was ideal anyway to keep the networks separate - but it did require me to move the coredns pods to the aws cni so that all pods could get DNS resolution.
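
A sketch of those two tweaks as commands, assuming the stock aws-node and calico-node DaemonSets in kube-system (adjust names and namespaces to your install):

# Tweak 1: stop aws-cni from rewriting host routes in a way that breaks
# calico-to-calico traffic (requires nodes in private subnets, as noted above).
kubectl -n kube-system set env daemonset/aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true

# Tweak 2: pin Calico's node address autodetection to the primary interface,
# since aws-cni attaches extra ENIs/IPs that can confuse it.
kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD=interface=eth0

# The admission-controller pods are then pinned to the AWS plugin with the same
# "cni" pod annotation shown earlier in the thread; the exact value has to match
# how the AWS plugin is registered in /etc/cni/net.d on the node.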

@Pluies

Pluies commented Aug 14, 2019

That's great news, thanks @taylorludwig !
Can you confirm whether kubectl exec and kubectl port-forward work on the calico pods using this method? (I assume they won't, based on info in this thread, but it'd be good to have confirmation)

@mogren
Contributor

mogren commented Aug 14, 2019

@taylorludwig Thanks a lot for that very helpful comment!

@taylorludwig

taylorludwig commented Aug 15, 2019

@Pluies kubectl exec, kubectl port-forward, and kubectl logs all worked for me even with aws-cni fully removed. kubectl proxy was the only one I noticed that wouldn't work.

With the genie/calico/aws-cni combo, kubectl proxy will work if the node/service you are trying to hit is running on the aws cni.

@techcto

techcto commented Aug 17, 2019

@taylorludwig are you using the EKS quickstart as your base? I am in the process of trying to get Weave installed. I have tried with and without aws-cni. I have CNI Genie installed as well. What is the process you used to assign all of the base pods/services the annotation to use the aws-cni CNI?

@techcto

techcto commented Aug 17, 2019

@Pluies if it counts for anything, I can confirm the same behavior as @taylorludwig with Weave. Everything seems to work except proxy; it seems like proxy is getting the IP address from Weave and not from the instance. I'm trying to find a way to tell Weave which interface is the real one, or to find a way to have both CNIs live in harmony via CNI choice.
