What are consequences of disabling the CNI plugin in EKS? #176
Comments
@rorychatt wrote a great article explaining what benefits Amazon VPC CNI provides: https://www.contino.io/insights/kubernetes-is-hard-why-eks-makes-it-easier-for-network-and-security-architects The shortlist is, without AWS VPC CNI Plugin:
If at all possible, upgrade your worker node instance type to increase your capacity. Arguably the best option, since you want to run more containers per worker node anyway. But if not possible, suggest you use CNI-Genie plugin to overcome your issue. This plugin allows you to annotate the pods with your CNI of choice, effectively letting you pick and choose which ones will allocate an IP in your VPC and which ones will allocate through some other overlay network. |
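To make the per-pod selection described above concrete, here is a minimal sketch that builds a Pod manifest carrying a CNI-Genie annotation. It assumes the annotation key is `cni`, as shown in the CNI-Genie README; the pod name, image, and the `pod_manifest` helper are hypothetical and exist only for illustration.

```python
import json


def pod_manifest(name: str, image: str, cni: str) -> dict:
    """Build a minimal Pod manifest that asks CNI-Genie for a specific plugin."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Assumption: CNI-Genie reads the "cni" annotation to pick the plugin.
            "annotations": {"cni": cni},
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }


if __name__ == "__main__":
    # Hypothetical pod that should get its IP from Calico rather than the VPC.
    print(json.dumps(pod_manifest("light-app", "nginx", "calico"), indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.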
Thanks a lot!
The issue for us is pricing. We run a large number of similar resource-light apps (think, e.g., app hosting), and we want to keep each pod within a specific budget. Larger nodes can host more pods but are also more expensive, so the price per pod stays the same (too high). In fact, during further experiments, it turned out that disabling the CNI plugin works perfectly for a single node but makes Kubernetes behave weirdly when there are multiple nodes:
This is weird because it looks like Kubernetes should enable the default network plugin which should ensure IPs are unique per cluster (I might be wrong though, I’m not experienced with Kubernetes):
Have you ever heard of this? Might a different non-default CNI plugin (probably even picked with CNI-Genie) solve this? |
I know that kubenet only works well for single-node clusters: it has no cross-node capabilities because it relies exclusively on host-local.
And yeah, if you want to install flannel, Calico, or Weave then you'll create an overlay network instead. Nothing wrong with that, you just lose the ease of use that AWS VPC CNI provides. EDIT: |
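The host-local point above can be illustrated with a toy simulation. This is only a sketch of the idea, not the real host-local plugin (which persists its allocations on disk and is configured per node): each node hands out addresses from its own local view of a range, so two nodes given the same range produce the same pod IPs. The `10.244.x.x` ranges and the `host_local_allocate` helper are assumptions made up for the example.

```python
import ipaddress
from itertools import islice


def host_local_allocate(subnet: str, count: int) -> list[str]:
    """Hand out the first `count` usable addresses of `subnet`.

    The first host address is skipped, standing in for the bridge/gateway.
    No state is shared between calls, which mimics nodes that do not coordinate.
    """
    hosts = iter(ipaddress.ip_network(subnet).hosts())
    next(hosts)  # reserve the first host address for the gateway
    return [str(ip) for ip in islice(hosts, count)]


if __name__ == "__main__":
    # Both nodes allocate from the same range: every pod IP collides.
    node_a = host_local_allocate("10.244.0.0/16", 3)
    node_b = host_local_allocate("10.244.0.0/16", 3)
    print("shared range, duplicates:", sorted(set(node_a) & set(node_b)))

    # Each node gets its own slice of the range: no collisions.
    node_a = host_local_allocate("10.244.0.0/24", 3)
    node_b = host_local_allocate("10.244.1.0/24", 3)
    print("per-node ranges, duplicates:", sorted(set(node_a) & set(node_b)))
```

Unique per-node ranges only remove the duplicate addresses; something still has to route traffic between nodes, which is the part an overlay network such as flannel, Calico, or Weave provides.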
Will there be a way of automating opting out of using the AWS VPC CNI plugin? I've been setting up an EKS cluster and was surprised when I hit the IP limit, but you can hit it so quickly. I have 2 applications running in my cluster, just 2, after installing the daemon sets for other bits and bats that are cluster-wide (Istio, Cert Manager, Fluentd for Elasticsearch, etc.). This IP limit is quite restrictive on smaller machines, and I'd rather just be able to avoid it and focus on CPU / memory limits being the thing I have to worry about in general. Edit: I think a big part of my issue may have been to do with using t3 instances there; t2 ones seem to be working better, and I have been able to schedule more pods. I think there was an issue where K8s thought that the nodes had more pod availability but the CNI plugin was unable to assign an address for some reason. |
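For a sense of how quickly the limit is reached, the sketch below applies the formula AWS uses for its recommended max-pods values with the VPC CNI: one address per ENI is reserved for the interface itself, and 2 is added for pods that use host networking. The formula and the per-instance ENI figures are not from this thread; they are taken from AWS's published limits as I understand them, so verify them for your instance types before relying on the numbers.

```python
def max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Pods schedulable with the AWS VPC CNI: ENIs * (IPs per ENI - 1) + 2."""
    return enis * (ipv4_per_eni - 1) + 2


# (ENIs, IPv4 addresses per ENI). Assumed values, check the EC2 documentation.
INSTANCE_LIMITS = {
    "t3.small": (3, 4),
    "t3.medium": (3, 6),
    "m5.large": (3, 10),
}

if __name__ == "__main__":
    for instance_type, (enis, ips) in INSTANCE_LIMITS.items():
        print(f"{instance_type}: {max_pods(enis, ips)} pods")
    # t3.small: 11 pods
    # t3.medium: 17 pods
    # m5.large: 29 pods
```

With a handful of cluster-wide DaemonSets each taking one pod per node, a small instance has very few addresses left for application pods.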
@seeruk the CloudFormation template (or Terraform, or kops, or whatever) you used to create your EKS cluster specifies the worker node AMI to use on nodes in the cluster. If you make your own AMI, you can install whatever CNI you want to "opt out" and create an overlay network instead. |
To update on my situation: in the end, I switched from EKS to a cluster created with kops. I tried to make EKS work with kube-router (as a simple solution), and it worked, but kube-router depends on having Also, using kops instead of EKS allowed us to make our staging cluster cheaper (by using a single kops has built-in support for kube-router, flannel, Calico, and a bunch of other networking solutions. |
@iamakulov |
Hi @VladimirSmogitel I have deployed multiple Kubernetes clusters on AWS using kops, with Weave as the CNI. Thanks |
@TigerC10 have you installed CNI-Genie in EKS? It looks like a very good solution to this problem, even a nice feature to have! But after seeing in the deployment YAML that the genie pods will be running on master nodes, I have my doubts. |
Yes, I eventually got it working by using cni-genie and installing/configuring it to the master node before creating the ec2 worker nodes. It's not the best solution (not as easily automatable), but it's functional. |
OK, my mind just blew, I thought EKS's master nodes were a black box, |
Yeah, even when there are no nodes registered to the master node cluster - you can still run If the worker nodes join the cluster before the master node has the additional CNI plugins, then you will have to delete the worker nodes and add them back in to recognize the alternate CNI plugins on the worker nodes. |
Man, this is GOLD! Thank you so much! I'm definitely gonna give it a try now! |
Closing - question answered |
@TigerC10 hey man, sorry to bug you with this again... |
Anybody had success with this? |
@lgg42 hey any luck on your side in doing that? |
@samaddico Thank you I just followed it and it works like a charm!! |
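I've had success using Calico CNI as well, without needing to use CNI Genie, with a caveat. I used the following steps:
- Delete the aws-node DaemonSet
- Install Calico (DaemonSet + all other related resources) as per https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/calico using a CIDR of 192.168.0.0/16
- Delete workloads so that they get recreated and pick up an IP from the new 192.168.0.0/16 IP range rather than a VPC IP
When following these steps, all workloads get IPs from the 192.168.0.0/16 range and can happily talk to other services over these IPs. Kubernetes Services use the new Pod IPs as endpoints, and inter-service communication using in-cluster DNS is fine as well (e.g. curl http://some-service.ns.svc from another Pod on another Node resolves correctly and is available).
However... We've run into an issue when trying to install Istio on top of this cluster, where we now get an error message when trying to apply configurations. I've managed to reduce it to a problem with calling validating admission webhooks, which can be replicated using the example from BanzaiCloud at https://banzaicloud.com/blog/k8s-admission-webhooks/ : after creating the Deployment, Service and ValidatingWebhookConfiguration, trying to create a new Deployment that would trigger the webhook results in the following error:
[16:02:53]~/ $ kubectl apply -f deployment/sleep.yaml
Error from server (InternalError): error when creating "deployment/sleep.yaml": Internal error occurred: failed calling admission webhook "required-labels.banzaicloud.com": Post https://admission-webhook-example-svc.hook-test.svc:443/validate?timeout=30s: Address is not allowed
This error comes from the APIServer, according to the source (https://github.com/kubernetes/kubernetes/blob/v1.12.7/staging/src/k8s.io/apiserver/pkg/util/webhook/error.go), and is related to the new IP range not being nicely handled by the EKS masters. My working theory, which has since been confirmed by AWS support:
- Calico installs its CNI on the worker nodes, updates iptables, etc: new IP range now works between worker nodes
- EKS masters, not being accessible by a normal DaemonSet, cannot be updated to understand the new IP range, and consider these addresses out of cluster, resulting in the "Address is not allowed" error
So if your use case is basic, things should be fine, but if you plan to use Admission Webhooks, Calico CNI in EKS currently will not work. |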
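A quick way to see why the masters would treat the new pod addresses as foreign is to check them against the two ranges involved. The sketch below only illustrates the working theory above; it is not how the API server decides, and the VPC CIDR and the sample pod IP are hypothetical. Only the 192.168.0.0/16 Calico range comes from the comment.

```python
import ipaddress

CALICO_CIDR = "192.168.0.0/16"  # from the steps above
VPC_CIDR = "10.0.0.0/16"        # hypothetical VPC range, substitute your own


def is_inside(ip: str, cidr: str) -> bool:
    """Return True when `ip` falls within the network `cidr`."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    webhook_pod_ip = "192.168.12.34"  # hypothetical pod IP handed out by Calico
    print("routable on the overlay:", is_inside(webhook_pod_ip, CALICO_CIDR))
    print("inside the VPC:", is_inside(webhook_pod_ip, VPC_CIDR))
```

Worker nodes know about the overlay range, so pod-to-pod traffic works, while anything that only knows the VPC range sees the webhook endpoint as an address outside the cluster.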
Awesome!! I will try it out too.
On Wed, 15 May 2019 at 16:52, Florent Delannoy ***@***.***> wrote:
I've had success using Calico CNI as well, without needing to use CNI
Genie – with a caveat.
I used the following steps:
- Delete the aws-node DaemonSet
- Install Calico (DaemonSet + all other related resources) as per
https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/calico
using a CIDR of 192.168.0.0/16
- Delete workloads so that they get recreated and pick up an IP from
the new 192.168.0.0/16 IP range rather than a VPC IP
When following these steps, all workloads get IPs from the 192.168.0.0/16
range and can happily talk to other services over these IPs. Kubernetes
Services use the new Pod IPs as endpoints, and inter-service communication
using in-cluster DNS is fine as well (e.g. curl http://some-service.ns.svc
from another Pod on another Node resolves correctly and is available).
However... We've ran into an issue when trying to install Istio on top of
this cluster, where we now get an error message when trying to apply
configurations. I've managed to reduce it to a problem with calling
validating admission webhooks, which can be replicated using the example
from BanzaiCloud at https://banzaicloud.com/blog/k8s-admission-webhooks/
: after creating the Deployment, Service and
ValidatingWebhookConfiguration, trying to create a new Deployment that
would trigger the webhook results in the following error:
[16:02:53]~/ $ kubectl apply -f deployment/sleep.yaml
Error from server (InternalError): error when creating "deployment/sleep.yaml": Internal error occurred: failed calling admission webhook "required-labels.banzaicloud.com": Post https://admission-webhook-example-svc.hook-test.svc:443/validate?timeout=30s: Address is not allowed
This error comes from the APIServer, according to the source
<https://github.com/kubernetes/kubernetes/blob/v1.12.7/staging/src/k8s.io/apiserver/pkg/util/webhook/error.go>,
and is related to the new IP range not being nicely handled by the EKS
masters.
My working theory, which has since been confirmed by AWS support:
- Calico installs its CNI on the worker nodes, updates iptables, etc:
new IP range now works between worker nodes
- EKS masters, not being accessible by a normal DaemonSet, cannot be
updated to understand the new IP range, and consider these addresses out of
cluster, resulting in the "Address is not allowed" error
So if your use-case is basic, things should be fine, but if you plan to
use Admission Webhooks, Calico CNI in EKS currently will not work.
|
Thanks for the elaboration @Pluies - I most probably came across the same issue while debugging why
I believe this is due to the same issue that EKS is not accepting that traffic as internal, can you confirm the proxy issue? |
Stuck with the same thing here: "Address not allowed". Is there any hack available out there? |
I've managed to get admission controllers working by installing cni-genie, calico, and keeping aws-cni. I then set any admission controller pods to use the It did require a couple tweaks to get everything to play nicely together.
The only side effect I've noticed is that pods running on the |
That's great news, thanks @taylorludwig ! |
@taylorludwig Thanks a lot for that very helpful comment! |
@Pluies With the genie, calico, aws cni combo |
@taylorludwig are you using the EKS quickstart as your base? I am in the process of trying to get Weave installed. I have tried with and without aws-cni. I have CNI Genie installed also. What process do you use to give all of the base pods/services the annotation to use the aws-cni CNI? |
@Pluies if it counts for anything, I can confirm the same behavior as @taylorludwig with Weave. Everything seems to work except the proxy; it seems like the proxy is getting its IP address from Weave and not from the instance. I'm trying to find a way to tell Weave which interface is real, or a way to have both CNIs live in harmony via CNI choice. |
I need to overcome the secondary IPs limit that EC2 worker nodes have.
I tried disabling the CNI plugin on my worker node:
sudo sed -i '/--network-plugin=cni/d' /etc/systemd/system/kubelet.service
sudo systemctl daemon-reload
sudo systemctl restart kubelet.service
and it appears that everything is working fine (pods seem to work as intended, ClusterIP and LoadBalancer services successfully resolve to pods).
Are there any non-obvious bad consequences of disabling the CNI plugin in EKS?
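For readers unsure what the `sed` command above changes, here is a small sketch of the same line deletion applied to an in-memory unit file. The sample unit text is hypothetical (the real EKS unit file may look different), and `drop_matching_lines` is a helper written for this example; it uses a plain substring match, which is equivalent to the `sed` pattern here because the pattern contains no regex metacharacters.

```python
# Hypothetical kubelet unit file, not copied from a real EKS worker node.
SAMPLE_UNIT = r"""[Service]
ExecStart=/usr/bin/kubelet \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --network-plugin=cni \
  --container-runtime=docker
"""


def drop_matching_lines(text: str, needle: str) -> str:
    """Remove every line containing `needle`, like sed '/needle/d'."""
    kept = [line for line in text.splitlines(keepends=True) if needle not in line]
    return "".join(kept)


if __name__ == "__main__":
    print(drop_matching_lines(SAMPLE_UNIT, "--network-plugin=cni"))
```

Because the whole line is deleted, the edit is only safe when the flag sits on its own continuation line and is not the last one; otherwise the preceding line would be left with a dangling backslash. With the flag gone, the kubelet no longer uses CNI at all, which is why the VPC CNI's per-node IP limit stops applying.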