k8s: increase the kube-api-server QPS from 5/10 to 10/20 #2436

Merged

Contributor

@tzneal tzneal commented Sep 19, 2022

Issue number:

N/A

Description of changes:

This changes the default setting for EKS v1.22+ where API Priority & Fairness is available and there is a specific queue for kubelet health.

Testing done:

Ran cargo make unit-tests and:

  • Built a custom K8s v1.23 Bottlerocket AMI
  • Launched a node using that AMI
  • Verified that the config changes were applied by logging in via SSM
bash-5.1# cat /etc/kubernetes/kubelet/config  | grep kubeAPI
kubeAPIQPS: 10
kubeAPIBurst: 20
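
The manual check above could also be scripted. The following is a minimal sketch, not part of the PR: it assumes the kubelet config contains plain `key: value` lines as in the grep output, and the helper name `parse_kubelet_limits` is hypothetical.

```python
def parse_kubelet_limits(config_text):
    """Collect kubeAPIQPS / kubeAPIBurst from simple 'key: value' lines."""
    wanted = ("kubeAPIQPS", "kubeAPIBurst")
    limits = {}
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a colon and keys we are not interested in.
        if sep and key.strip() in wanted:
            limits[key.strip()] = int(value.strip())
    return limits


# Sample text copied from the grep output shown in the PR description.
sample = "kubeAPIQPS: 10\nkubeAPIBurst: 20\n"
limits = parse_kubelet_limits(sample)
print(limits)
```

On a real node the text would come from reading /etc/kubernetes/kubelet/config; a full YAML parser would be more robust than this line-based approach.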

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Contributor Author

tzneal commented Sep 19, 2022

In testing, increasing from 5 QPS to 10 QPS roughly halves the overall time for 100 pods to become ready on a single node, from 67.5 to 32.8 seconds, while also roughly halving the time for the first pod to become ready, from 22 seconds to 10 seconds.
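
As a quick check of the "halves" claim, the timings quoted above can be turned into speedup ratios. This is only worked arithmetic on the numbers in this comment; the function name `speedup` is hypothetical.

```python
def speedup(before_s, after_s):
    """Ratio of the old duration to the new one (2.0 means twice as fast)."""
    return before_s / after_s


# Timings quoted in the comment above (seconds).
all_ready = speedup(67.5, 32.8)    # 100 pods ready on a single node
first_ready = speedup(22.0, 10.0)  # first pod ready

print(f"all pods ready:  {all_ready:.2f}x faster")
print(f"first pod ready: {first_ready:.2f}x faster")
```

Both ratios come out slightly above 2, which is consistent with describing each time as roughly halved.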
For larger scale-up events (in this case 3k pods and 30 nodes), the additional benefit of settings above 10/20 QPS diminishes, though a large benefit remains when the values are raised from the default 5/10 QPS.

Any change to the kubelet QPS settings is multiplied by the number of nodes in the cluster. This creates a risk of overwhelming the kube-api-server by increasing the traffic it receives from the kubelet running on each node. The primary mitigation is that the API Priority & Fairness feature is enabled in EKS for K8s v1.20+. This feature added the concept of priority levels to Kubernetes, where requests in different priority levels cannot starve each other. In v1.22+ specifically, there is a separate queue for K8s health checks, which prevents increased load from pod churn from interfering with kubelet health reporting. The increase here is small yet realizes most of the benefit, which further reduces the risk of negative impact.
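
The scaling concern above can be made concrete with a back-of-the-envelope calculation. This is a sketch under stated assumptions: it treats kubeAPIQPS and kubeAPIBurst as per-node ceilings on the kubelet's client-side request rate and simply multiplies by the node count. The 30-node figure is borrowed from the scale-up test mentioned earlier in this comment, and the function name `cluster_request_ceiling` is hypothetical. Actual traffic may sit well below these ceilings.

```python
def cluster_request_ceiling(nodes, per_node_rate):
    """Upper bound on combined kubelet request rate across all nodes."""
    return nodes * per_node_rate


NODES = 30  # assumption: node count taken from the scale-up test above

# (label, kubeAPIQPS, kubeAPIBurst) for the old and new defaults.
settings = (("old default", 5, 10), ("new default", 10, 20))

ceilings = {}
for label, qps, burst in settings:
    ceilings[label] = (
        cluster_request_ceiling(NODES, qps),
        cluster_request_ceiling(NODES, burst),
    )
    sustained, bursty = ceilings[label]
    print(f"{label}: up to {sustained} sustained req/s, {bursty} in a burst")
```

Under these assumptions the ceiling doubles along with the per-node settings, which is why the comment leans on API Priority & Fairness as the mitigation.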

Contributor

@etungsten etungsten left a comment

This changes the default setting for EKS v1.22+

These kubelet configs are also used in non-EKS K8s variants, e.g. metal-k8s-*, vmware-k8s-*.

I think increasing the defaults still makes sense. I also agree with the assessment that the benefits here outweigh any potential downsides.

Contributor

@jpmcb jpmcb left a comment

Very nice! 🏃🏼

Contributor

@zmrow zmrow left a comment

Thanks!

Contributor

@bcressey bcressey left a comment

I really appreciate the focus on metrics and data here. Looks like a nice win.

Contributor Author

tzneal commented Sep 21, 2022

I really appreciate the focus on metrics and data here. Looks like a nice win.

Thanks! I don't have permissions on this repo to merge, so feel free to merge whenever.

@arnaldo2792
Contributor

I'm merging since all tests passed (thanks @tzneal!)

@arnaldo2792 arnaldo2792 merged commit 9cd0945 into bottlerocket-os:develop Sep 21, 2022
@tzneal tzneal deleted the increase-kubeket-api-server-qps branch March 1, 2023 04:28