-
Notifications
You must be signed in to change notification settings - Fork 14.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add documentation for Scheduler performance tuning
- Loading branch information
Showing
1 changed file
with
113 additions
and
0 deletions.
There are no files selected for viewing
113 changes: 113 additions & 0 deletions
113
content/en/docs/concepts/configuration/scheduler-perf-tuning.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
--- | ||
reviewers: | ||
- bsalamat | ||
title: Scheduler Performance Tuning | ||
content_template: templates/concept | ||
weight: 70 | ||
--- | ||
|
||
{{% capture overview %}} | ||
|
||
{{< feature-state for_k8s_version="1.12" >}} | ||
|
||
Kube-scheduler is the Kubernetes default scheduler. It is responsible for | ||
placement of Pods on proper Nodes of a cluster. Nodes of a cluster that meet the | ||
scheduling requirements of a Pod are called feasible Nodes for the Pod. The | ||
scheduler finds feasible Nodes for a Pod and then runs a set of functions to | ||
score the feasible Nodes and picks a Node with the highest score among the | ||
feasible ones to run the Pod. It then notifies the API server about this | ||
decision in a process called "Binding". | ||
|
||
{{% /capture %}} | ||
|
||
{{% capture body %}} | ||
|
||
## Percentage of Nodes to Score | ||
|
||
Before Kubernetes 1.12, Kube-scheduler used to check the feasibility of all the | ||
nodes in a cluster and then scored the feasible ones. Kubernetes 1.12 has a new | ||
feature that allows the scheduler to stop looking for more feasible nodes once | ||
it finds a certain number of them. This improves the scheduler's performance in | ||
large clusters. The number is specified as a percentage of the cluster size and | ||
is controlled by a configuration option called `percentageOfNodesToScore`. The | ||
range should be between 1 and 100. Other values are considered as 100%. The | ||
default value of this option is 50%. You can change this value by providing a | ||
different value in the scheduler configuration, but read further to find whether | ||
you should change the value. | ||
|
||
```yaml | ||
apiVersion: componentconfig/v1alpha1 | ||
kind: KubeSchedulerConfiguration | ||
algorithmSource: | ||
provider: DefaultProvider | ||
|
||
... | ||
|
||
percentageOfNodesToScore: 50 | ||
``` | ||
{{< note >}} **Note**: In clusters with zero or few feasible nodes, the | ||
scheduler still checks all the nodes, simply because there are not enough | ||
feasible nodes to stop the scheduler's search early. {{< /note >}} | ||
**To disable this feature**, you can set `percentageOfNodesToScore` to 100. | ||
|
||
### Tuning percentageOfNodesToScore | ||
|
||
As stated above, `percentageOfNodesToScore` must be a value between 1 and 100 | ||
with the default value of 50. There is also a hardcoded minimum value of 50 | ||
nodes which is applied internally. It means that the scheduler tries to find at | ||
least 50 nodes regardless of the value of `percentageOfNodesToScore`. This means | ||
that changing this option to lower values in clusters with several hundred nodes | ||
will not have much impact on the number of feasible nodes that the scheduler | ||
tries to find. This is intentional as this option is unlikely to improve | ||
performance noticeably in smaller clusters. In large clusters with over a 1000 | ||
nodes setting this value to lower numbers may show a noticeable performance | ||
improvement. | ||
|
||
An important note to consider when setting this value is that when a smaller | ||
number of nodes in a cluster are checked for feasibility, some nodes are not | ||
sent to be scored for a given Pod. As a result, a Node which could possibly | ||
score a higher value for running the given Pod might not even be passed to the | ||
scoring phase. This would result in a less than ideal placement of the Pod. For | ||
this reason, the value should not be set to very low percentages. A general rule | ||
of thumb is to never set the value to anything lower than 30. Lower values | ||
should be used only when the scheduler's throughput is critical for your | ||
application and the score of nodes is not important. In other words, you prefer | ||
to run the Pod on any Node as long as it is feasible. | ||
|
||
We do not recommend lowering this value from its default if your cluster has | ||
only several hundred Nodes. It is unlikely to improve the scheduler's | ||
performance significantly. | ||
|
||
### How scheduler iterates over Nodes | ||
|
||
This section is intended for those who want to understand the internal details | ||
of this feature. | ||
|
||
In order to give all the Nodes in a cluster a fair chance of being considered | ||
for running Pods, the scheduler iterates over the nodes in a round robin | ||
fashion. You can imagine that Nodes are in an array. The scheduler starts from | ||
the start of the array and checks feasibility of the nodes until it finds enough | ||
Nodes as specified by `percentageOfNodesToScore`. For the next Pod, the | ||
scheduler continues from the point in the array that it stopped when checking | ||
feasibility of the previous Pod. | ||
|
||
If Nodes are in multiple zones, the scheduler iterates over Nodes in various | ||
zones to ensure that Nodes from different zones are considered in the | ||
feasibility checks. As an example, consider six nodes in two zones: | ||
|
||
``` | ||
Zone 1: Node 1, Node 2, Node 3, Node 4 | ||
Zone 2: Node 5, Node 6 | ||
``` | ||
|
||
Scheduler evaluates feasibility of the nodes in this oder: | ||
|
||
``` | ||
Node 1, Node 5, Node 2, Node 6, Node 3, Node 4 | ||
``` | ||
|
||
After going over all the Nodes, it goes back to Node 1. | ||
|
||
{{% /capture %}} |