cap cpu shares to the max allowed value on Linux #4204
Merged
Summary
We've recently had reach-outs from customers using very large instance types who are experiencing runtime errors. The u-9tb1.112xlarge instance type, for example, has 448 vCPUs.
Error (truncated in the ECS describe-task API response)
Customers defining more than 256 vCPUs as their container CPU in an ECS task definition will see an error during cgroup creation.
The Linux kernel defines a min and max value for the CPU share: ref (2^1 - 2^18). Currently we have a `minimumCPUShares` that is set to 2. This PR defines a `maximumCPUShares` equal to 262144.

If you give the kernel anything below the min or above the max, the kernel will make sure the value is not out of range and automatically clamp it. Systemd, however, has an additional check that prevents the cgroup creation. A similar patch has been merged into the k8s kubelet as well: kubernetes/kubernetes#93248.
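For illustration, here is a minimal Go sketch of that bounding. `minimumCPUShares` and `maximumCPUShares` are the values discussed in this PR, while `clampCPUShares` is a placeholder helper name rather than the agent's actual code:

```go
package main

import "fmt"

const (
	// Kernel-enforced bounds for cpu.shares: 2^1 and 2^18.
	minimumCPUShares uint64 = 2
	maximumCPUShares uint64 = 262144
)

// clampCPUShares bounds a requested CPU share value so that the
// systemd/cgroup creation never sees an out-of-range value.
func clampCPUShares(shares uint64) uint64 {
	if shares < minimumCPUShares {
		return minimumCPUShares
	}
	if shares > maximumCPUShares {
		return maximumCPUShares
	}
	return shares
}

func main() {
	// 448 vCPUs at 1024 shares per vCPU = 458752, which exceeds 2^18.
	fmt.Println(clampCPUShares(448 * 1024)) // prints 262144
}
```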
In an ECS task definition, the `container.CPU` is a CPU reservation value (not a hard limit). For example, a container defined with 1 vCPU, on the u-9tb1.112xlarge instance, is free to take up the entire 448 vCPUs. The `task.CPU`, on the other hand, is a hard limit; the task will be throttled beyond the defined value.

We use CPU shares in 2 places:

- when `container.CPU` is defined
- when `task.CPU` is not defined

These are explained below in detail. This PR makes changes to both of the above places to account for `maximumCPUShares`.
Implementation details
1. When `container.CPU` is defined, cap it before sending it to Docker.

This handles the case where a task definition has container CPU defined. If the container CPU share is more than the max, we cap it to the max. This is also consistent with how we bound the container CPU share to the min value today.
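A rough sketch of where this cap applies, assuming the translation lands in Docker's `HostConfig.CPUShares` (the `HostConfig` type is from the Docker API; the surrounding function is illustrative and not the agent's exact code):

```go
// Sketch only: cap the per-container CPU shares before handing them to Docker.
package dockerclient

import (
	dockercontainer "github.com/docker/docker/api/types/container"
)

const (
	minimumCPUShares int64 = 2      // existing lower bound
	maximumCPUShares int64 = 262144 // new upper bound added in this PR
)

func applyContainerCPUShares(hostConfig *dockercontainer.HostConfig, containerCPU uint) {
	cpuShares := int64(containerCPU)
	if cpuShares < minimumCPUShares {
		// Existing behaviour: bound the share to the kernel minimum.
		cpuShares = minimumCPUShares
	}
	if cpuShares > maximumCPUShares {
		// New behaviour: bound the share to the kernel maximum as well.
		cpuShares = maximumCPUShares
	}
	hostConfig.CPUShares = cpuShares
}
```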
2. When `task.CPU` is not defined, cap it before creating the task cgroup.

If an ECS task definition does not have `task.CPU` defined, the ECS agent sums up the `container.CPU` values and uses the sum as the task's CPU share. This is handled by the `buildImplicitLinuxCPUSpec` method - implicit because the task CPU was not defined explicitly. When customers explicitly define `task.CPU`, we do not create the task cgroup with a CPU share value; we use CPU quota and period (ref) to hard-limit the task.
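A simplified sketch of that implicit computation (the types and the `buildImplicitCPUSpec` function below are illustrative stand-ins, not the agent's actual `buildImplicitLinuxCPUSpec` code):

```go
// Sketch only: when task.CPU is absent, sum the container-level CPU values and
// bound the result before it becomes the task cgroup's CPU share.
package taskcgroup

const (
	minimumCPUShares uint64 = 2
	maximumCPUShares uint64 = 262144
)

type containerDefinition struct {
	CPU uint
}

type linuxCPUSpec struct {
	Shares uint64
}

func buildImplicitCPUSpec(containers []containerDefinition) linuxCPUSpec {
	var taskShares uint64
	for _, c := range containers {
		taskShares += uint64(c.CPU)
	}
	if taskShares < minimumCPUShares {
		taskShares = minimumCPUShares
	}
	if taskShares > maximumCPUShares {
		taskShares = maximumCPUShares
	}
	return linuxCPUSpec{Shares: taskShares}
}
```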
Testing
New tests cover the changes: yes
Added unit tests to cover the new code. Unfortunately, I was unable to find capacity for a large instance type (>256 vCPUs), so we can't test this on a real instance. The unit tests suffice for testing the new code added.
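For example, the capping behaviour lends itself to a table-driven test along these lines, exercising the hypothetical `clampCPUShares` helper from the earlier sketch (not the actual tests added in this PR):

```go
package taskcgroup

import "testing"

// Illustrative table-driven test for the capping behaviour.
func TestClampCPUShares(t *testing.T) {
	cases := []struct {
		name string
		in   uint64
		want uint64
	}{
		{"below kernel minimum", 0, 2},
		{"within range", 1024, 1024},
		{"above kernel maximum (448 vCPUs x 1024)", 448 * 1024, 262144},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := clampCPUShares(tc.in); got != tc.want {
				t.Errorf("clampCPUShares(%d) = %d, want %d", tc.in, got, tc.want)
			}
		})
	}
}
```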
Description for the changelog
enhancement: cap cpu shares to the max allowed value on Linux.
Does this PR include breaking model changes? If so, have you added transformation functions?
no
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.