-
Notifications
You must be signed in to change notification settings - Fork 25
THREESCALE-10187: Detect container CPU quota properly #452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| (quota_int.to_f / period_int).ceil | ||
| rescue | ||
| # Silent failure - fall back to other detection methods | ||
| nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think logging an error would be good here, because I don't see anything to be expected to raise in the method. So while returning nil makes sense, good to know if something is broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: ad22ad3
lib/3scale/backend/util.rb
Outdated
| quota = File.read(quota_path).strip.to_i | ||
| period = File.read(period_path).strip.to_i | ||
|
|
||
| return nil if quota == -1 # unlimited quota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line seems redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: c26d2b0
| (quota.to_f / period).ceil | ||
| rescue | ||
| # Silent failure - fall back to Etc.nprocessors | ||
| nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be better to log errors here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
akostadinov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Man, is AI writing ugly code!
Anyway, looks good except for a couple of really ugly pieces of s^Mcode
The AI will take over the world, and you will regret this comment. Please respect our masters |
lib/3scale/backend/util.rb
Outdated
| rescue | ||
| # Silent failure - fall back to other detection methods | ||
| rescue StandardError => e | ||
| Backend.logger.info "Getting CPU quota from cgroups v2 failed, falling back to cgroups v1: #{e.message}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are here, this is an error, at least this is warn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same with the other place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 018594d
|
Cool, looks good! |
Co-authored-by: Aleksandar N. Kostadinov <[email protected]> Co-Authored-By: Claude <[email protected]>
Co-Authored-By: Claude <[email protected]>
018594d to
b457fa9
Compare
We use the number of available cpus to determine the amount of listener workers when neither
LISTENER_WORKERSandPUMA_WORKERSis set.In order to detect the number of cpus, we just read
Etc.nprocessors, which in linux equals to the number of cores in the real hardware. However in kubernetes, the amount of cpus assigned to the container is determined by cgroups.Since we were using
Etc.nprocessors, we were ignoring therequests.cpuandlimits.cpucontainer parameters.This PR implements this flow to get the amount of available cpus:
Cgroups v2 || Cgroups v1 || Etc.nprocessorsFor cgroups v2, we compute the value from
cpu.max; for cgroups v1, we compute it fromcpu.cfs_quota_usandcpu.cfs_period_us. This corresponds to thelimits.cpuvalue in kubernetes.I observed that in porta we are instead using the value from kubernetes
requests.cpu, which maps tocpu.weightin cgroups v2 andcpu.sharesin cgroups v1. I think that's essentially incorrect because weight/shares is a proportional value the kernel uses to set priorities between running containers in a scenario of high load, that is, it's a relative scheduling priority value.Such a priority value cannot be directly translated into number of cpus, because the final ammount of available cpu time will depend on how many actual cores the cluster have and how many other containers are there and which
weightthey have, a value that can change while the container runs. Setting number of workers from weight means setting a static value from a relative value.For cgroups1, kubernetes set the arbitrary amount of 1024 shares = 1cpu, and only allowed setting this priority via
requests.cpuincpu cores. In my opinion, usingcpu coresas a unit of relative priority is pretty confusing.Still today cpu cores seems to be the only way to indicate priority in Kubernetes, even after cgroups v2 was released, where that translation from 1024 shares to 1 core doesn't make sense anymore. See this GH issue.
In cgropups v1, shares go from 2 to 262142. and a "core" is 1024, so 512 times higher than the lowest value and 256 times lower than the highest value. However, in cgroups v2, weights go from 1 to 10000, being 100 the default, so 100 times higher than the lowest value and 100 times lower than the highest value.
In fact, they are about to announce a new formula to converts v1 values to v2: kubernetes/website#52793. So the formula we use in porta is incorrect.
What we really want is "max amount of available cores", and in Kubernetes, that's
limits.cpu. That is, the equivalent of whatEtc.nprocessorswould return in a real machine.Issue
https://issues.redhat.com/browse/THREESCALE-10187
Notes
Claude wrote the code and the tests.