Conversation

@jlledom commented Nov 20, 2025

We use the number of available CPUs to determine the number of listener workers when neither LISTENER_WORKERS nor PUMA_WORKERS is set.

To detect the number of CPUs, we just read Etc.nprocessors, which on Linux equals the number of cores on the physical hardware. However, in Kubernetes, the number of CPUs assigned to a container is determined by cgroups.

Since we were using Etc.nprocessors, we were ignoring the requests.cpu and limits.cpu container parameters.

This PR implements the following fallback chain to get the number of available CPUs:

Cgroups v2 || Cgroups v1 || Etc.nprocessors

For cgroups v2, we compute the value from cpu.max; for cgroups v1, we compute it from cpu.cfs_quota_us and cpu.cfs_period_us. This corresponds to the limits.cpu value in Kubernetes.
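
As a rough sketch of that chain (assuming the default /sys/fs/cgroup mount and ignoring nested cgroups; the actual code in the PR may differ in details):

require 'etc'

def available_cpus
  cgroups_v2_cpus || cgroups_v1_cpus || Etc.nprocessors
end

def cgroups_v2_cpus
  # cpu.max contains "<quota> <period>", or "max <period>" when unlimited
  quota, period = File.read("/sys/fs/cgroup/cpu.max").split
  return nil if quota == "max"
  (quota.to_f / period.to_i).ceil
rescue StandardError
  nil
end

def cgroups_v1_cpus
  quota = File.read("/sys/fs/cgroup/cpu/cpu.cfs_quota_us").to_i
  return nil if quota == -1 # -1 means no quota was set
  period = File.read("/sys/fs/cgroup/cpu/cpu.cfs_period_us").to_i
  (quota.to_f / period).ceil
rescue StandardError
  nil
end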

I observed that in porta we are instead using the value from the Kubernetes requests.cpu parameter, which maps to cpu.weight in cgroups v2 and cpu.shares in cgroups v1. I think that's essentially incorrect, because weight/shares is a proportional value the kernel uses to arbitrate between running containers under high load; that is, it's a relative scheduling priority.

Such a priority cannot be directly translated into a number of CPUs, because the final amount of available CPU time depends on how many actual cores the node has, how many other containers are running, and what weights they have, all of which can change while the container runs. Deriving the number of workers from the weight means deriving a static value from a relative one.
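
To illustrate with made-up numbers (nothing here comes from the PR):

node_cores = 4
weights = { ours: 100, neighbour: 300 } # hypothetical weights
# Alone on the node, our container may use all 4 cores regardless of weight.
# Under full contention, it only gets its proportional slice:
our_share = weights[:ours].to_f / weights.values.sum
our_share * node_cores # => 1.0 core, and this changes whenever a neighbour starts or stops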

For cgroups v1, Kubernetes set the arbitrary equivalence of 1024 shares = 1 CPU, and only allowed setting this priority via requests.cpu, expressed in CPU cores. In my opinion, using CPU cores as a unit of relative priority is pretty confusing.

Even today, CPU cores seem to be the only way to indicate priority in Kubernetes, even after cgroups v2 was released, where the translation from 1024 shares to 1 core no longer makes sense. See this GH issue.

In cgroups v1, shares range from 2 to 262142, and a "core" is 1024: 512 times the minimum and 1/256 of the maximum. In cgroups v2, however, weights range from 1 to 10000, with 100 as the default: 100 times the minimum and 1/100 of the maximum.
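
For reference, the long-standing linear interpolation runc uses to map one range onto the other (the current mapping, not the upcoming formula mentioned below) shows how far a v1 "core" lands from the v2 default:

def shares_to_weight(shares)
  # map [2, 262142] onto [1, 10000] with integer arithmetic
  1 + (shares - 2) * 9999 / 262142
end

shares_to_weight(2)      # => 1     (v1 minimum -> v2 minimum)
shares_to_weight(1024)   # => 39    (one v1 "core": nowhere near the v2 default of 100)
shares_to_weight(262142) # => 10000 (v1 maximum -> v2 maximum)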

In fact, they are about to announce a new formula that converts v1 values to v2: kubernetes/website#52793. So the formula we use in porta is incorrect.

What we really want is the "maximum number of available cores", and in Kubernetes that's limits.cpu. That is, the equivalent of what Etc.nprocessors would return on a physical machine.
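
As a concrete example, assuming the default 100ms CFS period: a hypothetical pod with limits.cpu: 1500m ends up with a cpu.max of "150000 100000", so the computation above gives:

quota, period = 150_000, 100_000 # cpu.max for limits.cpu: 1500m
(quota.to_f / period).ceil # => 2 listener workers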

Issue

https://issues.redhat.com/browse/THREESCALE-10187

Notes

Claude wrote the code and the tests.

@jlledom self-assigned this Nov 20, 2025
(quota_int.to_f / period_int).ceil
rescue
# Silent failure - fall back to other detection methods
nil
Contributor

I think logging an error would be good here, because I don't see anything in the method that would be expected to raise. So while returning nil makes sense, it's good to know if something is broken.

Contributor Author

Done: ad22ad3

quota = File.read(quota_path).strip.to_i
period = File.read(period_path).strip.to_i

return nil if quota == -1 # unlimited quota
Contributor

this line seems redundant

Contributor Author

Done: c26d2b0

(quota.to_f / period).ceil
rescue
# Silent failure - fall back to Etc.nprocessors
nil
Contributor

It would be better to log errors here as well.

@akostadinov left a comment

Man, is AI writing ugly code!

Anyway, looks good except for a couple of really ugly pieces of s^Mcode

@jlledom commented Nov 24, 2025

> Man, is AI writing ugly code!
>
> Anyway, looks good except for a couple of really ugly pieces of s^Mcode

The AI will take over the world, and you will regret this comment. Please respect our masters

- rescue
-   # Silent failure - fall back to other detection methods
+ rescue StandardError => e
+   Backend.logger.info "Getting CPU quota from cgroups v2 failed, falling back to cgroups v1: #{e.message}"
Contributor

if we get here, this is an error, so it should be at least warn

Contributor

same with the other place

Contributor Author

Done: 018594d

@akostadinov

Cool, looks good!

jlledom and others added 2 commits November 26, 2025 08:47
Co-authored-by: Aleksandar N. Kostadinov <[email protected]>
Co-Authored-By: Claude <[email protected]>
@jlledom force-pushed the THREESCALE-10187-detect-cpus branch from 018594d to b457fa9 on November 26, 2025 07:47
@jlledom merged commit fdbe781 into 3scale:master Nov 26, 2025
12 checks passed