[core] (cgroups 19/n) Allow fractions when getting the number of CPUs to calculate weights #57800
cpus available on the machine. This will prevent us from rounding down when running in a container that has cpu.max set. Signed-off-by: irabbani <[email protected]>
Tested on Anyscale w/ a 2 core machine. Works with default parameters now. From the logs
edoakes
left a comment
add comment in followup
      """
-     available_system_cpus = utils.get_num_cpus()
+     available_system_cpus = utils.get_num_cpus(truncate=False)
should leave a comment for why we don't truncate
) For more details about the resource isolation project see #54703. This PR moves the driver into the workers cgroup when it registers with the NodeManager. Also updates the tests to reflect this. This now includes changes from #57800. --------- Signed-off-by: irabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]>
This PR stacks on #57776.
For more details about the resource isolation project see #54703.
When Ray calculates the number of CPUs available on the machine, it checks whether it is running in a container. However, it truncates the resulting CPU count to an integer.
In this PR,
- … DEFAULT_MIN_SYSTEM_RESERVED_CPU_CORES, then raise a ValueError. Previously, this was < DEFAULT_MIN_SYSTEM_RESERVED_CPU_CORES.
- ray._private.utils.get_num_cpus truncates only if an optional parameter (truncate) is set to True.
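
To illustrate why truncation matters inside a container, here is a minimal sketch of turning a cgroup v2 cpu.max value into a CPU count. It is not Ray's implementation: the function name cpus_from_cpu_max is hypothetical, and only the truncate parameter name mirrors the diff above. It assumes the standard cgroup v2 format, where cpu.max holds "<quota> <period>" or "max <period>" when no limit is set.

```python
from typing import Optional, Union


def cpus_from_cpu_max(
    contents: str, truncate: bool = True
) -> Optional[Union[int, float]]:
    """Convert the contents of a cgroup v2 cpu.max file into a CPU count.

    Hypothetical helper for illustration only. Returns None when the
    quota is "max", meaning the cgroup imposes no CPU limit.
    """
    quota, period = contents.split()
    if quota == "max":
        return None
    cpus = int(quota) / int(period)
    # Truncating rounds a fractional limit down to a whole number of CPUs.
    return int(cpus) if truncate else cpus


# A container limited to one and a half CPUs.
print(cpus_from_cpu_max("150000 100000"))                  # 1
print(cpus_from_cpu_max("150000 100000", truncate=False))  # 1.5
```

With truncation, a 1.5 CPU limit is reported as 1, so any value derived from the CPU count is computed from a smaller number than the container actually has. Passing truncate=False keeps the fractional part.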