
Conversation

israbbani (Contributor) commented Sep 8, 2025

This PR stacks on #56297.

For more details about the resource isolation project see #54703.

This PR wires the resource isolation config (introduced in #51865) from the Ray CLI (`ray start`) and the Ray SDK (`ray.init`) into the raylet. Notable changes include:

1. A separate Python test target for running the test_resource_isolation unit and integration tests. This unifies all cgroup-related tests under one Buildkite target and removes the need for `--except-tags cgroup` everywhere else.
2. A modification to the cgroup hierarchy. This was an oversight on my part: the "no internal processes" constraint says that a non-root cgroup can either have controllers enabled or have processes, but not both. Therefore, the new hierarchy looks like:
```
  //      base_cgroup_path (e.g. /sys/fs/cgroup)
  //             |
  //     ray_node_<node_id>
  //       |           |
  //     system     application
  //       |           |
  //      leaf        leaf
  //
```

where the leaf nodes contain all the processes and the system/application cgroups apply the cpu.weight and memory.min constraints (a minimal sketch of creating such a hierarchy follows this list).
3. CgroupManager now has a move constructor/assignment operator so that its ownership and lifecycle can be managed by the NodeManager.
4. CgroupManager is now owned by NodeManager.
5. An end-to-end integration test in python/ray/tests/resource_isolation/test_resource_isolation_integration.py.
6. Moved all previous integration tests from test_ray_init and test_cli into test_resource_isolation_integration.py. These tests have TODOs to finish them once the rest of the cgroup features are implemented.
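
To make the constraint concrete, here is a minimal sketch (Python against the cgroup v2 filesystem, not the raylet's actual C++ implementation) of how a hierarchy shaped like the diagram above can be created. The node id, cpu.weight, and memory.min values are made up for illustration, and the operations require root on a host with cgroup v2 mounted at /sys/fs/cgroup.

```python
import os

BASE = "/sys/fs/cgroup"                       # base_cgroup_path in the diagram
NODE = os.path.join(BASE, "ray_node_abc123")  # "abc123" is a hypothetical node id

# Create system/application and a leaf under each. Processes will only ever be
# placed in the leaves, so the "no internal processes" rule is never violated.
for name in ("system", "application"):
    os.makedirs(os.path.join(NODE, name, "leaf"), exist_ok=True)

# Enable the cpu and memory controllers for the children of the base cgroup and
# of ray_node_<node_id>. Per the "no internal processes" rule, a non-root cgroup
# whose cgroup.subtree_control enables controllers must not hold processes itself.
for cgroup in (BASE, NODE):
    with open(os.path.join(cgroup, "cgroup.subtree_control"), "w") as f:
        f.write("+cpu +memory")

# Apply constraints on the intermediate system cgroup (placeholder values);
# the application cgroup would be configured the same way.
with open(os.path.join(NODE, "system", "cpu.weight"), "w") as f:
    f.write("1000")
with open(os.path.join(NODE, "system", "memory.min"), "w") as f:
    f.write(str(512 * 1024 * 1024))
```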

israbbani and others added 30 commits July 24, 2025 20:39
edoakes merged commit b25b7e2 into master on Sep 16, 2025 (4 of 5 checks passed).
edoakes deleted the irabbani/cgroups-9 branch on Sep 16, 2025 at 22:17.
edoakes (Collaborator) commented Sep 16, 2025

@israbbani merged to avoid excessive iteration time. Please follow up to use the existing fixtures if possible, and let's do something different than marking the tests as manual.

israbbani (Contributor, Author) replied:

I can merge/promote fixtures to the main conftest and use them. I don't have a good idea for how to prevent these tests from running as part of normal unit tests without marking them as manual. I'll leave a TODO in #54703 and see if I can come up with anything better.
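
As a hedged illustration of one alternative (not what this PR does): the cgroup integration tests could be gated behind an opt-in environment variable with a skipif marker, so they are skipped in normal unit-test runs but still runnable from the dedicated Buildkite target. The variable name below is made up.

```python
# conftest.py sketch: skip cgroup integration tests unless explicitly enabled.
import os

import pytest

requires_resource_isolation = pytest.mark.skipif(
    os.environ.get("RAY_RUN_RESOURCE_ISOLATION_TESTS") != "1",
    reason="Set RAY_RUN_RESOURCE_ISOLATION_TESTS=1 to run cgroup integration tests.",
)


@requires_resource_isolation
def test_resource_isolation_end_to_end():
    ...
```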

instance_type: medium
commands:
- RAYCI_DISABLE_TEST_DB=1 bazel run //ci/ray_ci:test_in_docker -- //:all //src/ray/common/cgroup2/tests/... core --build-type clang --cache-test-results
- bazel run //ci/ray_ci:test_in_docker -- //:all //python/ray/tests/resource_isolation:test_resource_isolation_integration //python/ray/tests/resource_isolation:test_resource_isolation_config core --privileged --cache-test-results
Contributor commented:

did we remove the `RAYCI_DISABLE_TEST_DB=1` accidentally?

Collaborator replied:

oh yeah that should get added back

edoakes added a commit that referenced this pull request Sep 17, 2025
…r to move processes into system cgroup (#56446)

This PR stacks on #56352.

For more details about the resource isolation project see
#54703.

This PR adds the following functions to move a process into the system cgroup:
* CgroupManagerInterface::AddProcessToSystemCgroup
* CgroupDriverInterface::AddProcessToCgroup

I've also added integration tests for SysFsCgroupDriver and unit tests
for CgroupManager.

Let me explain how these APIs will be used. In the next PR, the raylet
will
* be passed a list of pids of system processes that are started before
the raylet starts and need to be moved into the system cgroup (e.g.
gcs_server)
* call CgroupManagerInterface::AddProcessToSystemCgroup for each of
these pids to move them into the system cgroup.
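
In cgroup v2 terms, "moving" a process means writing its pid into the destination cgroup's cgroup.procs file. The sketch below shows those mechanics in Python; it is illustrative only (the real logic lives in the C++ cgroup driver), and the cgroup path and pids are made up.

```python
import os

def add_process_to_cgroup(cgroup_path: str, pid: int) -> None:
    """Move a running process into the cgroup at cgroup_path (cgroup v2)."""
    with open(os.path.join(cgroup_path, "cgroup.procs"), "w") as f:
        f.write(str(pid))

# Hypothetical: move system processes started before the raylet (e.g. gcs_server)
# into the system leaf cgroup.
system_leaf = "/sys/fs/cgroup/ray_node_abc123/system/leaf"
for pid in (1234, 1235):
    add_process_to_cgroup(system_leaf, pid)
```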

---------

Signed-off-by: Ibrahim Rabbani <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
edoakes added a commit that referenced this pull request Sep 23, 2025
…n startup (#56522)

This PR stacks on #56352.

For more details about the resource isolation project see
#54703.

This PR makes the raylet move the system processes into the system cgroup on startup if resource isolation is enabled.

It introduces the following:
* A new raylet CLI arg `--system-pids`, which is a comma-separated string of pids of system processes that are started before the raylet. As of today, it contains:
  * On the head node: gcs_server, dashboard_api_server, ray client server, monitor (autoscaler)
  * On every node (including head): process subreaper, log monitor.
* End-to-end integration tests for resource isolation with the Ray SDK (`ray.init`) and the Ray CLI (`ray start`)
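
For a sense of the flag's format, parsing the comma-separated value boils down to something like the sketch below (illustrative only; it is not the raylet's actual argument handling).

```python
def parse_system_pids(value: str) -> list[int]:
    """Parse a --system-pids style string, e.g. "101,102,103" -> [101, 102, 103]."""
    return [int(part) for part in value.split(",") if part.strip()]

assert parse_system_pids("101,102,103") == [101, 102, 103]
assert parse_system_pids("") == []  # no system processes passed
```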

There are a few rough edges (I've added a comment on the PR where
relevant):
1. The construction of ResourceIsolationConfig is spread across multiple call-sites (create the object, add the object store memory, add the system pids). The big positive of doing it this way was to fail fast on invalid user input (in scripts.py and worker.py). I think it needs to have at least two components: the user input (cgroup_path, system_reserved_memory, ...) and the derived input (system_pids, total_system_reserved_memory); a possible split is sketched after this list.
2. How to determine which processes should be moved? Right now I'm using
`self.all_processes` in `node.py`. It _should_ contain all processes
started so far, but there's no guarantee.
3. How intrusive should the integration test be? Should we count the
number of pids inside the system cgroup? (This was answered in #56549)
4. How should a user set up multiple nodes on the same VM? I haven't written an integration test for it yet because there are multiple options for how to set this up.
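
To make rough edge 1 concrete, here is a hypothetical sketch of the two-component split suggested above; the class and field names are invented for illustration and do not reflect the current implementation.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ResourceIsolationUserInput:
    # Validated eagerly (fail fast in scripts.py / worker.py).
    cgroup_path: str
    system_reserved_cpu: float
    system_reserved_memory: int

@dataclass
class ResourceIsolationDerivedInput:
    # Filled in later during node startup.
    system_pids: list[int] = field(default_factory=list)
    object_store_memory: int = 0

    def total_system_reserved_memory(self, user: ResourceIsolationUserInput) -> int:
        return user.system_reserved_memory + self.object_store_memory
```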

---------

Signed-off-by: irabbani <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
zma2, ZacAttack, elliot-barn, marcostephan, dstrodtman, justinyeh1995, and landscapepainter also pushed commits that referenced this pull request between Sep 23 and Nov 17, 2025.

Labels

core (Issues that should be addressed in Ray Core), go (add ONLY when ready to merge, run all tests)


5 participants