@israbbani israbbani commented Sep 11, 2025

This PR stacks on #56352.

For more details about the resource isolation project see #54703.

This PR adds the following functions to move a process into the system cgroup:

  • CgroupManagerInterface::AddProcessToSystemCgroup
  • CgroupDriverInterface::AddProcessToCgroup

I've also added integration tests for SysFsCgroupDriver and unit tests for CgroupManager.
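
To make the relationship between the two functions concrete, here is a minimal sketch. The signatures, the `Status` stand-in, `FakeCgroupDriver`, `SimpleCgroupManager`, and the cgroup path are all assumptions made for illustration; the real declarations live in the PR diff and may differ.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Simplified stand-in for a status type (assumption, not the real return type).
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status Invalid(std::string msg) { return {false, std::move(msg)}; }
};

// Assumed shape: the driver moves a pid into the cgroup at a given path.
class CgroupDriverInterface {
 public:
  virtual ~CgroupDriverInterface() = default;
  virtual Status AddProcessToCgroup(const std::string &cgroup_path,
                                    const std::string &pid) = 0;
};

// Assumed shape: the manager knows which cgroup is the system cgroup.
class CgroupManagerInterface {
 public:
  virtual ~CgroupManagerInterface() = default;
  virtual Status AddProcessToSystemCgroup(const std::string &pid) = 0;
};

// Hypothetical in-memory driver, useful only for unit-test style checks.
class FakeCgroupDriver : public CgroupDriverInterface {
 public:
  Status AddProcessToCgroup(const std::string &cgroup_path,
                            const std::string &pid) override {
    if (pid.empty()) return Status::Invalid("empty pid");
    // A process belongs to one cgroup at a time, so drop any old membership.
    for (auto &entry : members_) entry.second.erase(pid);
    members_[cgroup_path].insert(pid);
    return Status::OK();
  }
  bool Contains(const std::string &cgroup_path, const std::string &pid) const {
    auto it = members_.find(cgroup_path);
    return it != members_.end() && it->second.count(pid) > 0;
  }

 private:
  std::map<std::string, std::set<std::string>> members_;
};

// Hypothetical manager that forwards to the driver with a fixed system path.
class SimpleCgroupManager : public CgroupManagerInterface {
 public:
  SimpleCgroupManager(CgroupDriverInterface &driver, std::string system_path)
      : driver_(driver), system_path_(std::move(system_path)) {}
  Status AddProcessToSystemCgroup(const std::string &pid) override {
    return driver_.AddProcessToCgroup(system_path_, pid);
  }

 private:
  CgroupDriverInterface &driver_;
  std::string system_path_;
};
```

Keeping the manager ignorant of how the move happens is what would let a fake driver be swapped in for unit tests while a filesystem-backed driver is covered by integration tests.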

Let me explain how these APIs will be used. In the next PR, the raylet will

  • be passed a list of pids of system processes that are started before the raylet starts and need to be moved into the system cgroup (e.g. gcs_server)
  • call CgroupManagerInterface::AddProcessToSystemCgroup for each of these pids to move them into the system cgroup.
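
The planned raylet flow could look roughly like the sketch below. `MoveSystemProcesses`, `MoveFailure`, and the callback type are hypothetical names; the callback stands in for `CgroupManagerInterface::AddProcessToSystemCgroup`. Whether the raylet should continue or abort after a failed move is not stated above, so collecting failures here is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One pid that could not be moved, with the reported reason.
struct MoveFailure {
  std::string pid;
  std::string reason;
};

// Hypothetical stand-in for AddProcessToSystemCgroup: returns an empty string
// on success and an error message otherwise.
using AddToSystemCgroupFn = std::function<std::string(const std::string &pid)>;

// Tries to move every pid and reports the ones that failed (assumed policy:
// keep going after a failure instead of stopping at the first one).
std::vector<MoveFailure> MoveSystemProcesses(const std::vector<std::string> &pids,
                                             const AddToSystemCgroupFn &add) {
  std::vector<MoveFailure> failures;
  for (const auto &pid : pids) {
    std::string error = add(pid);
    if (!error.empty()) {
      failures.push_back({pid, std::move(error)});
    }
  }
  return failures;
}
```

A caller would pass the pids of processes such as gcs_server and then decide, based on the returned failures, whether startup can proceed.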

israbbani and others added 30 commits July 24, 2025 20:39
to perform cgroup operations.

Signed-off-by: irabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
instead of clone for older kernel headers < 5.7 (which is what we have
in CI)

Signed-off-by: irabbani <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
Signed-off-by: Ibrahim Rabbani <[email protected]>
Base automatically changed from irabbani/cgroups-9 to master September 16, 2025 22:17
/**
Moves the process into the system leaf cgroup (@see kLeafCgroupName).
To move the pid, the process must have read, write, and execute permissions for the

nit: Add a colon at the end of the line since it precedes a list
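
For context on the doc comment under review: with cgroup v2, a process is moved by writing its pid to the target cgroup's `cgroup.procs` file. The sketch below shows that mechanism only; `WritePidToCgroupProcs` is a hypothetical helper, not the code in `SysFsCgroupDriver`, and the exact permission requirements are the ones the doc comment lists.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Writes `pid` to <cgroup_dir>/cgroup.procs. Returns an empty string on
// success, or a short description of what failed.
std::string WritePidToCgroupProcs(const std::string &cgroup_dir,
                                  const std::string &pid) {
  const std::string procs_file = cgroup_dir + "/cgroup.procs";
  std::ofstream out(procs_file);
  if (!out.is_open()) {
    return "failed to open " + procs_file;
  }
  out << pid;
  // The kernel accepts or rejects the move when the write reaches the file,
  // so flush explicitly and check the stream state afterwards.
  out.flush();
  if (!out) {
    return "failed to write pid " + pid + " to " + procs_file;
  }
  return "";
}
```

On a real cgroup filesystem a rejected move (for example, insufficient permissions or a pid that no longer exists) would surface as the write failure branch.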

@edoakes edoakes merged commit abd40b3 into master Sep 17, 2025
4 of 5 checks passed
@edoakes edoakes deleted the irabbani/cgroups-10 branch September 17, 2025 13:23
edoakes pushed a commit that referenced this pull request Sep 18, 2025
…6626)

Broken in #56446. 

This should stop being possible once there's a single cgroups target
exported (as highlighted in #54703).

I've fixed the broken build and added a temporary test target that
builds the noop implementations as part of Linux CI, so that this kind of
breakage is caught in premerge.

---------

Signed-off-by: irabbani <[email protected]>
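
One plausible way such a break happens (an assumption; the commit message does not say what failed) is that an interface gains a pure virtual method and a no-op implementation that Linux CI never compiles is not updated. The sketch below uses hypothetical, simplified types to show why compiling the no-op class catches this early.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Simplified stand-in for a status type (assumption).
struct Status {
  bool ok;
  static Status OK() { return {true}; }
};

// Assumed shape of the interface after the new method was added.
class CgroupManagerInterface {
 public:
  virtual ~CgroupManagerInterface() = default;
  virtual Status AddProcessToSystemCgroup(const std::string &pid) = 0;
};

// Hypothetical no-op manager for builds without cgroup support.
class NoopCgroupManager : public CgroupManagerInterface {
 public:
  // `override` makes a signature mismatch with the interface a compile error.
  Status AddProcessToSystemCgroup(const std::string &) override {
    return Status::OK();
  }
};

// If the interface gains another pure virtual method that the no-op class
// does not implement, this fails as soon as the file is compiled.
static_assert(!std::is_abstract<NoopCgroupManager>::value,
              "NoopCgroupManager must implement every interface method");
```

The check only helps if some CI job actually compiles the no-op class, which is what the temporary test target described above provides.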
israbbani added a commit that referenced this pull request Sep 22, 2025
…oupDriver to move processes into system cgroup (#56446)"

This reverts commit abd40b3.
zma2 pushed a commit to zma2/ray that referenced this pull request Sep 23, 2025
…r to move processes into system cgroup (ray-project#56446)

zma2 pushed a commit to zma2/ray that referenced this pull request Sep 23, 2025
…y-project#56626)

ZacAttack pushed a commit to ZacAttack/ray that referenced this pull request Sep 24, 2025
…r to move processes into system cgroup (ray-project#56446)

ZacAttack pushed a commit to ZacAttack/ray that referenced this pull request Sep 24, 2025
…y-project#56626)

elliot-barn pushed a commit that referenced this pull request Sep 24, 2025
…r to move processes into system cgroup (#56446)

elliot-barn pushed a commit that referenced this pull request Sep 24, 2025
…6626)

marcostephan pushed a commit to marcostephan/ray that referenced this pull request Sep 24, 2025
…r to move processes into system cgroup (ray-project#56446)

marcostephan pushed a commit to marcostephan/ray that referenced this pull request Sep 24, 2025
…y-project#56626)

elliot-barn pushed a commit that referenced this pull request Sep 27, 2025
…r to move processes into system cgroup (#56446)

elliot-barn pushed a commit that referenced this pull request Sep 27, 2025
…6626)

dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
…r to move processes into system cgroup (#56446)

dstrodtman pushed a commit to dstrodtman/ray that referenced this pull request Oct 6, 2025
…y-project#56626)

justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
…r to move processes into system cgroup (ray-project#56446)

justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
…y-project#56626)

landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…r to move processes into system cgroup (ray-project#56446)

landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…y-project#56626)

Labels

core: Issues that should be addressed in Ray Core
go: add ONLY when ready to merge, run all tests

4 participants