@israbbani israbbani commented Oct 21, 2025

For more details about the resource isolation project see #54703.

Driver processes that register in Ray's internal namespace (such as the Ray dashboard's job and serve modules) are considered system processes. Therefore, they are not moved into the workers cgroup when they register with the raylet.
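
For readers skimming the thread, the rule described above can be sketched as a small predicate. This is only an illustration under stated assumptions, not the raylet's implementation: the function names and the "system"/"workers" labels are hypothetical, the `_ray_internal_` string is taken from the test script in this PR, and treating it as a prefix match is an assumption.

```python
# Hypothetical sketch of the classification rule; not Ray's actual code.
# The namespace string comes from the test's ray.init call; matching it
# as a prefix is an assumption made for this example.
INTERNAL_NAMESPACE_PREFIX = "_ray_internal_"


def is_system_driver(is_driver: bool, namespace: str) -> bool:
    """A driver counts as a system process if its namespace is internal."""
    return is_driver and namespace.startswith(INTERNAL_NAMESPACE_PREFIX)


def target_cgroup(is_driver: bool, namespace: str) -> str:
    """Return which cgroup a registering process should end up in."""
    if is_system_driver(is_driver, namespace):
        return "system"  # left in place, not moved on registration
    return "workers"  # everything else is moved into the workers cgroup


print(target_cgroup(True, "_ray_internal_"))  # system
print(target_cgroup(True, "default"))         # workers
```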

cgroup even if they are drivers

Signed-off-by: irabbani <[email protected]>
@israbbani israbbani added the core (Issues that should be addressed in Ray Core) and go (add ONLY when ready to merge, run all tests) labels Oct 21, 2025
def get_pid(self):
    return os.getpid()

second_driver_proc = create_driver_in_internal_namespace()
Contributor Author

I only added the test to the ray start path. With ray.init, the testing process is itself a driver that has already been moved into the workers cgroup. A second driver spawned from it inherits its parent's cgroup, so it would also be in the workers cgroup.
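
To make the inheritance point concrete: on Linux a child process starts in the same cgroup as its parent, and a process's cgroup can be read from /proc/<pid>/cgroup. Below is a minimal sketch, assuming cgroup v2 (where the entry has the form `0::<path>`). The helper names are hypothetical and are not part of Ray's test utilities.

```python
from pathlib import Path
from typing import Optional, Union


def parse_cgroup_v2_path(proc_cgroup_text: str) -> Optional[str]:
    """Extract the cgroup v2 path from the contents of /proc/<pid>/cgroup.

    cgroup v2 entries have the form "0::<path>".
    """
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def current_cgroup(pid: Union[int, str] = "self") -> Optional[str]:
    """Read a live process's cgroup path (Linux only)."""
    return parse_cgroup_v2_path(Path(f"/proc/{pid}/cgroup").read_text())


# Example with a made-up path, so it runs without a real cgroup hierarchy.
print(parse_cgroup_v2_path("0::/example/workers\n"))  # /example/workers
```

Calling current_cgroup() in a parent and current_cgroup(child_pid) right after spawning a child would be expected to return the same path, which is why a driver launched from an already-moved test process cannot exercise the new behavior.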

@israbbani israbbani marked this pull request as ready for review October 22, 2025 16:29
@israbbani israbbani requested a review from a team as a code owner October 22, 2025 16:29
runner.invoke(scripts.stop)
assert_process_in_not_moved_into_ray_cgroups(
    node_id, resource_isolation_config, second_driver_proc.pid
)

Bug: Test Assertion Fails Due to PID Type Mismatch

The assert_process_in_not_moved_into_ray_cgroups function expects a string PID but receives an integer from second_driver_proc.pid. Because of this type mismatch, the internal PID comparison never matches, so the assertion is ineffective: it passes regardless of which cgroup the process is in.
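
The failure mode described here is easy to reproduce in isolation. The sketch below is not the helper from this PR (its body is not shown in the thread). It assumes a check that looks for a PID among the lines of a cgroup.procs file, and shows one way to make it robust by normalizing the PID to a string first. The function name is hypothetical.

```python
def pid_in_cgroup_procs(procs_text: str, pid) -> bool:
    """Check whether pid appears in the contents of a cgroup.procs file."""
    wanted = str(pid)  # normalize so int and str PIDs behave the same
    return wanted in procs_text.split()


procs = "101\n202\n"
print(202 in procs.split())             # False: an int never equals a str
print(pid_in_cgroup_procs(procs, 202))  # True
```

With the unnormalized comparison, a "process is not in this cgroup" assertion would hold for every integer PID, which matches the always-passing behavior reported above.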

Collaborator

@edoakes edoakes left a comment

LGTM, with a few nits.

)


def assert_process_in_not_moved_into_ray_cgroups(
Collaborator

did you forget a word in the name?

node_id: used to construct the path of the cgroup subtree
resource_isolation_config: used to construct the path of the cgroup
    subtree
pid:
Collaborator

The description for pid is missing.

"""
import ray
import time
ray.init(namespace='_ray_internal_')
Collaborator

There's a constant for this somewhere that you could use, but hard-coding is also fine (it shouldn't change).
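
One way to act on this suggestion without hunting for the constant is to name the string once in the test module and interpolate it into the driver script. This is a sketch only: INTERNAL_NAMESPACE and build_internal_driver_script are hypothetical names, the constant here is a local stand-in rather than Ray's own, and the rest of the driver script is omitted because it is not shown in this thread.

```python
import textwrap

# Hypothetical local stand-in; the value is copied from the script above.
INTERNAL_NAMESPACE = "_ray_internal_"


def build_internal_driver_script(namespace: str = INTERNAL_NAMESPACE) -> str:
    """Return the opening lines of a driver script for the given namespace."""
    return textwrap.dedent(
        f"""
        import ray
        import time
        ray.init(namespace={namespace!r})
        """
    ).strip()


print(build_internal_driver_script())
```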

@edoakes edoakes merged commit 00b1b53 into master Oct 22, 2025
6 checks passed
@edoakes edoakes deleted the irabbani/cgroups-21 branch October 22, 2025 19:18
israbbani added a commit that referenced this pull request Oct 22, 2025
…cgroup even if they are drivers (#57955)

For more details about the resource isolation project see
#54703.

Driver processes that are registered in ray's internal namespace (such
as ray dashboard's job and serve modules) are considered system
processes. Therefore, they will not be moved into the workers cgroup
when they register with the raylet.

---------

Signed-off-by: irabbani <[email protected]>
aslonnie pushed a commit that referenced this pull request Oct 22, 2025
Cherrypicking #57955 into v2.51.0

Signed-off-by: irabbani <[email protected]>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 27, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025