[core] (cgroups 9/n) end-to-end integration of cgroups with ray start. #56352
Conversation
…to perform cgroup operations.
…instead of clone for older kernel headers < 5.7 (which is what we have in CI).
fix CI.
@israbbani merged to avoid excessive iteration time. Please follow up to use existing fixtures if possible, and let's do something different than marking the tests as manual.

I can merge/promote fixtures to the main conftest and use them. I don't have a good idea for how to prevent these tests from running as part of normal unit tests without marking them as manual. I'll leave a TODO in #54703 and see if I can come up with anything better.
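Not part of this PR, just a sketch of one possible alternative for gating these tests: a pytest marker combined with an environment-variable check in the shared conftest, so the tests are skipped in normal unit-test runs but still collected by the dedicated CI job. The marker name and environment variable below are hypothetical.

```python
# conftest.py (sketch; marker name and env var are hypothetical)
import os

import pytest


def pytest_collection_modifyitems(config, items):
    # When the dedicated CI job sets this variable, run everything as usual.
    if os.environ.get("RAY_RUN_RESOURCE_ISOLATION_TESTS") == "1":
        return
    skip = pytest.mark.skip(
        reason="resource isolation tests only run in the dedicated CI job"
    )
    for item in items:
        # Tests opt in with @pytest.mark.resource_isolation.
        if "resource_isolation" in item.keywords:
            item.add_marker(skip)
```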
instance_type: medium
commands:
  - RAYCI_DISABLE_TEST_DB=1 bazel run //ci/ray_ci:test_in_docker -- //:all //src/ray/common/cgroup2/tests/... core --build-type clang --cache-test-results
  - bazel run //ci/ray_ci:test_in_docker -- //:all //python/ray/tests/resource_isolation:test_resource_isolation_integration //python/ray/tests/resource_isolation:test_resource_isolation_config core --privileged --cache-test-results
Did we remove the RAYCI_DISABLE_TEST_DB=1 accidentally?
oh yeah that should get added back
…r to move processes into system cgroup (#56446)

This PR stacks on #56352. For more details about the resource isolation project see #54703.

This PR adds the following functions to move a process into the system cgroup:
* CgroupManagerInterface::AddProcessToSystemCgroup
* CgroupDriverInterface::AddProcessToCgroup

I've also added integration tests for SysFsCgroupDriver and unit tests for CgroupManager.

Let me explain how these APIs will be used. In the next PR, the raylet will:
* be passed a list of pids of system processes that are started before the raylet starts and need to be moved into the system cgroup (e.g. gcs_server)
* call CgroupManagerInterface::AddProcessToSystemCgroup for each of these pids to move them into the system cgroup.

Signed-off-by: Ibrahim Rabbani <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
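At the cgroup v2 filesystem level, moving a process into a cgroup amounts to writing its pid into the target cgroup's `cgroup.procs` file. The C++ driver itself isn't shown in this thread, so here is a minimal Python sketch of that sysfs operation; the helper name and example path are illustrative only, not the actual implementation.

```python
from pathlib import Path


def add_process_to_cgroup(cgroup_path: str, pid: int) -> None:
    """Move `pid` into the cgroup v2 cgroup at `cgroup_path`.

    Mirrors what a sysfs-backed driver does: write the pid into the
    cgroup's cgroup.procs file. Requires write permission on the cgroup,
    which in practice means root or delegated ownership.
    """
    procs_file = Path(cgroup_path) / "cgroup.procs"
    if not procs_file.exists():
        raise FileNotFoundError(f"{cgroup_path} is not a cgroup v2 directory")
    # Writing a pid to cgroup.procs migrates that process into the cgroup.
    procs_file.write_text(f"{pid}\n")


# Example (path and pid are illustrative):
# add_process_to_cgroup("/sys/fs/cgroup/ray_node_abc123/system/leaf", 12345)
```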
…n startup (#56522)

This PR stacks on #56352. For more details about the resource isolation project see #54703.

This PR makes the raylet move the system processes into the system cgroup on startup if resource isolation is enabled. It introduces the following:
* A new raylet cli arg `--system-pids`, which is a comma-separated string of pids of system processes that are started before the raylet (a parsing sketch follows after this message). As of today, it contains:
  * On the head node: gcs_server, dashboard_api_server, ray client server, monitor (autoscaler)
  * On every node (including head): process subreaper, log monitor.
* End-to-end integration tests for resource isolation with the Ray SDK (`ray.init`) and the Ray CLI (`ray --start`)

There are a few rough edges (I've added a comment on the PR where relevant):
1. The construction of ResourceIsolationConfig is spread across multiple call-sites (create the object, add the object store memory, add the system pids). The big positive of doing it this way was to fail fast on invalid user input (in scripts.py and worker.py). I think it needs to have at least two components: the user input (cgroup_path, system_reserved_memory, ...) and the derived input (system_pids, total_system_reserved_memory).
2. How to determine which processes should be moved? Right now I'm using `self.all_processes` in `node.py`. It _should_ contain all processes started so far, but there's no guarantee.
3. How intrusive should the integration test be? Should we count the number of pids inside the system cgroup? (This was answered in #56549)
4. How should a user set up multiple nodes on the same VM? I haven't written an integration test for it yet because there are multiple options for how to set this up.

Signed-off-by: irabbani <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
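For illustration, a minimal sketch of the kind of parsing and validation the raylet needs for the comma-separated `--system-pids` string. The helper name is hypothetical and this is not the actual implementation; it only shows the fail-fast behavior on malformed input.

```python
def parse_system_pids(system_pids: str) -> list[int]:
    """Parse a comma-separated pid string such as "123,456,789".

    Hypothetical sketch: rejects empty and non-numeric entries so that
    bad input fails fast instead of silently skipping processes.
    """
    if not system_pids:
        return []
    pids = []
    for token in system_pids.split(","):
        token = token.strip()
        if not token.isdigit():
            raise ValueError(f"Invalid pid in --system-pids: {token!r}")
        pids.append(int(token))
    return pids


# Usage example:
assert parse_system_pids("101,102,103") == [101, 102, 103]
```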
This PR stacks on #56297.
For more details about the resource isolation project see #54703.
This PR wires the resource isolation config (introduced in #51865) from the Ray CLI (`ray start`) and the Ray SDK (`ray.init`) into the raylet. Notable changes include:
1. A separate python test target for running test_resource_isolation related unit and integration tests. This unifies all cgroup-related tests under one buildkite target and removes the need for `--except-tags cgroup` everywhere else.
2. Modification to the cgroup hierarchy. This was an oversight on my part. The "no internal processes" constraint says that a non-root cgroup can either have controllers enabled or have processes. Therefore, the new hierarchy looks like:

       base_cgroup_path (e.g. /sys/fs/cgroup)
                      |
             ray_node_<node_id>
              |              |
           system       application
              |              |
            leaf           leaf

   where the leaf nodes contain all processes and the system/application cgroups apply cpu.weight and memory.min constraints (a filesystem-level sketch of this layout follows after this list).
3. CgroupManager now has a move ctor/assignment operator that allows ownership and lifecycle to be managed by the NodeManager.
4. CgroupManager is now owned by NodeManager.
5. An end-to-end integration test in `python/ray/tests/resource_isolation/test_resource_isolation_integration.py`.
6. Moved all previous integration tests from test_ray_init and test_cli into test_resource_isolation_integration.py. These tests have TODOs to finish them up once the rest of the cgroup features are implemented.
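The split described in item 2 (constraints on the non-leaf system/application cgroups, processes only in the leaves) maps onto plain cgroup v2 filesystem operations. Below is an illustrative Python sketch rather than the actual C++ CgroupManager; the cpu.weight values, the function name, and the assumption that the cpu/memory controllers are already enabled on the base cgroup are mine.

```python
from pathlib import Path


def create_ray_cgroup_hierarchy(base: str, node_id: str,
                                system_memory_min: int) -> None:
    """Sketch of the hierarchy from the PR description (cgroup v2).

    Creates base/ray_node_<node_id>/{system,application}/leaf, where the
    leaf cgroups hold processes and their parents hold the constraints,
    so the "no internal processes" rule is respected. Requires root (or a
    delegated subtree) and cpu/memory enabled on the base cgroup.
    """
    node = Path(base) / f"ray_node_{node_id}"
    for group in ("system", "application"):
        (node / group / "leaf").mkdir(parents=True, exist_ok=True)

    # Enable the cpu and memory controllers for the children of the node
    # cgroup and of the system/application cgroups.
    for cg in (node, node / "system", node / "application"):
        (cg / "cgroup.subtree_control").write_text("+cpu +memory\n")

    # Constraints live on the non-leaf cgroups; processes go in the leaves.
    (node / "system" / "cpu.weight").write_text("1000\n")      # assumed weight
    (node / "application" / "cpu.weight").write_text("100\n")  # assumed weight
    (node / "system" / "memory.min").write_text(f"{system_memory_min}\n")


# Example (values are illustrative only):
# create_ray_cgroup_hierarchy("/sys/fs/cgroup", "abc123", 10 * 1024**3)
```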