-
Notifications
You must be signed in to change notification settings - Fork 7k
[core] (cgroups 1/n) Adding a sys/fs filesystem driver to perform cgroup operations. #54898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
to perform cgroup operations. Signed-off-by: irabbani <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @israbbani, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request lays the foundational groundwork for Ray to leverage Linux cgroupv2 for resource management. It introduces a clear interface for cgroup operations and provides a concrete implementation that interacts directly with the sys/fs cgroupv2 pseudo-filesystem, enabling future features for fine-grained control over process resources.
Highlights
- Cgroup Driver Interface: Introduced
CgroupDriverInterface, a new C++ abstract class defining a set of common operations for managing cgroups, including checking cgroupv2 status, creating/validating cgroups, moving processes, enabling/disabling controllers, and applying resource constraints. - SysFS Cgroupv2 Driver Implementation: Added
SysFsCgroupDriver, an implementation ofCgroupDriverInterfacethat interacts with the Linuxsys/fspseudo-filesystem to perform cgroupv2 operations. This driver handles file system interactions, permissions, and error handling specific to cgroupv2. - Cgroup Testing Infrastructure: Introduced comprehensive testing utilities (
TempCgroupDirectory,TempFile,StartChildProcessInCgroup) and dedicated integration and unit tests to validate theSysFsCgroupDriver's functionality and ensure correct interaction with the cgroupv2 filesystem. - New Status Code for Permissions: Added
StatusCode::PermissionDeniedto theray::Statussystem, along with helper methods, to provide more granular error reporting for operations that fail due to insufficient user permissions.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new cgroup driver for interacting with cgroup v2 via the sysfs interface. The changes are extensive, including the driver interface and implementation, testing utilities, and both unit and integration tests. The code is well-structured, and the addition of comprehensive tests is commendable.
I've identified a few critical issues related to compilation errors and unhandled exceptions that must be addressed. I've also included some high and medium severity suggestions to improve correctness, robustness, and code clarity.
Overall, this is a solid foundation for cgroup management within Ray.
src/ray/common/cgroup2/test/sysfs_cgroup_driver_integration_test.cc
Outdated
Show resolved
Hide resolved
Signed-off-by: irabbani <[email protected]>
instead of clone for older kernel headers < 5.7 (which is what we have in CI) Signed-off-by: irabbani <[email protected]>
edoakes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't review the tests yet
| Checks that the cgroup is valid. See return values for details of which | ||
| invariants are checked. | ||
|
|
||
| @param cgroup the absolute path of the cgroup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assuming this is some java docstring style?
we should keep it standard, either:
- follow what we currently have
- let's commit to migrating to this style if there's a strong benefit to it
(FWIW I find the style you have more readable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the PR description to talk about this. The style is called doxygen. It can be used to automatically generate API docs for developers. I think it has a few advantages
- I find it more readable what we currently have.
- The automated API docs will be useful for public contributions and internal contributors to search through APIs. E.g. it can help answer questions like do we have a utility class that joins paths?
- There are special tags that the docs generator can use e.g. @param, @tparam (for template params), @ref, @see etc. The whole list is here, but we can start with a few of these.
I'd prefer to change our current ones to these and use this as a standard moving forward.
|
@israbbani re: the test structure, it might be better to separate integration tests into their own subdirectory. That way,
|
Signed-off-by: irabbani <[email protected]>
Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
I'm open to all 3. I think I prefer because it's ambiguous.
|
|
|
||
| /** | ||
| A utility interface that allows the caller to check if cgroupv2 is mounted correctly | ||
| and perform cgroup operations on the system. It currently only supports using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a slight nit/question, we're saying this class currently only supports using the memory and cpu controllers, do we intend to add future support in this class in the future? If so, what needs to be added? How? If we don't intend this class to have an expansion of scope, let's remove this language and just say it 'supports x'.
This feedback though is in 'hot take' territory, I can't quote a style guideline, so take it with a grain of salt. It's mostly just my own preference. When I encounter classes that have comments like "we will add x" or "this right now does y" and then I see the last commit to the file was some years ago, my takeaway is that this is perhaps a dead effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree! Not too hot at all :)
Unless there are specific plans to add more controllers, I would concisely state what is supported support and leave it at that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I've reworded it. PTAL
| If cgroupv2 is not enabled, or enabled along with cgroupv1, returns Invalid | ||
| with the appropriate error message. | ||
|
|
||
| @see systemd's documentation for more information about unified mode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit(?): make sure doxygen understands how to deal with this (as the source url is on a new line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: irabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
| ray::StatusOr<std::unique_ptr<TempDirectory>> TempDirectory::Create() { | ||
| std::string path = "/tmp/XXXXXX"; | ||
| char *ret = mkdtemp(path.data()); | ||
| if (ret == nullptr) { | ||
| return ray::Status::UnknownError( | ||
| absl::StrFormat("Failed to create a temp directory. " | ||
| "Cgroup tests expect tmpfs to be mounted and only run on Linux.\n" | ||
| "Error: %s", | ||
| strerror(errno))); | ||
| } | ||
| std::unique_ptr<TempDirectory> temp_dir = | ||
| std::make_unique<TempDirectory>(std::move(path)); | ||
| return ray::StatusOr<std::unique_ptr<TempDirectory>>(std::move(temp_dir)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO for tests we can just RAY_CHECK and abort if we fail setup preconditions. this is fine too, just more work
Signed-off-by: irabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
edoakes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all nits, 🚢
| // Note: the process needs execute permissions for the cgroup directory | ||
| // to traverse the filesystem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
odd!
Signed-off-by: irabbani <[email protected]>
Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]>
Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]>
Signed-off-by: irabbani <[email protected]>
…oup operations. (ray-project#54898) Ray is going to use cgroups to implement resource isolation between system processes (raylet, dashboard_agent, gcs) and application processes (workers). See ray-project#54703 for more details. If you're interested in learning more about cgroups, here are a few resources: - https://docs.kernel.org/admin-guide/cgroup-v2.html (the docs) - https://www.youtube.com/watch?v=kcnFQgg9ToY (a video) The following classes are introduced: - CgroupDriver (interface that should be agnostic of the filesystem) - SysFsCgroupDriver (implements CgroupDriver by interacting with the /sys/fs/cgroup filesystem) Ray will currently only the following cgroup controllers with their respective constraints: - cpu (with cpu.weight) - memory (with memory.min) I've tried to introduce a few patterns and best practices that are worth pointing out: ### Class Design - Returning Status/StatusOr<T> from all public methods that can fail to facilitate proper error handling in the caller. - Separating the interface from the implementation (see CgroupDriverInterface.h) to facilitate dependency injection for tests. This also future-proofs the class for if/when we implement a Systemd based cgroup driver which is what K8S does. - Separating the interface also has the added benefit of improving compile times because the interface header files are lightweight. ### Documentation - I prefer to use the doxygen format for class and function documentation. These are comments that can be recognized by doxygen tooling and be used to produce API reference docs. - Writing documentation for the appropriate level thinking about the user/consumer of the API - CgroupDriver is an abstract class, but it's what the consumer of the API will import and use. It's documentation is API focused and outlines failure modes and return types. This is akin to man pages (see man 2 open). - SysFsCgroupDriver is a class that is aware of the filesystem. This is going to talk about which files on sys/fs/cgroup are used and how it interacts with them. The important thing here is to document the specific gotchas of those files. The documentation reflects this change in level of abstraction. - Most importantly, comments are written for the future reader and not for the writer. They need to be written with the goal of being understood. ### Testing - Unit tests use no mocking and are written strictly against the public API of the class under test. - Unit tests use the naming convention UnitOfWorkConditionUnderTestExpectedOutput. --------- Signed-off-by: irabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: avigyabb <[email protected]>
…oup operations. (ray-project#54898) Ray is going to use cgroups to implement resource isolation between system processes (raylet, dashboard_agent, gcs) and application processes (workers). See ray-project#54703 for more details. If you're interested in learning more about cgroups, here are a few resources: - https://docs.kernel.org/admin-guide/cgroup-v2.html (the docs) - https://www.youtube.com/watch?v=kcnFQgg9ToY (a video) The following classes are introduced: - CgroupDriver (interface that should be agnostic of the filesystem) - SysFsCgroupDriver (implements CgroupDriver by interacting with the /sys/fs/cgroup filesystem) Ray will currently only the following cgroup controllers with their respective constraints: - cpu (with cpu.weight) - memory (with memory.min) I've tried to introduce a few patterns and best practices that are worth pointing out: ### Class Design - Returning Status/StatusOr<T> from all public methods that can fail to facilitate proper error handling in the caller. - Separating the interface from the implementation (see CgroupDriverInterface.h) to facilitate dependency injection for tests. This also future-proofs the class for if/when we implement a Systemd based cgroup driver which is what K8S does. - Separating the interface also has the added benefit of improving compile times because the interface header files are lightweight. ### Documentation - I prefer to use the doxygen format for class and function documentation. These are comments that can be recognized by doxygen tooling and be used to produce API reference docs. - Writing documentation for the appropriate level thinking about the user/consumer of the API - CgroupDriver is an abstract class, but it's what the consumer of the API will import and use. It's documentation is API focused and outlines failure modes and return types. This is akin to man pages (see man 2 open). - SysFsCgroupDriver is a class that is aware of the filesystem. This is going to talk about which files on sys/fs/cgroup are used and how it interacts with them. The important thing here is to document the specific gotchas of those files. The documentation reflects this change in level of abstraction. - Most importantly, comments are written for the future reader and not for the writer. They need to be written with the goal of being understood. ### Testing - Unit tests use no mocking and are written strictly against the public API of the class under test. - Unit tests use the naming convention UnitOfWorkConditionUnderTestExpectedOutput. --------- Signed-off-by: irabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: avigyabb <[email protected]>
…oup operations. (#54898) Ray is going to use cgroups to implement resource isolation between system processes (raylet, dashboard_agent, gcs) and application processes (workers). See #54703 for more details. If you're interested in learning more about cgroups, here are a few resources: - https://docs.kernel.org/admin-guide/cgroup-v2.html (the docs) - https://www.youtube.com/watch?v=kcnFQgg9ToY (a video) The following classes are introduced: - CgroupDriver (interface that should be agnostic of the filesystem) - SysFsCgroupDriver (implements CgroupDriver by interacting with the /sys/fs/cgroup filesystem) Ray will currently only the following cgroup controllers with their respective constraints: - cpu (with cpu.weight) - memory (with memory.min) I've tried to introduce a few patterns and best practices that are worth pointing out: ### Class Design - Returning Status/StatusOr<T> from all public methods that can fail to facilitate proper error handling in the caller. - Separating the interface from the implementation (see CgroupDriverInterface.h) to facilitate dependency injection for tests. This also future-proofs the class for if/when we implement a Systemd based cgroup driver which is what K8S does. - Separating the interface also has the added benefit of improving compile times because the interface header files are lightweight. ### Documentation - I prefer to use the doxygen format for class and function documentation. These are comments that can be recognized by doxygen tooling and be used to produce API reference docs. - Writing documentation for the appropriate level thinking about the user/consumer of the API - CgroupDriver is an abstract class, but it's what the consumer of the API will import and use. It's documentation is API focused and outlines failure modes and return types. This is akin to man pages (see man 2 open). - SysFsCgroupDriver is a class that is aware of the filesystem. This is going to talk about which files on sys/fs/cgroup are used and how it interacts with them. The important thing here is to document the specific gotchas of those files. The documentation reflects this change in level of abstraction. - Most importantly, comments are written for the future reader and not for the writer. They need to be written with the goal of being understood. ### Testing - Unit tests use no mocking and are written strictly against the public API of the class under test. - Unit tests use the naming convention UnitOfWorkConditionUnderTestExpectedOutput. --------- Signed-off-by: irabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]>
…oup operations. (#54898) Ray is going to use cgroups to implement resource isolation between system processes (raylet, dashboard_agent, gcs) and application processes (workers). See #54703 for more details. If you're interested in learning more about cgroups, here are a few resources: - https://docs.kernel.org/admin-guide/cgroup-v2.html (the docs) - https://www.youtube.com/watch?v=kcnFQgg9ToY (a video) The following classes are introduced: - CgroupDriver (interface that should be agnostic of the filesystem) - SysFsCgroupDriver (implements CgroupDriver by interacting with the /sys/fs/cgroup filesystem) Ray will currently only the following cgroup controllers with their respective constraints: - cpu (with cpu.weight) - memory (with memory.min) I've tried to introduce a few patterns and best practices that are worth pointing out: ### Class Design - Returning Status/StatusOr<T> from all public methods that can fail to facilitate proper error handling in the caller. - Separating the interface from the implementation (see CgroupDriverInterface.h) to facilitate dependency injection for tests. This also future-proofs the class for if/when we implement a Systemd based cgroup driver which is what K8S does. - Separating the interface also has the added benefit of improving compile times because the interface header files are lightweight. ### Documentation - I prefer to use the doxygen format for class and function documentation. These are comments that can be recognized by doxygen tooling and be used to produce API reference docs. - Writing documentation for the appropriate level thinking about the user/consumer of the API - CgroupDriver is an abstract class, but it's what the consumer of the API will import and use. It's documentation is API focused and outlines failure modes and return types. This is akin to man pages (see man 2 open). - SysFsCgroupDriver is a class that is aware of the filesystem. This is going to talk about which files on sys/fs/cgroup are used and how it interacts with them. The important thing here is to document the specific gotchas of those files. The documentation reflects this change in level of abstraction. - Most importantly, comments are written for the future reader and not for the writer. They need to be written with the goal of being understood. ### Testing - Unit tests use no mocking and are written strictly against the public API of the class under test. - Unit tests use the naming convention UnitOfWorkConditionUnderTestExpectedOutput. --------- Signed-off-by: irabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: Kamil Kaczmarek <[email protected]>
…oup operations. (ray-project#54898) Ray is going to use cgroups to implement resource isolation between system processes (raylet, dashboard_agent, gcs) and application processes (workers). See ray-project#54703 for more details. If you're interested in learning more about cgroups, here are a few resources: - https://docs.kernel.org/admin-guide/cgroup-v2.html (the docs) - https://www.youtube.com/watch?v=kcnFQgg9ToY (a video) The following classes are introduced: - CgroupDriver (interface that should be agnostic of the filesystem) - SysFsCgroupDriver (implements CgroupDriver by interacting with the /sys/fs/cgroup filesystem) Ray will currently only the following cgroup controllers with their respective constraints: - cpu (with cpu.weight) - memory (with memory.min) I've tried to introduce a few patterns and best practices that are worth pointing out: ### Class Design - Returning Status/StatusOr<T> from all public methods that can fail to facilitate proper error handling in the caller. - Separating the interface from the implementation (see CgroupDriverInterface.h) to facilitate dependency injection for tests. This also future-proofs the class for if/when we implement a Systemd based cgroup driver which is what K8S does. - Separating the interface also has the added benefit of improving compile times because the interface header files are lightweight. ### Documentation - I prefer to use the doxygen format for class and function documentation. These are comments that can be recognized by doxygen tooling and be used to produce API reference docs. - Writing documentation for the appropriate level thinking about the user/consumer of the API - CgroupDriver is an abstract class, but it's what the consumer of the API will import and use. It's documentation is API focused and outlines failure modes and return types. This is akin to man pages (see man 2 open). - SysFsCgroupDriver is a class that is aware of the filesystem. This is going to talk about which files on sys/fs/cgroup are used and how it interacts with them. The important thing here is to document the specific gotchas of those files. The documentation reflects this change in level of abstraction. - Most importantly, comments are written for the future reader and not for the writer. They need to be written with the goal of being understood. ### Testing - Unit tests use no mocking and are written strictly against the public API of the class under test. - Unit tests use the naming convention UnitOfWorkConditionUnderTestExpectedOutput. --------- Signed-off-by: irabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: minerharry <[email protected]>
…oup operations. (ray-project#54898) Ray is going to use cgroups to implement resource isolation between system processes (raylet, dashboard_agent, gcs) and application processes (workers). See ray-project#54703 for more details. If you're interested in learning more about cgroups, here are a few resources: - https://docs.kernel.org/admin-guide/cgroup-v2.html (the docs) - https://www.youtube.com/watch?v=kcnFQgg9ToY (a video) The following classes are introduced: - CgroupDriver (interface that should be agnostic of the filesystem) - SysFsCgroupDriver (implements CgroupDriver by interacting with the /sys/fs/cgroup filesystem) Ray will currently only the following cgroup controllers with their respective constraints: - cpu (with cpu.weight) - memory (with memory.min) I've tried to introduce a few patterns and best practices that are worth pointing out: ### Class Design - Returning Status/StatusOr<T> from all public methods that can fail to facilitate proper error handling in the caller. - Separating the interface from the implementation (see CgroupDriverInterface.h) to facilitate dependency injection for tests. This also future-proofs the class for if/when we implement a Systemd based cgroup driver which is what K8S does. - Separating the interface also has the added benefit of improving compile times because the interface header files are lightweight. ### Documentation - I prefer to use the doxygen format for class and function documentation. These are comments that can be recognized by doxygen tooling and be used to produce API reference docs. - Writing documentation for the appropriate level thinking about the user/consumer of the API - CgroupDriver is an abstract class, but it's what the consumer of the API will import and use. It's documentation is API focused and outlines failure modes and return types. This is akin to man pages (see man 2 open). - SysFsCgroupDriver is a class that is aware of the filesystem. This is going to talk about which files on sys/fs/cgroup are used and how it interacts with them. The important thing here is to document the specific gotchas of those files. The documentation reflects this change in level of abstraction. - Most importantly, comments are written for the future reader and not for the writer. They need to be written with the goal of being understood. ### Testing - Unit tests use no mocking and are written strictly against the public API of the class under test. - Unit tests use the naming convention UnitOfWorkConditionUnderTestExpectedOutput. --------- Signed-off-by: irabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: minerharry <[email protected]>
…oup operations. (ray-project#54898) Ray is going to use cgroups to implement resource isolation between system processes (raylet, dashboard_agent, gcs) and application processes (workers). See ray-project#54703 for more details. If you're interested in learning more about cgroups, here are a few resources: - https://docs.kernel.org/admin-guide/cgroup-v2.html (the docs) - https://www.youtube.com/watch?v=kcnFQgg9ToY (a video) The following classes are introduced: - CgroupDriver (interface that should be agnostic of the filesystem) - SysFsCgroupDriver (implements CgroupDriver by interacting with the /sys/fs/cgroup filesystem) Ray will currently only the following cgroup controllers with their respective constraints: - cpu (with cpu.weight) - memory (with memory.min) I've tried to introduce a few patterns and best practices that are worth pointing out: ### Class Design - Returning Status/StatusOr<T> from all public methods that can fail to facilitate proper error handling in the caller. - Separating the interface from the implementation (see CgroupDriverInterface.h) to facilitate dependency injection for tests. This also future-proofs the class for if/when we implement a Systemd based cgroup driver which is what K8S does. - Separating the interface also has the added benefit of improving compile times because the interface header files are lightweight. ### Documentation - I prefer to use the doxygen format for class and function documentation. These are comments that can be recognized by doxygen tooling and be used to produce API reference docs. - Writing documentation for the appropriate level thinking about the user/consumer of the API - CgroupDriver is an abstract class, but it's what the consumer of the API will import and use. It's documentation is API focused and outlines failure modes and return types. This is akin to man pages (see man 2 open). - SysFsCgroupDriver is a class that is aware of the filesystem. This is going to talk about which files on sys/fs/cgroup are used and how it interacts with them. The important thing here is to document the specific gotchas of those files. The documentation reflects this change in level of abstraction. - Most importantly, comments are written for the future reader and not for the writer. They need to be written with the goal of being understood. ### Testing - Unit tests use no mocking and are written strictly against the public API of the class under test. - Unit tests use the naming convention UnitOfWorkConditionUnderTestExpectedOutput. --------- Signed-off-by: irabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: Michael Acar <[email protected]>
…oup operations. (ray-project#54898) Ray is going to use cgroups to implement resource isolation between system processes (raylet, dashboard_agent, gcs) and application processes (workers). See ray-project#54703 for more details. If you're interested in learning more about cgroups, here are a few resources: - https://docs.kernel.org/admin-guide/cgroup-v2.html (the docs) - https://www.youtube.com/watch?v=kcnFQgg9ToY (a video) The following classes are introduced: - CgroupDriver (interface that should be agnostic of the filesystem) - SysFsCgroupDriver (implements CgroupDriver by interacting with the /sys/fs/cgroup filesystem) Ray will currently only the following cgroup controllers with their respective constraints: - cpu (with cpu.weight) - memory (with memory.min) I've tried to introduce a few patterns and best practices that are worth pointing out: ### Class Design - Returning Status/StatusOr<T> from all public methods that can fail to facilitate proper error handling in the caller. - Separating the interface from the implementation (see CgroupDriverInterface.h) to facilitate dependency injection for tests. This also future-proofs the class for if/when we implement a Systemd based cgroup driver which is what K8S does. - Separating the interface also has the added benefit of improving compile times because the interface header files are lightweight. ### Documentation - I prefer to use the doxygen format for class and function documentation. These are comments that can be recognized by doxygen tooling and be used to produce API reference docs. - Writing documentation for the appropriate level thinking about the user/consumer of the API - CgroupDriver is an abstract class, but it's what the consumer of the API will import and use. It's documentation is API focused and outlines failure modes and return types. This is akin to man pages (see man 2 open). - SysFsCgroupDriver is a class that is aware of the filesystem. This is going to talk about which files on sys/fs/cgroup are used and how it interacts with them. The important thing here is to document the specific gotchas of those files. The documentation reflects this change in level of abstraction. - Most importantly, comments are written for the future reader and not for the writer. They need to be written with the goal of being understood. ### Testing - Unit tests use no mocking and are written strictly against the public API of the class under test. - Unit tests use the naming convention UnitOfWorkConditionUnderTestExpectedOutput. --------- Signed-off-by: irabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: jugalshah291 <[email protected]>
…oup operations. (ray-project#54898) Ray is going to use cgroups to implement resource isolation between system processes (raylet, dashboard_agent, gcs) and application processes (workers). See ray-project#54703 for more details. If you're interested in learning more about cgroups, here are a few resources: - https://docs.kernel.org/admin-guide/cgroup-v2.html (the docs) - https://www.youtube.com/watch?v=kcnFQgg9ToY (a video) The following classes are introduced: - CgroupDriver (interface that should be agnostic of the filesystem) - SysFsCgroupDriver (implements CgroupDriver by interacting with the /sys/fs/cgroup filesystem) Ray will currently only the following cgroup controllers with their respective constraints: - cpu (with cpu.weight) - memory (with memory.min) I've tried to introduce a few patterns and best practices that are worth pointing out: ### Class Design - Returning Status/StatusOr<T> from all public methods that can fail to facilitate proper error handling in the caller. - Separating the interface from the implementation (see CgroupDriverInterface.h) to facilitate dependency injection for tests. This also future-proofs the class for if/when we implement a Systemd based cgroup driver which is what K8S does. - Separating the interface also has the added benefit of improving compile times because the interface header files are lightweight. ### Documentation - I prefer to use the doxygen format for class and function documentation. These are comments that can be recognized by doxygen tooling and be used to produce API reference docs. - Writing documentation for the appropriate level thinking about the user/consumer of the API - CgroupDriver is an abstract class, but it's what the consumer of the API will import and use. It's documentation is API focused and outlines failure modes and return types. This is akin to man pages (see man 2 open). - SysFsCgroupDriver is a class that is aware of the filesystem. This is going to talk about which files on sys/fs/cgroup are used and how it interacts with them. The important thing here is to document the specific gotchas of those files. The documentation reflects this change in level of abstraction. - Most importantly, comments are written for the future reader and not for the writer. They need to be written with the goal of being understood. ### Testing - Unit tests use no mocking and are written strictly against the public API of the class under test. - Unit tests use the naming convention UnitOfWorkConditionUnderTestExpectedOutput. --------- Signed-off-by: irabbani <[email protected]> Signed-off-by: Ibrahim Rabbani <[email protected]> Co-authored-by: Edward Oakes <[email protected]> Signed-off-by: Douglas Strodtman <[email protected]>

Ray is going to use cgroups to implement resource isolation between system processes (raylet, dashboard_agent, gcs) and application processes (workers). See #54703 for more details.
If you're interested in learning more about cgroups, here are a few resources:
The following classes are introduced:
Ray will currently only the following cgroup controllers with their respective constraints:
I've tried to introduce a few patterns and best practices that are worth pointing out:
Class Design
Documentation
Testing