[ci] Enable Cgroup support in CI for core #51454
Changes from 33 commits
```diff
@@ -5,6 +5,7 @@

 from typing import List, Tuple, Optional


 _CUDA_COPYRIGHT = """
 ==========
 == CUDA ==
@@ -57,6 +58,7 @@ def __init__(
         self.volumes = volumes or []
         self.envs = envs or []
         self.envs += _DOCKER_ENV
+        self.privileged = False

     def run_script_with_output(self, script: List[str]) -> str:
         """
@@ -109,10 +111,10 @@ def get_run_command(
         :param gpu_ids: ids of gpus on the host machine
         """
         artifact_mount_host, artifact_mount_container = self.get_artifact_mount()
-        command = [
-            "docker",
-            "run",
-            "-i",
+        command = ["docker", "run", "-i"]
+        if self.privileged:
+            command.append("--privileged")
+        command += [
             "--rm",
             "--volume",
             f"{artifact_mount_host}:{artifact_mount_container}",
```
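The change above makes `--privileged` opt-in through a `privileged` attribute that defaults to `False`. A minimal sketch of that pattern, assuming a simplified, hypothetical `HypotheticalContainer` class (not the actual CI container class; its other flags, environment variables and artifact mounts are omitted):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HypotheticalContainer:
    """Hypothetical stand-in for the CI container class; only the fields
    needed to build a `docker run` prefix are modeled."""

    image: str
    privileged: bool = False  # off by default, as in the diff
    volumes: List[str] = field(default_factory=list)

    def run_command_prefix(self) -> List[str]:
        command = ["docker", "run", "-i"]
        if self.privileged:
            # Only privileged containers are expected to see a read-write
            # cgroup v2 mount at /sys/fs/cgroup.
            command.append("--privileged")
        command.append("--rm")
        for volume in self.volumes:
            command += ["--volume", volume]
        command.append(self.image)
        return command
```

Keeping the flag opt-in means existing CI jobs keep the default, unprivileged container, and only jobs that need cgroup access request the broader permissions.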
`@@ -0,0 +1,50 @@` (new file)

```python
import os
import pytest
import sys

from pathlib import Path

# In privileged containers, we expect the following
# cgroupv1 is disabled
# cgroupv2 is enabled and mounted on /sys/fs/cgroup
# the user running tests has read and write access to the cgroup subtree
# memory and cpu controllers are enabled

_MOUNT_FILE_PATH = "/proc/mounts"
_CGROUP2_PATH = "/sys/fs/cgroup"
_CTRL_FILE = "cgroup.controllers"
_EXPECTED_CTRLS = ["memory", "cpu"]


# mount file format:
# cgroup /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
def test_only_cgroupv2_mounted_rw():
    found_cgroupv2 = False
    found_cgroupv1 = False
    with open(Path(_MOUNT_FILE_PATH)) as f:
        for line in f:
            c = line.split()
            found_cgroupv2 = found_cgroupv2 or (
                c[2] == "cgroup2" and c[1] == _CGROUP2_PATH and "rw" in c[3]
            )
            found_cgroupv1 = found_cgroupv1 or (c[2] == "cgroup")
    assert found_cgroupv2 and not found_cgroupv1


def test_cgroupv2_rw_for_test_user():
    assert os.access(_CGROUP2_PATH, os.R_OK) and os.access(_CGROUP2_PATH, os.W_OK)


def test_cgroupv2_controllers_enabled():
    with open(os.path.join(_CGROUP2_PATH, _CTRL_FILE)) as f:
        enabled = f.readlines()
    assert len(enabled) == 1
    enabled_ctrls = enabled[0].split()
    for expected_ctrl in _EXPECTED_CTRLS:
        assert (
            expected_ctrl in enabled_ctrls
        ), f"Expected {expected_ctrl} to be enabled for cgroups2, but it is not"


if __name__ == "__main__":
    sys.exit(pytest.main(["-v", __file__]))
```

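The tests above read `/proc/mounts` and `cgroup.controllers` directly, so they only pass inside a suitably configured container. A minimal sketch, assuming one wants the parsing logic testable on its own, factors it into pure functions over file contents; the function names here are hypothetical and not part of the PR:

```python
from typing import Iterable, List

CGROUP2_PATH = "/sys/fs/cgroup"


def only_cgroupv2_mounted_rw(mounts_text: str, path: str = CGROUP2_PATH) -> bool:
    """Return True if cgroup2 is mounted read-write at `path` and no
    cgroup v1 hierarchy appears in /proc/mounts-style text."""
    found_v2 = False
    found_v1 = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Compare whole comma-separated options, not substrings.
        if fs_type == "cgroup2" and mount_point == path and "rw" in options.split(","):
            found_v2 = True
        if fs_type == "cgroup":
            found_v1 = True
    return found_v2 and not found_v1


def missing_controllers(controllers_text: str, expected: Iterable[str]) -> List[str]:
    """Return the expected controllers absent from a cgroup.controllers line."""
    enabled = set(controllers_text.split())
    return [ctrl for ctrl in expected if ctrl not in enabled]
```

On a live system, `only_cgroupv2_mounted_rw(open("/proc/mounts").read())` performs the equivalent of `test_only_cgroupv2_mounted_rw`. One deliberate difference: the option check compares whole options (`"rw" in options.split(",")`) rather than searching for the substring `"rw"` in the option string, which is slightly stricter than `"rw" in c[3]`.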
curious what is `--config=llvm`?

it means inherit `build:llvm`

It's defined in the `.bazelrc` file. I'm not sure if this is the right one for what we want. @aslonnie maybe `--config=clang` is better because it's consistent with our other cpp tests?

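As noted in the thread, `--config=llvm` pulls in the `build:llvm` lines from `.bazelrc`. A simplified Python model of that expansion, for illustration only: the `expand_config` name, the `_INHERITS` table (only `test` and `run` inheriting from `build`) and the example flags are assumptions, not Ray's actual `.bazelrc` or Bazel's real parser, which also handles `common`, `import`, nested `--config` and precedence rules this sketch ignores.

```python
from typing import Dict, List

# Simplified model (assumption): a command also picks up options defined
# for the commands it inherits from, e.g. `test` inherits `build`.
_INHERITS: Dict[str, List[str]] = {"build": [], "test": ["build"], "run": ["build"]}


def expand_config(bazelrc: str, command: str, config: str) -> List[str]:
    """Collect the flags that `--config=<config>` would add to `command`.

    Flags are returned in file order; Bazel's exact ordering is not modeled.
    """
    applicable = set(_INHERITS.get(command, []) + [command])
    flags: List[str] = []
    for raw in bazelrc.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, rest = line.partition(" ")
        cmd, sep, name = key.partition(":")
        # Only `<command>:<config>` lines matching this config apply.
        if sep and name == config and cmd in applicable:
            flags.extend(rest.split())
    return flags
```

With a hypothetical rc containing `build:llvm --repo_env=CC=clang` and `test:llvm --test_output=errors`, `expand_config(rc, "test", "llvm")` collects both flags, while a `build` invocation only collects the first.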
Renaming can happen in a follow-up.
@israbbani maybe check in the change to this `.bazelrc` file first (and I can approve it); then your iterations on this PR will get a much better cache hit rate.

I think for some reason the cache hit rate has improved on its own. Maybe one of the microchecks passed and published the docker image?

Sorry for the confusion, my real question is: why do we need the config?

@aslonnie asked me to include it as the base config for C++ testing. It adds the following properties:
My understanding is that it's needed if we're going to use the `cgroup` config to run C++ tests. Are you asking why we need the `cgroup` config?

No, I'm just curious why we need this llvm config, because I don't see it in all C++-related configs, for example `ray/.bazelrc` lines 128 to 134 in 5e05c2f.

This is a good question. We may need to audit these and remove the ones we don't need.