-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better handle hitting the memory limit #2812
Conversation
fb6b165
to
eecb4fe
Compare
7e961df
to
560b22e
Compare
Before:
After:
This at least gives some info about what's wrong as well as a hint about the lowest possible limit. This also sheds some light about what is going on in #2736 -- apparently runc init has some peak memory use upon the start, this is why retrying on EBUSY helps @wzshiming. Yet I am not convinced this should be handled by a retry. |
a416213
to
7a3b640
Compare
Failure on centos 7, seems like a random one (some rare race in old systemd?):
|
@wzshiming can you test this PR on your setup to reproduce #2736 and copy-paste the runc errors here? It will help to understand what is going on in your case. |
This comment has been minimized.
This comment has been minimized.
No longer a wip. |
7a3b640
to
4cf7a53
Compare
4cf7a53
to
80b81cd
Compare
Moved to draft to add OOM handling in runc exec. Added, tested, ready for the prime time! |
Add integration test coverage for code initially added by commit d8b8f76 ("Fix problem when update memory and swap memory", 2016-04-05). This is in addition to existing unit test, TestMemorySetSwapSmallerThanMemory. Signed-off-by: Kir Kolyshkin <[email protected]>
Currently, we read and parse 5 different files while we only need 1. Use GetCgroupParamUint() directly to get current limit. While at it, remove the workaround previously needed for the unit test, and make it a bit more verbose. Signed-off-by: Kir Kolyshkin <[email protected]>
1. Factor out setMemory and setSwap 2. Pass cgroup.Resources (rather than cgroup) to setMemoryAndSwap(). 3. Merge the duplicated "set memory, set swap" case. Signed-off-by: Kir Kolyshkin <[email protected]>
EBUSY when trying to set memory limit may mean the new limit is too low (lower than the current usage, and the kernel can't do anything). Provide a more specific error for such case. Signed-off-by: Kir Kolyshkin <[email protected]>
1. This is the only function in the package with Get prefix that does not read a file (but parses a string). Rename accordingly, and convert the callers. GetCgroupParamKeyValue -> ParseKeyValue 2. Use strings.Split rather than strings.Fields. Split by a space is 2x faster, plus we can limit the splitting. The downside is we have to strip a newline in one of the callers. 3. Improve the doc and the code flow. 4. Fix a test case with invalid data (spaces at BOL). Signed-off-by: Kir Kolyshkin <[email protected]>
This is to benefit from openat2() implementation, on kernels that support it. Theoretically this also improves security. Signed-off-by: Kir Kolyshkin <[email protected]>
Generalize the libct/getValueFromCgroup() as fscommon.GetValueByKey(), and document it. No changes other than using fscommon.ParseUint to convert the value. Signed-off-by: Kir Kolyshkin <[email protected]>
This makes the code simpler and more future-proof, in case any more values will appear in hugetlb.*.events. Signed-off-by: Kir Kolyshkin <[email protected]>
In some cases, container init fails to start because it is killed by the kernel OOM killer. The errors returned by runc in such cases are semi-random and rather cryptic. Below are a few examples. On cgroup v1 + systemd cgroup driver: > process_linux.go:348: copying bootstrap data to pipe caused: write init-p: broken pipe > process_linux.go:352: getting the final child's pid from pipe caused: EOF On cgroup v2: > process_linux.go:495: container init caused: read init-p: connection reset by peer > process_linux.go:484: writing syncT 'resume' caused: write init-p: broken pipe This commits adds the OOM method to cgroup managers, which tells whether the container was OOM-killed. In case that has happened, the original error is discarded (unless --debug is set), and the new OOM error is reported instead: > ERRO[0000] container_linux.go:367: starting container process caused: container init was OOM-killed (memory limit too low?) Also, fix the rootless test cases that are failing because they expect an error in the first line, and we have an additional warning now: > unable to get oom kill count" error="no directory specified for memory.oom_control Signed-off-by: Kir Kolyshkin <[email protected]>
An exec may fail due to memory shortage (cgroup memory limits being too tight), and an error message provided in this case is clueless: > $ sudo ../runc/runc exec xx56 top > ERRO[0000] exec failed: container_linux.go:367: starting container process caused: read init-p: connection reset by peer Same as the previous commit for run/start, check the OOM kill counter and report an OOM kill. The differences from run are 1. The container is already running and OOM kill counter might not be zero. This is why we have to read the counter before exec and after it failed. 2. An unrelated OOM kill event might occur in parallel with our exec (and I see no way to find out which process was killed, except to parse kernel logs which seems excessive and not very reliable). This is why we report _possible_ OOM kill. With this commit, the error message looks like: > ERRO[0000] exec failed: container_linux.go:367: starting container process caused: process_linux.go:105: possibly OOM-killed caused: read init-p: connection reset by peer Signed-off-by: Kir Kolyshkin <[email protected]>
80b81cd
to
38b2dd3
Compare
This improves runc error reporting in cases where memory limit is set too low.
NOTE that while this is clearly an enhancement I want this in rc94 because
1. Improve EBUSY handling on setting cgroup memory limit
On cgroup v1 + fs driver, setting memory.limit_in_bytes results in EBUSY. This is reported as
To decipher the error a user needs to know that EBUSY is returned by the kernel when the limit being set is too low. In addition, it would be handy to know what the current usage is.
Handle EBUSY and report this:
Related to #2736
2. Check for OOM kill on container start
When a container fails to start due to memory limit set too low, and container init being OOM-killed,
error messages returned by runc are semi-random and rather cryptic. Here are a few examples
(truncated to remove the common prefix for clarity):
process_linux.go:348: copying bootstrap data to pipe caused: write init-p: broken pipe
(cgroup v1 + systemd driver)process_linux.go:352: getting the final child's pid from pipe caused: EOF
(cgroup v1 + systemd driver)process_linux.go:495: container init caused: read init-p: connection reset by peer
(cgroup v2)process_linux.go:484: writing syncT 'resume' caused: write init-p: broken pipe
(cgroup v2)On container start error path, add a check if OOM kill has happened, and report it instead of the original (cryptic) error:
(or, if
--debug
is set, also provide the original error):3. Check for OOM kill on container exec
Same as above, with some nuances:
The container is already running and OOM kill counter might not be
zero. This is why we have to read the counter before exec and after
it failed.
An unrelated OOM kill event might occur in parallel with our exec
(and I see no way to find out which process was killed, except to
parse kernel logs which seems excessive and not very reliable).
This is why we report possible OOM kill.
The error message changed from
to
4. Some minor improvements and refactoring along the way.
Please see individual commits for details.
TODO
look into why systemd v1 driver does not fail to set the limits like fs driver doessee Fix discrepancy between systemd and fs cgroup managers #2813 / libct/cgroups/systemd: don't set limits in Apply #2814.