racy RLIMIT_NOFILE setting with Go 1.19+ #4195
Comments
Sorry, I can't reproduce it with your provided step.
I am so sorry I left out an important detail. This may not happen right
away. It's random and can take some time to see the flip. I usually
see at least one flip within 15 minutes of letting this run.
On Wed, Feb 7, 2024 at 6:10 PM lfbzhm ***@***.***> wrote:
Sorry, I can't reproduce it with your provided step.
date; while true; do sudo /workspaces/runc/runc exec -t test /bin/sh -c 'ulimit -a' 2>&1 | grep nofiles; sleep 0.1; done | ts '%F %T'
I printed all logs out, it looks like:
2024-02-08 02:07:44 nofiles 65536
2024-02-08 02:07:44 nofiles 65536
2024-02-08 02:07:45 nofiles 65536
@lifubang I was looking to see whether any progress has been made on this issue. We need an update because vulnerabilities are being flagged for go1.20.3.
Sorry, I just returned from vacation and haven't reproduced your issue yet. If you could take the time to point out the problem, it would be greatly appreciated.
I successfully reproduced this problem. I think the root cause is the set of changes related to rlimit-nofile in the new version of Go. I've submitted a PR which should fix this issue.
As reported in issue opencontainers#4195, the new version of the Go runtime caches rlimit-nofile. Before executing exec, the rlimit-nofile of the process is restored from the cache. In runc, this causes the rlimit-nofile set by the parent process for the container to become invalid. This can be solved by clearing the cache. Signed-off-by: ls-ggg <[email protected]>
Reproduced locally, 40 times out of 1 million.
Since Go 1.21 (https://go-review.googlesource.com/c/go/+/476097), the Go runtime saves the original value of RLIMIT_NOFILE upon startup and uses the saved value in StartProcess, unless RLIMIT_NOFILE is explicitly changed by calling syscall.Setrlimit. Now, runc uses unix.Prlimit (rather than syscall.Setrlimit) to set RLIMIT_NOFILE, thus the Go runtime is not aware that it has changed, resulting in an occasional reset of RLIMIT_NOFILE, as reported in opencontainers#4195. Bumping x/sys/unix to v0.7.0 fixes this (via https://go-review.googlesource.com/c/sys/+/476695). Signed-off-by: Kir Kolyshkin <[email protected]>
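For readers following along, here is a minimal sketch of the syscall.Setrlimit path that the commit message above contrasts with unix.Prlimit. It assumes Linux, where syscall.Rlimit uses uint64 fields. The helper names reapplyNofile and currentNofile are hypothetical and are not taken from runc. It only re-applies the values already in force, which should not need extra privileges.

```go
package main

import (
	"fmt"
	"syscall"
)

// currentNofile reads the calling process's RLIMIT_NOFILE.
// (Hypothetical helper name, for illustration only.)
func currentNofile() (syscall.Rlimit, error) {
	var lim syscall.Rlimit
	err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim)
	return lim, err
}

// reapplyNofile sets RLIMIT_NOFILE through syscall.Setrlimit, so the change
// goes through the standard library instead of around it.
// (Hypothetical helper name, for illustration only.)
func reapplyNofile(cur, max uint64) error {
	lim := syscall.Rlimit{Cur: cur, Max: max}
	return syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim)
}

func main() {
	lim, err := currentNofile()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	// Re-apply the same values; raising the hard limit would need privileges.
	if err := reapplyNofile(lim.Cur, lim.Max); err != nil {
		fmt.Println("setrlimit failed:", err)
		return
	}
	fmt.Printf("RLIMIT_NOFILE soft=%d hard=%d\n", lim.Cur, lim.Max)
}
```

Whether this alone is enough depends on when it runs relative to the parent's change, which later comments in this thread discuss.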
#4237 states this is because of a change in Go 1.21 (which was also backported to Go 1.20.4). Yet I reproduced it with Go 1.19:
$ RUNC=~/git/runc/runc-4239-go1.19; i=0; $RUNC --version && while [ $i -lt 100000 ]; do LIM=$(sudo $RUNC exec bionic sh -c "ulimit -n"); if [ $LIM -ne 65536 ]; then echo "WHOOPSIE (iter $i) got numfile $LIM"; break; fi; ((i%100==0)) && printf "[%s] %10d\n" "$(date)" "$i"; let i++; done
runc version 1.1.12+dev
commit: v1.0.0-754-gec700396
spec: 1.0.2-dev
go: go1.19.13
libseccomp: 2.5.1
[Tue 02 Apr 2024 06:43:29 PM PDT] 0
[Tue 02 Apr 2024 06:43:30 PM PDT] 100
....
[Tue 02 Apr 2024 06:48:35 PM PDT] 22800
[Tue 02 Apr 2024 06:48:36 PM PDT] 22900
WHOOPSIE (iter 22945) got numfile 1024
This issue states only runc v1.1.10+ is affected. I have reproduced it with an earlier version:
Will try older versions of runc and Go tomorrow.
Apparently runc version 1.1.0, compiled with go1.17.13, does not have the issue.
I was able to reproduce this with runc-1.1.0 compiled with Go 1.19.13. So the problem is not in runc per se, it is in the Go runtime (as pointed out in #4237). Alas, the fix that I proposed in #4239 is not working for some reason. Testing whether we have the issue in the main branch (I was thinking we do not, but am no longer sure).
Yes we do (with the same script as above):
IMHO this is ultimately a Go stdlib bug -- they assume that only the Go process itself will change its own rlimits (and thus use a cached version of the rlimit), but As a workaround, we could try to set our own limits -- but unlike the solution proposed in #4239, we should use the configured limit, not the one it happens to be at the time. But IMHO this is clearly a Go bug... @kolyshkin This is why updating |
Issue: opencontainers#4195 Since https://go-review.googlesource.com/c/go/+/476097, there is a get/set race between runc exec and syscall.rlimit.init, so we need to call setupRlimits after syscall.rlimit.init() completed. Signed-off-by: lifubang <[email protected]>
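To make the ordering problem easier to reason about, here is a toy model of it. This is my own simplification of what the commit messages in this thread describe, not the actual Go runtime or runc code. It assumes that runtime startup reads the limit, saves it if soft and hard differ, and raises soft to hard; that the parent applies the configured limit at some point; and that the saved value is restored before exec unless it was cleared. All names (step, limit, softAfterExec) are hypothetical.

```go
package main

import "fmt"

type step int

const (
	initGet    step = iota // runtime startup reads the limit and may save it
	initSet                // runtime startup raises soft to the hard limit it read
	parentSet              // the parent applies the configured limit to the child
	clearCache             // the child drops the saved value before exec
)

type limit struct{ soft, hard uint64 }

// softAfterExec replays steps in order and returns the soft limit that the
// exec'ed program would see in this toy model.
func softAfterExec(steps []step, inherited, configured limit) uint64 {
	cur := inherited
	var read limit   // what initGet observed
	var saved *limit // the saved "original" limit, if any
	for _, s := range steps {
		switch s {
		case initGet:
			read = cur
			if read.soft != read.hard {
				v := read
				saved = &v
			}
		case initSet:
			if saved != nil { // only raised when soft != hard was observed
				cur = limit{soft: read.hard, hard: read.hard}
			}
		case parentSet:
			cur = configured
		case clearCache:
			saved = nil
		}
	}
	if saved != nil {
		cur = *saved // restore step before exec
	}
	return cur.soft
}

func main() {
	inherited := limit{soft: 1024, hard: 1048576}
	configured := limit{soft: 65536, hard: 65536}
	orders := map[string][]step{
		"set after init, no clear":    {initGet, initSet, parentSet},
		"set between get/set + clear": {initGet, parentSet, initSet, clearCache},
		"set after init + clear":      {initGet, initSet, parentSet, clearCache},
		"set before init":             {parentSet, initGet, initSet},
	}
	for name, steps := range orders {
		fmt.Printf("%-28s -> %d\n", name, softAfterExec(steps, inherited, configured))
	}
}
```

In this model, the only ordering that gives the configured value no matter how the parent and child are scheduled is "apply the limit after init has completed, then clear the saved value", which matches the combination of the two commits referenced here.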
As reported in issue opencontainers#4195, the new version of go runtime will cache rlimit-nofile. Before executing exec, the rlimit-nofile of the process will be restored with the cache. In runc, this will cause the rlimit-nofile set by the parent process for the container to become invalid. This can be solved by clearing the cache. Signed-off-by: ls-ggg <[email protected]> (cherry picked from commit f9f8abf) Signed-off-by: lifubang <[email protected]>
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]>
As reported in issue opencontainers#4195, the new version(since 1.19) of go runtime will cache rlimit-nofile. Before executing execve, the rlimit-nofile of the process will be restored with the cache. In runc, this will cause the rlimit-nofile set by the parent process for the container to become invalid. It can be solved by clearing the cache. Signed-off-by: ls-ggg <[email protected]> (cherry picked from commit f9f8abf) Signed-off-by: lifubang <[email protected]>
Issue: opencontainers#4195 Since https://go-review.googlesource.com/c/go/+/476097, there is a get/set race between runc exec and syscall.rlimit.init, so we need to call setupRlimits after syscall.rlimit.init() completed. Signed-off-by: lifubang <[email protected]> (cherry picked from commit a853a82) Signed-off-by: lifubang <[email protected]>
As reported in issue opencontainers#4195, the new version(since 1.19) of go runtime will cache rlimit-nofile. Before executing execve, the rlimit-nofile of the process will be restored with the cache. In runc, this will cause the rlimit-nofile set by the parent process for the container to become invalid. It can be solved by clearing the cache. Signed-off-by: ls-ggg <[email protected]> (cherry picked from commit f9f8abf) Signed-off-by: lifubang <[email protected]> (cherry picked from commit da68c8e) Signed-off-by: lifubang <[email protected]>
issues: opencontainers#4195 opencontainers#4265 (comment) Signed-off-by: lifubang <[email protected]> (cherry picked from commit 4ea0bf8) Signed-off-by: lfbzhm <[email protected]>
Description
We noticed an issue with 'ubuntu:bionic' where runc does not consistently honor the "RLIMIT_NOFILE" setting defined in the spec. After configuring the setting to '65536', it keeps flipping between '65536' and '1024' intermittently, with no consistent pattern. Looking back, this issue appears to have started in runc-1.1.10 and above. We are hoping that someone can provide a fix or some insight into why this is happening. The reproduction uses the latest runc-1.1.12.
Steps to reproduce the issue
Running on an 'ubuntu:focal (20.04.6 LTS )' server with kernel 5.4.0-164-generic, using bash shell
Describe the results you received and expected
In the monitoring window you initially just see a date, then intermittently you will see lines appear: nofiles 1024
e.g.
date; while true; do sudo $(pwd)/runc-1.1.12 exec -t bionic /bin/sh -c 'ulimit -a' 2>&1 | grep nofiles; sleep 0.1; done | ts '%F %T' | egrep 1024
Wed 07 Feb 2024 07:34:12 PM UTC
2024-02-07 19:35:18 nofiles 1024
2024-02-07 19:35:44 nofiles 1024
2024-02-07 19:36:08 nofiles 1024
2024-02-07 19:37:08 nofiles 1024
I would have expected 'nofiles' to never flip back to 1024 and remain at '65536' so we should never see any output.
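If it helps anyone collecting numbers from a long run, a small sketch like the following could tally the unexpected values from captured output instead of eyeballing egrep. It assumes lines shaped like the ones above (timestamp, then "nofiles", then the value). The function name countUnexpected is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// countUnexpected returns how many "nofiles" lines report a value other
// than want. Lines that do not end in "nofiles <number>" are skipped.
func countUnexpected(log string, want uint64) int {
	n := 0
	for _, line := range strings.Split(log, "\n") {
		fields := strings.Fields(line)
		// Expected shape: <date> <time> nofiles <value>
		if len(fields) < 2 || fields[len(fields)-2] != "nofiles" {
			continue
		}
		got, err := strconv.ParseUint(fields[len(fields)-1], 10, 64)
		if err != nil {
			continue // non-numeric values such as "unlimited" are ignored here
		}
		if got != want {
			n++
		}
	}
	return n
}

func main() {
	sample := strings.Join([]string{
		"Wed 07 Feb 2024 07:34:12 PM UTC",
		"2024-02-08 02:07:44 nofiles 65536",
		"2024-02-07 19:35:18 nofiles 1024",
		"2024-02-07 19:35:44 nofiles 1024",
	}, "\n")
	fmt.Println("unexpected values:", countUnexpected(sample, 65536))
}
```

To use it on real data, drop the final egrep from the loop and feed the saved output to the function.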
What version of runc are you using?
./runc-1.1.12 --version
runc version 1.1.12
commit: v1.1.12-0-g51d5e946
spec: 1.0.2-dev
go: go1.20.13
libseccomp: 2.5.4
Host OS information
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Host kernel information
Linux ip-10-13-108-144.pwx.purestorage.com 5.4.0-164-generic #181-Ubuntu SMP Fri Sep 1 13:41:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux