Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix set nofile rlimit error #4265

Merged

Conversation

lifubang
Copy link
Member

@lifubang lifubang commented Apr 30, 2024

Fix: #4195
Close: #4237

1. Fix a get/set race between runc exec and syscall.rlimit.init()

As @ls-ggg has given the detailed steps to reproduce the issue #4195 , and has given the core reason is that the go runtime will cache the value of rlimit nofile. [1]
When we are running runc exec with nofile limit, runc set it in the parent process, it may cause a race with go runtime init. The race condition is in the time when runc parent process set the container child process's nofile limit just after go runtime init has fetched the nofile limit. [2]
So, we should set nofile limit after syscall.rlimit.init() completed.

2. Fix an edge case caused by nofile rlimit cache in go stdlib

As @kolyshkin have found an edge case for set nofile rlimit by runc, which is also caused by nofile rlimit cache in go stdlib. Although we have a way to resolve the above get/set race, but if the hard value of nofile rlimit configured in config.json is bigger than this value of runc create/run/exec, it will also be incorrect restored by syscall.Exec in runc init. So we need to clear the nofile rlimit cache before we start the container initial process if we need to set nofile rlimit for the container.

[1] golang/go@f5eef58#diff-ec665e9789f8cf5cd1828ad7fa9f0ff4ebc1f5b5dd0fc82a296da5c07da7ece6
[2] https://github.com/golang/go/blob/f5eef58e4381259cbd84b3f2074c79607fb5c821/src/syscall/rlimit.go#L34-L35

@lifubang lifubang requested review from kolyshkin and cyphar April 30, 2024 16:48
@lifubang
Copy link
Member Author

@ls-ggg PTAL, could you please help to test whether this PR fix the issue #4195 or not? Thanks very much.
I has test it for a long time, it seems that the issue has gone with this patch.

This is my test with @kolyshkin 's script in #4195 (comment)

[2024年 05月 01日 星期三 00:50:14 CST]      82500
[2024年 05月 01日 星期三 00:50:18 CST]      82600

@lifubang
Copy link
Member Author

The ci test in centos7 is always fail after re-test:

not ok 91 runc exec with RLIMIT_NOFILE
# (in test file tests/integration/exec.bats, line 489)
#   `[[ "${output}" == "65536" ]]' failed
# runc spec (status=0):
#
# runc run -d --console-socket /tmp/bats-run-pW8vNL/runc.jTfAy8/tty/sock test_busybox (status=0):
#
# runc exec test_busybox /bin/sh -c ulimit -n (status=0):
# 1024

I'll take a look tomorrow.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny, I read your comment and wrote the very same code yesterday, still testing it since it needs about 10 minutes to repro the issue.

Comment on lines 480 to 497
@test "runc exec with RLIMIT_NOFILE" {
update_config '.process.capabilities.bounding = ["CAP_SYS_RESOURCE"]'
update_config '.process.rlimits = [{"type": "RLIMIT_NOFILE", "hard": 65536, "soft": 65536}]'

runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox
[ "$status" -eq 0 ]

# issue: https://github.com/opencontainers/runc/issues/4195
runc exec test_busybox /bin/sh -c "ulimit -n"
[[ "${output}" == "65536" ]]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @kolyshkin said in #4266, there isn't much point making a CI test for this -- because this is a race we would need to run this for many iterations to confirm whether the race still exists. Running it once just wastes CI time because even if the bug was re-introduced we would never catch it in a single CI run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll remove it later, at this time, I used it to test some other issues.

@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from e5fa99b to a996139 Compare May 1, 2024 14:25

update_config '.process.args = ["/bin/sh", "-c", "ulimit -n"]'
runc run test_ulimit
[[ "${output}" == "65536" ]]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intrestingly, I don't know how this PR impacts runc run with RLIMIT_NOFILE in centos7:

not ok 91 runc exec with RLIMIT_NOFILE
# (in test file tests/integration/exec.bats, line 489)
#   `[[ "${output}" == "65536" ]]' failed
#
# runc run -d --console-socket /tmp/bats-run-PQBrqc/runc.wbkYSh/tty/sock test_busybox (status=0):
#
# runc run test_ulimit (status=0):
# 1024

As it will not fail in #4267 .

@kolyshkin @cyphar @AkihiroSuda Do you know why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifubang first, there's no such code (of running a second container with modified .process.args) in #4267.

Second, you forgot to check the exit code (here and below).

Third, I think you don't have to add CAP_SYS_RESOURCE as this is the capability of the container itself, and rlimits are set by runc which already have it.

Finally, it looks like you just found a new bug -- RLIMIT_NOFILE is not being set in CentOS 7 for runc run. Can confirm this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifubang the issue is, on CentOS 7 the hard limit for NOFILE is 4096. For some reason yet unknown to me, raising the limit is not possible for runc via prlimit(2) syscall, and it returns no error. Maybe it's a bug in CentOS 7 kernel (related to namespaces I guess).

Two ways to workaround it:

  1. Use lower limits (soft <= 1024, hard <= 4096) in the test code.
  2. Add something like this to the test:
function setup() {
+	prlimit --nofile=99999:99999 -p $$
	setup_busybox
}

Surely, I won't recommend item 2, but it works.

Copy link
Member Author

@lifubang lifubang May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first, there's no such code (of running a second container with modified .process.args) in #4267.

Second, you forgot to check the exit code (here and below).

These two has added in #4267, but it still be success.

Yes, it's indeed failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same in Ubuntu 20.04 LTS (kernel 5.15) and Fedora 40 (kernel 6.8) -- if the current hard limit is equal or lower than the one we're trying to set using prlimit syscall, it returns no error but the limit is not changed.

To repro this locally:

$ sudo
# ulimit -H -n 65536
# bats tests/integration/ulimit.bats
root@kir-tp1:/home/kir/git/runc# bats tests/integration/ulimit.bats 
ulimit.bats
 ✗ runc run with RLIMIT_NOFILE
   (in test file tests/integration/ulimit.bats, line 19)
     `[[ "${output}" == "65536" ]]' failed
   runc spec (status=0):
   
   runc run test_ulimit (status=0):
   1024
 ✓ runc exec with RLIMIT_NOFILE
 ✗ runc run+exec two containers with RLIMIT_NOFILE
   (in test file tests/integration/ulimit.bats, line 45)
     `[[ "${output}" == "65536" ]]' failed
   runc spec (status=0):
   
   runc run -d --console-socket /tmp/bats-run-TSa3k7/runc.jz4uuS/tty/sock test_busybox (status=0):
   
   runc run test_ulimit (status=0):
   1024

3 tests, 2 failures

Here's the test code (tests/integration/ulimit.bats):

#!/usr/bin/env bats

load helpers

function setup() {
	setup_busybox
}

function teardown() {
	teardown_bundle
}

@test "runc run with RLIMIT_NOFILE" {
	update_config '.process.rlimits = [{"type": "RLIMIT_NOFILE", "hard": 65536, "soft": 65536}]'

	update_config '.process.args = ["/bin/sh", "-c", "ulimit -n"]'
	runc run test_ulimit
	[ "$status" -eq 0 ]
	[[ "${output}" == "65536" ]]
}

@test "runc exec with RLIMIT_NOFILE" {
	update_config '	 .process.rlimits = [{"type": "RLIMIT_NOFILE", "hard": 2000, "soft": 1000}]'

	runc run -d --console-socket "$CONSOLE_SOCKET" nofile
	[ "$status" -eq 0 ]

	for ((i = 0; i < 100; i++)); do
		runc exec nofile sh -c 'ulimit -n'
		echo "[$i] $output"
		[[ "${output}" == "1000" ]]
	done
}

@test "runc run+exec two containers with RLIMIT_NOFILE" {
	update_config '.process.capabilities.bounding = ["CAP_SYS_RESOURCE"]'
	update_config '.process.rlimits = [{"type": "RLIMIT_NOFILE", "hard": 65536, "soft": 65536}]'

	runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox
	[ "$status" -eq 0 ]

	update_config '.process.args = ["/bin/sh", "-c", "ulimit -n"]'
	runc run test_ulimit
	[ "$status" -eq 0 ]
	[[ "${output}" == "65536" ]]

	# issue: https://github.com/opencontainers/runc/issues/4195
	runc exec test_busybox /bin/sh -c "ulimit -n"
	[ "$status" -eq 0 ]
	[[ "${output}" == "65536" ]]
}

If you set hard ulimit -n to 65537, it works again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same in Ubuntu 20.04 LTS (kernel 5.15) and Fedora 40 (kernel 6.8) -- if the current hard limit is equal or lower than the one we're trying to set using prlimit syscall, it returns no error but the limit is not changed.

Thanks your corner case.

This is an issue which has no relate to the kernel version. This is another bug introduced by https://go-review.googlesource.com/c/go/+/476097 , but not a get/set race.
For example:
Consider Nofile limit: [cur, max], if syscall.rlimit.init get the value of [1024, 65536], the value of origRlimitNofile is [1024, 65536], and the nofile limit of runc init process is [65536, 65536], after runc create/exec process set it to [65537, 65537] or [65536, 65536], it will success, and when run init do unix.Exec, it will restore nofile limit to [1024, 65536], it will success.
But if runc create/exec set it to [65535, 65535], it will fail when restoring to [1024, 65536], and the error of this failure has been ignored by 'syscall.Exec`, so the nofile rliimit remains [65535, 65535].

Please see #4267 .

So I think the solution of #4237 is correct, but need some refactors.

Copy link
Contributor

@kolyshkin kolyshkin May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another addition: we do have a test case for setting RLIMIT_NOFILE during runc exec, but the test case is masking the issue because it sets Cur and Max to the same value (and so Go runtime does not do any tricks with saving/restoring NOFILE).

With this change, the test fails:

diff --git a/libcontainer/integration/exec_test.go b/libcontainer/integration/exec_test.go
index 9dd056ad..f4b8b839 100644
--- a/libcontainer/integration/exec_test.go
+++ b/libcontainer/integration/exec_test.go
@@ -140,7 +140,7 @@ func testRlimit(t *testing.T, userns bool) {
        // the Setrlimit call happens early enough that we still have permissions to raise the limit.
        ok(t, unix.Setrlimit(unix.RLIMIT_NOFILE, &unix.Rlimit{
                Max: 1024,
-               Cur: 1024,
+               Cur: 512,
        }))
 
        out := runContainerOk(t, config, "/bin/sh", "-c", "ulimit -n")

With the above change, the test now fails like this:

[kir@kir-tp1 runc]$ go1.20.14 test -exec sudo  -v ./libcontainer/integration -run Rlimit
=== RUN   TestRlimit
    exec_test.go:148: expected rlimit to be 1025, got 512
--- FAIL: TestRlimit (0.11s)
=== RUN   TestUsernsRlimit
    exec_test.go:148: expected rlimit to be 1025, got 512
--- FAIL: TestUsernsRlimit (0.10s)
=== RUN   TestExecInUsernsRlimit
    execin_test.go:129: expected rlimit to be 1026, got 512
--- FAIL: TestExecInUsernsRlimit (0.14s)
=== RUN   TestExecInRlimit
    execin_test.go:129: expected rlimit to be 1026, got 512
--- FAIL: TestExecInRlimit (0.15s)
FAIL
FAIL	github.com/opencontainers/runc/libcontainer/integration	0.542s
FAIL

@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch 2 times, most recently from d59fb11 to f39d3ab Compare May 3, 2024 01:47
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from f39d3ab to db890b6 Compare May 3, 2024 03:01
lifubang added a commit to lifubang/runc that referenced this pull request May 3, 2024
lifubang added a commit to lifubang/runc that referenced this pull request May 4, 2024
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from db890b6 to 56399ff Compare May 4, 2024 00:12
@lifubang lifubang changed the title Fix set rlimit nofile race Fix set nofile rlimit error May 4, 2024
lifubang added a commit to lifubang/runc that referenced this pull request May 4, 2024
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from 56399ff to 775b1e5 Compare May 4, 2024 00:45
lifubang added a commit to lifubang/runc that referenced this pull request May 4, 2024
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from 775b1e5 to 291c4af Compare May 4, 2024 00:47
@lifubang lifubang requested review from kolyshkin and cyphar May 4, 2024 01:27
lifubang added a commit to lifubang/runc that referenced this pull request May 5, 2024
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from 291c4af to 2cd053a Compare May 5, 2024 14:23
lifubang added a commit to lifubang/runc that referenced this pull request May 5, 2024
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from 2cd053a to b7efb3b Compare May 5, 2024 14:28
lifubang added a commit to lifubang/runc that referenced this pull request May 5, 2024
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from b7efb3b to be7e416 Compare May 5, 2024 15:11
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from ee4686b to ce39ae6 Compare May 7, 2024 10:41
lifubang added a commit to lifubang/runc that referenced this pull request May 7, 2024
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from ce39ae6 to 6c95ade Compare May 7, 2024 14:40
lifubang and others added 2 commits May 8, 2024 10:40
Issue: opencontainers#4195
Since https://go-review.googlesource.com/c/go/+/476097, there is
a get/set race between runc exec and syscall.rlimit.init, so we
need to call setupRlimits after syscall.rlimit.init() completed.

Signed-off-by: lifubang <[email protected]>
As reported in issue opencontainers#4195, the new version(since 1.19) of go runtime
will cache rlimit-nofile. Before executing execve, the rlimit-nofile
of the process will be restored with the cache. In runc, this will
cause the rlimit-nofile set by the parent process for the container
to become invalid. It can be solved by clearing the cache.

Signed-off-by: ls-ggg <[email protected]>
(cherry picked from commit f9f8abf)
Signed-off-by: lifubang <[email protected]>
lifubang added a commit to lifubang/runc that referenced this pull request May 8, 2024
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from 6c95ade to a83367c Compare May 8, 2024 10:55
@lifubang lifubang force-pushed the fix-set-RLIMIT_NOFILE-race branch from a83367c to 4ea0bf8 Compare May 8, 2024 10:57
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AkihiroSuda AkihiroSuda merged commit e8bec1b into opencontainers:main May 9, 2024
40 checks passed
lifubang added a commit to lifubang/runc that referenced this pull request May 9, 2024
issues:
opencontainers#4195
opencontainers#4265 (comment)

Signed-off-by: lifubang <[email protected]>
(cherry picked from commit 4ea0bf8)
Signed-off-by: lfbzhm <[email protected]>
@kolyshkin kolyshkin added backport/1.1-done A PR in main branch which has been backported to release-1.1 and removed backport/1.1-todo A PR in main branch which needs to be backported to release-1.1 labels May 9, 2024
lifubang added a commit to lifubang/runc that referenced this pull request May 10, 2024
issues:
opencontainers#4195
opencontainers#4265 (comment)

Signed-off-by: lifubang <[email protected]>
(cherry picked from commit 4ea0bf8)
Signed-off-by: lfbzhm <[email protected]>
lifubang added a commit to lifubang/runc that referenced this pull request May 10, 2024
issues:
opencontainers#4195
opencontainers#4265 (comment)

Signed-off-by: lifubang <[email protected]>
(cherry picked from commit 4ea0bf8)
Signed-off-by: lfbzhm <[email protected]>
lifubang added a commit to lifubang/runc that referenced this pull request May 10, 2024
issues:
opencontainers#4195
opencontainers#4265 (comment)

Signed-off-by: lifubang <[email protected]>
(cherry picked from commit 4ea0bf8)
Signed-off-by: lfbzhm <[email protected]>
lifubang added a commit to lifubang/runc that referenced this pull request May 10, 2024
issues:
opencontainers#4195
opencontainers#4265 (comment)

Signed-off-by: lifubang <[email protected]>
(cherry picked from commit 4ea0bf8)
Signed-off-by: lfbzhm <[email protected]>
lifubang added a commit to lifubang/runc that referenced this pull request May 16, 2024
issues:
opencontainers#4195
opencontainers#4265 (comment)

Signed-off-by: lifubang <[email protected]>
(cherry picked from commit 4ea0bf8)
Signed-off-by: lfbzhm <[email protected]>
lifubang added a commit to lifubang/runc that referenced this pull request May 17, 2024
issues:
opencontainers#4195
opencontainers#4265 (comment)

Signed-off-by: lifubang <[email protected]>
(cherry picked from commit 4ea0bf8)
Signed-off-by: lfbzhm <[email protected]>
@lifubang lifubang mentioned this pull request Jun 10, 2024
@lifubang lifubang deleted the fix-set-RLIMIT_NOFILE-race branch October 15, 2024 05:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.1-done A PR in main branch which has been backported to release-1.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

racy RLIMIT_NOFILE setting with Go 1.19+
5 participants